Key Points
- Set Risk-Based Patch SLAs: Define patch timelines and exception policies based on software tiers, severity, and business impact.
- Use Deployment Rings for Safety: Deploy patches in phased rings to validate stability and reduce disruption.
- Automate Patch Workflows: Automate approvals, scheduling, and sandbox testing to streamline patching and limit human delays.
- Track Compliance Metrics: Monitor install times, SLA performance, and failure rates to maintain continuous visibility and governance.
- Control and Retire Exceptions: Document exceptions, assign owners, enforce expiry dates, and apply compensating controls to manage risk.
Prioritizing software patches and setting an appropriate SLA for vulnerability management, while balancing cybersecurity best practices against end-user needs, is a common IT governance challenge for IT teams and managed service providers (MSPs). Even a diligently planned patch management policy loses effectiveness and stalls during implementation when priorities are vaguely defined, manual approval is required, or exceptions are not tracked and retired.
This practical guide provides a framework for effective patch management that leverages automation, vendor-defined best practices, and continuous oversight, resulting in ongoing vulnerability remediation coverage that accounts for edge cases and exceptions.
How are service level agreements (SLAs) and patch management related?
A service level agreement (SLA) formalizes the relationship between an MSP and its clients. It establishes the role you play, including the specific services you agree to provide, and metrics about their reliability and availability. SLAs establish trust, and ensure that the scope of work provided is clear and pre-defined.
Patch management plays an important role in the services MSPs provide to their clients. Keeping software up-to-date is a key method for protecting systems from cybersecurity threats, and a compliance requirement for many data protection and privacy frameworks. Organizations expect cybersecurity protections, mitigations, and resolution paths to be documented in their SLAs.
What you need to set realistic patch management goals
To achieve effective patch compliance and set realistic SLAs around it, you’ll need:
- Documented software asset tiers and maintenance windows
- Patch notes that include vulnerability severity, plus vendor trust lists
- Patch deployment and automation tools that support staged rings and automatic rollback
- A centralized IT documentation platform to store monthly metrics and an exception register
Note that the framework provided in this guide is not universal: you will need to adapt it to each client’s unique operating environment, factoring in internal policy as well as any relevant data protection or privacy laws that govern the business or its customers.
Step 1: Define risk-based SLAs and targets for patch management
Define and tabulate software asset tiers, including how critical they are to core IT availability as well as business operations. For each tier, decide on how fast patches must be rolled out, how emergency approvals will be handled, and which team members can authorize exceptions.
Base these targets on the specific software assigned to each tier, including vendor guidance on patching, and predicted maintenance windows based on usage.
For example, you may implement a high-priority tier in which critical, external-facing software is automatically patched and tested overnight, while productivity software that is needed by employees and does not pose an active threat waits for weekend patch windows after testing compatibility with other tools. Emergency approvals that fall outside tiered policies (such as patching an email server that is under active exploit) should also be carefully planned.
Define SLAs based on these tiers and targets, along with a policy for exceptions and emergency response that ensures an authorized person is available to approve in a timely manner. Highlight exceptions that may lead to outdated software being used in production (for example, a user who requires an older version of a productivity package for compatibility reasons).
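As an illustration, a tiered SLA table can be expressed as a simple lookup from asset tier and severity to a maximum time-to-install. The sketch below is a minimal example under stated assumptions: the tier names, severity labels, hour values, and function names are hypothetical placeholders, not recommendations or part of any product.

```python
from datetime import datetime, timedelta

# Hypothetical SLA matrix: (asset tier, severity) -> maximum hours allowed
# between patch release and installation. Values are placeholders only.
SLA_HOURS = {
    ("tier1", "critical"): 24,
    ("tier1", "high"): 72,
    ("tier2", "critical"): 72,
    ("tier2", "high"): 168,
    ("tier3", "critical"): 168,
    ("tier3", "high"): 336,
}
DEFAULT_SLA_HOURS = 720  # assumed fallback (30 days) for unlisted combinations


def patch_deadline(tier: str, severity: str, released: datetime) -> datetime:
    """Return the latest acceptable install time for a patch."""
    hours = SLA_HOURS.get((tier, severity), DEFAULT_SLA_HOURS)
    return released + timedelta(hours=hours)


def is_breached(tier: str, severity: str, released: datetime, now: datetime) -> bool:
    """True if the patch is still outstanding after its SLA deadline."""
    return now > patch_deadline(tier, severity, released)
```

With these placeholder values, `patch_deadline("tier1", "critical", datetime(2024, 1, 1, 9, 0))` gives 09:00 on 2 January 2024. Keeping the matrix as data rather than logic makes it easy to review with clients and adjust per environment.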
Step 2: Leverage ringed deployment to limit damage
Phased rollouts using deployment rings limit how many devices a faulty patch can affect, and give you the chance to halt the rollout and roll back to a known-good configuration if a patch introduces more problems than it solves. Based on the tiers decided above, designate a small pilot group for initial testing, then progress to rings that cover a broader range of devices and configurations once each ring shows a reliable success rate and any known issues are shown to have been avoided.
Pay careful attention to patch notes and vendor best practices for patching and ensure automated patching procedures recognize and validate them.
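The promotion decision between rings can be sketched as a small gate that checks the success rate of the current ring before widening the rollout. This is a minimal sketch, not a definitive implementation: the ring names, the 98% threshold, the minimum sample size, and the `next_action` function are all hypothetical.

```python
from dataclasses import dataclass

RINGS = ["pilot", "early", "broad"]  # hypothetical ring names, smallest first
MIN_SUCCESS_RATE = 0.98              # assumed promotion threshold
MIN_DEVICES = 10                     # assumed minimum sample before deciding


@dataclass
class RingResult:
    name: str
    attempted: int
    succeeded: int

    @property
    def success_rate(self) -> float:
        return self.succeeded / self.attempted if self.attempted else 0.0


def next_action(result: RingResult) -> str:
    """Decide whether to wait, halt, promote to the next ring, or finish."""
    if result.attempted < MIN_DEVICES:
        return "wait"      # not enough installs yet to judge stability
    if result.success_rate < MIN_SUCCESS_RATE:
        return "halt"      # stop the rollout and investigate or roll back
    idx = RINGS.index(result.name)
    if idx + 1 < len(RINGS):
        return f"promote:{RINGS[idx + 1]}"
    return "complete"      # the last ring passed the gate
```

Here a pilot ring with 20 of 20 successful installs returns `promote:early`, while 18 of 20 returns `halt`. In practice the gate would also take known issues from patch notes into account.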
Step 3: Automate approvals and scheduling by class
Prioritize and automate security updates where the risk of unpatched systems outweighs that of a potential patching issue. High-risk updates like drivers, firmware, and major OS/app feature updates should be held for review to prevent service interruption. Create patching and maintenance windows based on business hours and the impact on end-users, and provide users with the opportunity to complete their work before systems reboot using prompts or grace periods.
AI-enhanced patch management can assist with this, automatically flagging potentially risky patches, performing sandboxed tests, and providing summaries to your IT team for informed decision-making.
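The class-based approval rules described in this step could look something like the following sketch. The patch classes, the trusted vendor list, and the `approval_decision` function are assumptions made for illustration; they do not represent any particular tool's rule engine.

```python
from dataclasses import dataclass

# Assumed classes that are always held for human review.
HELD_CLASSES = {"driver", "firmware", "feature_update"}
# Hypothetical vendor trust list maintained by the IT team.
TRUSTED_VENDORS = {"ExampleVendor"}
# Assumed severities that qualify for automatic approval.
AUTO_APPROVE_SEVERITIES = {"critical", "high"}


@dataclass(frozen=True)
class Patch:
    patch_id: str
    vendor: str
    patch_class: str  # e.g. "security", "driver", "firmware", "feature_update"
    severity: str     # e.g. "critical", "high", "medium", "low"


def approval_decision(patch: Patch) -> str:
    """Return 'hold_for_review', 'auto_approve', or 'manual_approval'."""
    if patch.patch_class in HELD_CLASSES:
        return "hold_for_review"
    if (
        patch.patch_class == "security"
        and patch.vendor in TRUSTED_VENDORS
        and patch.severity in AUTO_APPROVE_SEVERITIES
    ):
        return "auto_approve"
    return "manual_approval"  # everything else waits for a human decision
```

Checking the held classes first means a firmware update is reviewed even when it comes from a trusted vendor, which matches the intent of holding high-risk updates regardless of source.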
Step 4: Monitor coverage and reliability
Use your remote monitoring and management (RMM) platform to track three core metrics:
- Median time to install by tier/severity
- SLA compliance
- Failure or rollback rate by deployment ring
This information can be collated into monthly reports that can help you identify assets and tiers that consistently miss patch windows and SLA targets. You can also identify vendors who are overrepresented among these misses (which may mean adjusting tiers or the vendor's placement in them), and software versions that may have their own specific issues that require workarounds or mitigations.
Review your reports regularly to ensure that you have full coverage, that exceptions and emergency responses are justified, and that reliability is maintained.
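The three metrics above can be computed from a flat export of install attempts. The sketch below assumes a hypothetical record layout (`tier`, `ring`, `hours_to_install`, `sla_hours`, `status`) and counts failed or rolled-back attempts as non-compliant; a real RMM export will use different field names and may need different rules.

```python
from collections import defaultdict
from statistics import median

# Hypothetical export: one dictionary per install attempt.
SAMPLE_RECORDS = [
    {"tier": "tier1", "ring": "pilot", "hours_to_install": 10, "sla_hours": 24, "status": "installed"},
    {"tier": "tier1", "ring": "broad", "hours_to_install": 30, "sla_hours": 24, "status": "installed"},
    {"tier": "tier2", "ring": "pilot", "hours_to_install": 50, "sla_hours": 72, "status": "installed"},
    {"tier": "tier2", "ring": "broad", "hours_to_install": None, "sla_hours": 72, "status": "rolled_back"},
]


def median_hours_by_tier(records):
    """Median time to install per tier, using successful installs only."""
    by_tier = defaultdict(list)
    for r in records:
        if r["status"] == "installed":
            by_tier[r["tier"]].append(r["hours_to_install"])
    return {tier: median(hours) for tier, hours in by_tier.items()}


def sla_compliance(records):
    """Share of all attempts that were installed within their SLA window."""
    if not records:
        return 0.0
    compliant = sum(
        1 for r in records
        if r["status"] == "installed" and r["hours_to_install"] <= r["sla_hours"]
    )
    return compliant / len(records)


def failure_rate_by_ring(records):
    """Share of attempts per ring that failed or were rolled back."""
    totals, failures = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["ring"]] += 1
        if r["status"] in {"failed", "rolled_back"}:
            failures[r["ring"]] += 1
    return {ring: failures[ring] / totals[ring] for ring in totals}
```

Grouping the same records by vendor or software version instead of tier or ring is a small change and supports the vendor and version analysis described above.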
Step 5: Govern exceptions with expiry and compensating controls
Document all emergency measures and exceptions, what steps were taken to enforce them, and what needs to be done to retire each of them. Assign each exception to a member of your IT team, and set an expiry date for it. Enforce this with either an automation to roll back the exception, or the automated creation or escalation of a ticket if it must be done manually.
Each exception should be assessed for its security impact, with compensating controls put in place to mitigate it (for example, additional firewall rules) until the software can be brought up to date. Document these controls as well so that they can be retired along with the exception, preventing a buildup of stale exceptions and controls that are no longer needed.
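An exception register with owners, expiry dates, and compensating controls can be modeled in a few lines. The following is a minimal sketch; the `PatchException` fields and the `review_exceptions` function are hypothetical, and the ticket creation or rollback that would follow is left out.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class PatchException:
    device: str
    software: str
    reason: str
    owner: str      # team member responsible for retiring the exception
    expires: date   # last day on which the exception is valid
    compensating_controls: List[str] = field(default_factory=list)


def review_exceptions(
    exceptions: List[PatchException], today: date
) -> Tuple[List[PatchException], List[PatchException]]:
    """Return (expired exceptions, exceptions with no compensating control)."""
    expired = [e for e in exceptions if e.expires < today]
    unmitigated = [e for e in exceptions if not e.compensating_controls]
    return expired, unmitigated
```

A scheduled job could run this review daily and open or escalate a ticket for each item in either list, addressed to the recorded owner.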
Step 6: Create a monthly evidence packet
Use your IT automation and documentation tools to automatically generate and publish monthly reports that demonstrate your SLA compliance by tier and patch success rate. Alongside this, list any as-yet unpatched software and the reason for it (for example, an approved exception, pending testing, or awaiting a suitable maintenance window).
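A monthly report of this kind can be rendered as plain text from figures you already collect. The sketch below assumes hypothetical inputs (compliance per tier, an overall success rate, and a list of unpatched items with one of three assumed reason codes) and is only meant to show the shape of the output.

```python
# Assumed reason codes for software that is still unpatched.
VALID_REASONS = {"exception", "pending_testing", "awaiting_window"}


def render_evidence_packet(month, compliance_by_tier, success_rate, unpatched):
    """Build a plain-text monthly summary of SLA compliance and open items."""
    lines = [f"Patch evidence packet: {month}", "", "SLA compliance by tier:"]
    for tier in sorted(compliance_by_tier):
        lines.append(f"  {tier}: {compliance_by_tier[tier]:.1%}")
    lines.append(f"Patch success rate: {success_rate:.1%}")
    lines.append("")
    lines.append(f"Unpatched software ({len(unpatched)} items):")
    for item in unpatched:
        if item["reason"] not in VALID_REASONS:
            # Every open item must carry a documented reason.
            raise ValueError(f"undocumented reason: {item['reason']!r}")
        lines.append(f"  {item['device']} / {item['software']}: {item['reason']}")
    return "\n".join(lines)
```

Rejecting items without a recognized reason keeps the report honest: anything unpatched must be traceable to an exception, a test in progress, or a scheduled window.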
These reports can improve client relationships by showing how SLA targets are met, and how further improvements are being made to get software patched as quickly as possible without impacting productivity.
NinjaOne brings automation and oversight to patch compliance
Patch management is a continuous balancing act: maintaining control and deploying critical patches while avoiding disrupting business workflows requires a deft touch by IT teams.
NinjaOne automates patch management as part of its IT platform that combines monitoring, management, security, backup, and support automation, as well as customer success tools. You can set rules that auto-approve critical security patches from reliable vendors, track open tasks and create tickets for SLA breaches, and automate the expiry of exceptions and mitigation methods. Monthly reports can be automatically generated from data gathered from endpoints, ensuring that automation is backed by full oversight, and that data is understandable by business stakeholders, helping them understand the benefits of being up-to-date.
Quick-Start Guide
Patch SLA and Risk Management Features
Patch Approval Options
NinjaOne offers multiple levels of patch management and risk governance:
1. Approval Levels:
– Global preemptive approvals/rejections
– Policy-level approvals/rejections
– Device-level overrides
2. Risk-Based Approval Methods:
– Approve patches after a designated number of days (up to 30 days for standard updates, 365 days for feature updates)
– Manually approve or reject patches
– AI-powered patch intelligence for risk detection
Patch Intelligence AI
– Evaluates Windows patches every six hours
– Can automatically move patches between states:
  – Approved
  – Manual approval
  – Rejected
– Considers known issues and community feedback
CVE and Vulnerability Scoring
– Integrated Common Vulnerability Scoring System (CVSS)
– Categorizes vulnerabilities:
  – Critical: CVSS 9.0-10.0
  – High: CVSS 7.0-8.9
  – Medium: CVSS 4.0-6.9
  – Low: CVSS 0-3.9
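The severity bands above can be applied in code with a simple threshold check. This sketch assumes that each boundary value belongs to the higher band (so 7.0 is High and 4.0 is Medium), which is consistent with Critical starting at 9. The `cvss_category` function is a hypothetical helper, not a NinjaOne API.

```python
def cvss_category(score: float) -> str:
    """Map a CVSS base score (0.0 to 10.0) to the severity bands listed above.

    Assumption: a score on a boundary falls into the higher band.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"
```

For example, a score of 9.8 maps to Critical and 6.9 maps to Medium, so the result can feed directly into a severity-based SLA lookup.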
Exception Governance
– Create device-level or policy-level overrides
– Manually approve/reject patches by:
  – KB number
  – Patch ID
– Uninstall patches with rollback support
– Set specific patch approval rules
Best Practices
– Test critical updates on a small subset of devices first
– Use ring deployments to stagger patch rollouts
– Monitor patch deployment results
– Adjust deployment based on initial results
NinjaOne provides a comprehensive approach to patch management that allows granular control over patch SLAs, risk assessment, and exception handling.
