Key Points
- Copy only changed data blocks to reduce bandwidth and speed up backups. Tune deduplication and exclusions to improve performance and reliability across hybrid environments.
- Use incremental or synthetic chains for large, frequently changing data, and block-level methods for VMs or archives to maximize efficiency.
- Regularly test restores to ensure integrity and SLA compliance. Measure RTOs, chain depth, and data transfer metrics to quantify improvements.
- After the initial full backup, only changed blocks are copied. This improves speed and efficiency but requires careful chain management to ensure successful restores.
Experts recommend planning how you will manage these backup chains before problems surface. This article walks you through tuning block-level backups for Managed Service Provider (MSP) scale.
Tuning block-level backups for MSP scale
Tuning block-level backups for MSP scale involves a handful of steps: understanding what you are optimizing, choosing a chain strategy, tuning deduplication, applying the proper methods, proving restore speed, and keeping performance steady over time.
📌 Prerequisites:
- Defined RPO, RTO, and retention by workload tier
- Baseline measurements for daily change rate, job duration, and WAN ceilings
- Backup targets sized for synthetic or periodic fulls
- Secure key management for encryption and access controls
- A shared evidence repository for logs, checksums, screenshots, and KPIs
Step 1: Baseline change rate and windows
This step measures how data changes over time to provide the foundation for performance improvements.
📌 Use Case: A systems administrator is preparing to improve backup performance across a mixed workload environment. To understand where to focus, they first need to see which backup jobs take the longest to run. By collecting a two-week performance baseline, they can spot trends in data growth and job duration, then prioritize tuning efforts on the workloads that will deliver the biggest impact.
Measure metrics for two weeks
Take note of the following metrics for at least two weeks; a short summarizing sketch follows the list:
- Daily changed data volume (GB)
- Median and 95th percentile job durations
- Transferred GB per job
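To turn two weeks of job logs into these numbers, a minimal Python sketch like the one below can help. It assumes a hypothetical jobs.csv export with date, job_name, changed_gb, transferred_gb, and duration_min columns; most backup platforms can produce an equivalent export from their job history.

```python
# Minimal baseline-summary sketch. Assumes a hypothetical jobs.csv export
# with columns: date, job_name, changed_gb, transferred_gb, duration_min.
import csv
import math
import statistics
from collections import defaultdict

durations = defaultdict(list)       # job_name -> list of durations (minutes)
daily_changed = defaultdict(float)  # date -> total changed GB that day

with open("jobs.csv", newline="") as f:
    for row in csv.DictReader(f):
        durations[row["job_name"]].append(float(row["duration_min"]))
        daily_changed[row["date"]] += float(row["changed_gb"])

for job, mins in sorted(durations.items()):
    mins.sort()
    p95 = mins[max(0, math.ceil(0.95 * len(mins)) - 1)]  # nearest-rank p95
    print(f"{job}: median={statistics.median(mins):.1f} min, p95={p95:.1f} min")

print(f"Average daily changed GB: {statistics.mean(daily_changed.values()):.1f}")
```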
Identify specific workloads
Detect workloads with large files that change in place, including VHDX, PST, or VM image files. Note their frequency, size, and backup behavior.
Document observations
Document trends in data churn and the consistency of job durations. Highlight workloads where transfer volumes are disproportionately high compared to their actual change rates. That will enable you to quickly see which jobs may be over-transferring data and need closer tuning.
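A simple ratio check is one way to surface over-transferring jobs. The job names, numbers, and threshold below are illustrative; in practice you would feed in the per-job totals gathered during the baseline.

```python
# Illustrative over-transfer check: flag jobs whose transferred volume
# far exceeds their actual changed data. Names and numbers are examples.
OVER_TRANSFER_RATIO = 3.0  # assumption: flag when transfer > 3x change

jobs = {
    # job_name: (changed_gb, transferred_gb) over the baseline period
    "fileserver-01": (12.0, 14.5),
    "sql-vm-02": (8.0, 61.0),
}

for name, (changed, transferred) in jobs.items():
    ratio = transferred / max(changed, 0.1)  # guard against division by zero
    if ratio > OVER_TRANSFER_RATIO:
        print(f"{name}: transfers {ratio:.1f}x its change rate -- tune this job first")
```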
💡 Note: This step should establish a performance baseline that later tuning rounds can be measured against.
Step 2: Choose a chain strategy and guardrails
This step ensures restores are predictable, fast, and not prone to corruption.
📌 Use Case: An IT team managing multiple production servers wants to balance backup speed and recovery reliability. By establishing consistent chain policies and automated safeguards, they can ensure that backups remain healthy over time and recovery operations meet business Recovery Time Objective (RTO) expectations.
Select an optimal chain strategy
Back up most servers using incremental backups with scheduled synthetic fulls. This minimizes daily transfer load while maintaining quick restore readiness.
Define reset intervals
Force active full backup jobs periodically when retention policies need resetting, when there is an elevated risk of corruption, or when workloads show heavy churn or block-level instability.
Establish guardrails
Establish guardrails by doing the following; a minimal monitoring sketch follows the list:
- Setting a maximum chain depth to prevent long chains.
- Configuring alerts when the threshold is exceeded.
- Regularly rehearsing restores from the latest and a mid-chain point to verify integrity and performance.
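A chain-depth guardrail can be as simple as the sketch below. The restore-point representation, the threshold, and the alert output are all assumptions; most backup platforms expose chain metadata through an API or CLI that you would query instead.

```python
# Minimal chain-depth guardrail sketch. The restore-point list and the
# alert output are assumptions; query your platform's API or CLI instead.
MAX_CHAIN_DEPTH = 14  # assumption: roughly one synthetic full every two weeks

def chain_depth(restore_points: list[str]) -> int:
    """Count incrementals since the most recent full (newest point last)."""
    depth = 0
    for point in reversed(restore_points):
        if point == "full":
            break
        depth += 1
    return depth

points = ["full"] + ["incremental"] * 16  # illustrative 16-deep chain
depth = chain_depth(points)
if depth > MAX_CHAIN_DEPTH:
    print(f"ALERT: chain depth {depth} exceeds {MAX_CHAIN_DEPTH}; "
          "schedule a synthetic or active full")
```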
Documented chain policies tailored to each workload keep backup chains healthy and recoverable when you need them.
Step 3: Tune deduplication, compression, and exclusions
This step helps organizations minimize unnecessary data movement while meeting service level agreements (SLAs).
📌 Use Case: A managed service provider notices that backups exceed their window due to redundant data. After adjusting deduplication settings, enabling compression where appropriate, and excluding non-essential paths, the provider reduces both bandwidth use and storage costs.
Efficient data protection relies on a proper balance between performance and resource utilization; an example exclusion filter follows the list below.
- Deduplication: Choose chunk sizes that align with file types and your platform’s capabilities. Small chunks improve deduplication ratios but may increase processing overhead.
- Exclusions: Exclude transient directories, such as cache locations and databases’ temporary stores. These inflate change sets unnecessarily, increasing job durations and consuming bandwidth.
- Compression: Enable compression where CPU headroom allows. To avoid overloading systems or links, set sensible limits on compression activity and, where possible, throttle these tasks during peak hours.
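As a rough illustration of how exclusions work, the glob patterns below mimic a typical exclusion list. The patterns are examples only, and your platform's exclusion syntax may differ.

```python
# Illustrative exclusion filter using glob-style patterns. The patterns
# are examples only; adapt them to your platform's exclusion syntax.
from fnmatch import fnmatch

EXCLUDE_PATTERNS = [
    "*/Temp/*",    # OS and application temp directories
    "*/cache/*",   # browser and application caches
    "*.tmp",       # transient files
    "*/tempdb*",   # database temporary stores (example name)
]

def should_back_up(path: str) -> bool:
    """Return False for paths matching any exclusion pattern."""
    return not any(fnmatch(path, pattern) for pattern in EXCLUDE_PATTERNS)

for p in ["C:/Users/a/cache/thumbs.db", "D:/Data/report.docx"]:
    print(p, "->", "back up" if should_back_up(p) else "exclude")
```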
Step 4: Apply the right method to the workload
This step helps organizations achieve optimized throughput, fewer transfer failures, and a balanced load on network and storage systems.
📌 Use Case: An IT administrator finds that file-level backups of large virtual machines take too long and often fail mid-transfer. By switching to block-level backups for these files, they achieve faster and more consistent job completion across environments.
Backup and replication performance depend on matching the protection method to the data profile (a simple selection sketch follows this list):
- Block-level backups: Ideal for workloads with frequently edited files. They track changes at the block level, which reduces transfer size and improves efficiency.
- File-level backups: Suited for systems with simpler metadata and fewer files. File-level operations are easier to manage and simplify restores for individual files.
- Roaming endpoints: Combine block-level backup with intelligent bandwidth controls and resumable, interruption-safe uploads for laptops. This preserves data integrity and consistency even on unreliable connections.
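The decision logic reduces to a few rules, sketched below with assumed workload attributes; treat it as a starting point rather than a definitive policy.

```python
# Sketch of the method-selection rules above. The workload attributes
# are assumptions; extend them with whatever your inventory tracks.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    has_large_inplace_files: bool  # e.g., VHDX, PST, or VM image files
    is_roaming_endpoint: bool

def pick_method(w: Workload) -> str:
    if w.is_roaming_endpoint:
        return "block-level + bandwidth controls + resumable uploads"
    if w.has_large_inplace_files:
        return "block-level"
    return "file-level"

for w in [Workload("hyperv-host", True, False),
          Workload("sales-laptop", False, True),
          Workload("small-fileshare", False, False)]:
    print(f"{w.name}: {pick_method(w)}")
```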
Step 5: Prove restore speed and integrity
This step ensures teams can align technical outcomes with business goals by proving restore speed and data integrity.
📌 Use Case: A service provider completes several tuning rounds to improve backup efficiency, but faces client skepticism about real-world impact. They provide clear evidence of improvement by running tier-based restore drills, documenting RTO performance, and validating restored application health.
Schedule restore drills
Perform restores by service tier to measure RTO and confirm readiness. In addition to file recovery, verify checksums and confirm application-level health.
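For the checksum portion of a drill, a sketch like the following compares SHA-256 hashes of source files against their restored copies. The directory paths are placeholders for your own source and restore-target locations.

```python
# Minimal integrity-check sketch: compare SHA-256 checksums of source
# files against their restored copies. The paths are placeholders.
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

source_dir = Path("/srv/data")            # assumption: original data location
restored_dir = Path("/mnt/restore/data")  # assumption: drill restore target

for src in source_dir.rglob("*"):
    if src.is_file():
        restored = restored_dir / src.relative_to(source_dir)
        ok = restored.exists() and sha256(src) == sha256(restored)
        print(f"{src}: {'OK' if ok else 'MISMATCH'}")
```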
Compare key metrics
Track key performance indicators before and after optimization to quantify gains. Key metrics include p95 job time, transferred data volume, chain depth, and RTO deltas.
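Reporting the deltas can be as simple as the sketch below; the numbers are illustrative placeholders for your own measured baselines.

```python
# Before/after KPI comparison sketch. The numbers are illustrative
# placeholders; substitute the values you actually measured.
baseline = {"p95_job_min": 240, "transferred_gb": 900, "chain_depth": 21, "rto_min": 95}
tuned    = {"p95_job_min": 150, "transferred_gb": 420, "chain_depth": 12, "rto_min": 55}

for kpi, before in baseline.items():
    after = tuned[kpi]
    delta_pct = (after - before) / before * 100
    print(f"{kpi}: {before} -> {after} ({delta_pct:+.0f}%)")
```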
Audit and reporting
Store restore drill results and validation evidence with support tickets. Doing so supports compliance reviews, audit trails, and client Quarterly Business Reviews (QBRs).
Step 6: Operate, observe, and improve
This step validates that restores stay fast over time, turning backend improvements into lasting client value.
📌 Use Case: After refining deduplication and backup policies, a provider wants to confirm results. They conduct quarterly restore drills, verify recovery times against SLAs, and document application checks. The reports show measurable RTO improvements and clean integrity checks.
- Run restore drills: Schedule restores by priority to measure RTO performance. Confirm file integrity through checksum validation and verify that apps start normally after recovery.
- Measure and compare KPIs: Track pre- and post-tuning metrics to show improvement in throughput and reliability.
- Document and store evidence: Save test results, logs, and reports with related tickets for audits, SLA reviews, and QBRs.
NinjaOne services that help tune block-level backups
NinjaOne enables MSPs to optimize and scale block-level backups through intelligent automation, flexible storage, and deep performance insights.
Bandwidth-optimized backups
NinjaOne uses change-aware transfers and smart scheduling to minimize bandwidth consumption. Features like intelligent deduplication, block-level change tracking, and chainless backup technology ensure efficient data movement without compromising restore speed.
Hybrid targets
With hybrid backup capabilities, NinjaOne supports both local and cloud storage destinations. MSPs can configure granular retention settings, perform hourly cloud syncs, and design storage strategies that balance speed, cost, and redundancy.
Chain health and alerts
NinjaOne provides backup chain monitoring for complete visibility into chain depth, job health, and storage capacity forecasting. Automated alerts help detect issues early, keeping backup sets reliable and restorable.
Restore testing automation
Automated restore testing allows MSPs to effortlessly validate recovery integrity. With minimal manual effort, they can schedule sandbox restores, verify restore integrity, capture automated screenshots, and even attach restore evidence to tickets.
Reporting
NinjaOne’s comprehensive backup dashboards deliver actionable insights, including transferred GB per job, p95 job time metrics, and chain depth visualization. These reports help identify trends, optimize performance, and ensure compliance across client environments.
Tune block-level backups to ensure your operation plans scale
Block-level backups deliver the most value when you manage chain health and prove outcomes with restores. For faster jobs, smaller backup windows, and demonstrable restores, baseline your change rate, choose the appropriate chain strategy, and report on a small set of KPIs.