The scourge of ransomware means businesses need fast restores in the event of an attack. The best way to achieve this is to recover from a snapshot that’s only hours or even minutes old. Taking full image copy backups can be invasive, so data protection companies employ techniques like “incremental forever” or “virtual fulls” to meet SLAs. We dig into these strategies to explain what they mean and the best way to achieve fast backup and restore.
Data protection has typically followed a process that uses full-copy backups in conjunction with incremental data updates. For example, when the production week ran Monday to Friday, full backups might be taken over the weekend, followed by incremental backups during the week. The "full" creates an entire image of the application (typically a server), while the "incrementals" capture changes since the most recent backup (the increments or differences). To restore data to a point after the full requires the full backup, plus all the incrementals up to the point of the recovery.
Note that some systems offer the use of differentials, where the non-full backups capture everything since the last full (rather than everything since the previous backup). See figure 1 for examples of this.
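The practical difference between the two schemes is in how long the restore chain gets. The following Python sketch (purely illustrative; the function names and day-number scheme are not from any real product) shows which backup copies a restore needs under each approach:

```python
# Hypothetical sketch of which backup copies a restore to a given day needs.
# Backups are identified by day number; day 0 is the weekend full.

def restore_chain_incremental(full, incrementals, target):
    """Incremental scheme: restore needs the full plus EVERY
    incremental taken up to the target point."""
    return [full] + [inc for inc in incrementals if inc <= target]

def restore_chain_differential(full, differentials, target):
    """Differential scheme: restore needs the full plus only the LATEST
    differential, since each differential captures everything since the full."""
    eligible = [d for d in differentials if d <= target]
    return [full] + ([max(eligible)] if eligible else [])

# Weekend full on day 0, nightly backups on days 1-5, restore to day 4:
print(restore_chain_incremental(0, [1, 2, 3, 4, 5], 4))   # [0, 1, 2, 3, 4]
print(restore_chain_differential(0, [1, 2, 3, 4, 5], 4))  # [0, 4]
```

Differentials shorten the restore chain at the cost of each nightly backup growing larger as the week progresses.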
When backups were written to tape, nightly full backups simply weren’t practical. Backup systems and networks couldn’t manage the data throughput. Compared to today, tape media was expensive, and the smaller media capacities required multiple tape changes and either manual intervention or tape libraries. Incremental backups take the load off the backup process but push the problem aside until a restore is required (more on this in a moment).
Taking the incremental approach further, why do we bother with full backups at all (after the first full)? Why not just take incrementals forever?
Well, only ever taking one full backup is both a blessing and a curse. From the backup perspective, the load on the backup system is massively reduced. If, for example, an application only changes 1% of data per day, a weekly full backup duplicates around 93% (or more) of the original content. So a continual incremental process is space-efficient.
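The duplication figure is easy to sanity-check. Assuming the 1% daily change rate from the example, a weekly full re-copies data of which at most about 7% has changed, so at least 93% duplicates the previous full:

```python
# Back-of-envelope check of the "93% or more" duplication claim.
daily_change = 0.01        # 1% of data changes per day (assumed)
days_between_fulls = 7

# Upper bound on changed data: assumes no block changes twice in the week,
# so real-world duplication is usually even higher than this estimate.
changed = min(1.0, daily_change * days_between_fulls)
duplicated = 1.0 - changed
print(f"At least {duplicated:.0%} of a weekly full duplicates the previous one")
```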
However, when data needs to be recovered, the restore process could require tens (or hundreds) of tapes, perhaps multiple times, depending on the stacking of backup data on the media. This process represents a considerable challenge in meeting SLAs, especially if multiple restores are needed at the same time (which introduces media contention).
The most significant risk with the incrementals forever approach is the loss of any incremental copy. Backups are generally single copies, so if the data for an incremental is lost or corrupted, every restore point after the lost copy becomes unrecoverable (see figure 2).
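This chain-break behaviour can be sketched in a few lines of Python (an illustrative model only; the function and its inputs are hypothetical):

```python
# Sketch of the chain-break risk: with incrementals forever, losing one
# incremental invalidates every restore point that follows it.

def restorable_points(full_ok, incrementals_ok):
    """full_ok: whether the initial full survives.
    incrementals_ok: one boolean per incremental, True if it survives.
    Returns the restore points (0 = the full) still recoverable."""
    if not full_ok:
        return []  # lose the full and the whole chain is gone
    points = [0]
    for i, ok in enumerate(incrementals_ok, start=1):
        if not ok:
            break  # everything after the lost copy is unrecoverable
        points.append(i)
    return points

# Third incremental is corrupted: only the full and first two points survive.
print(restorable_points(True, [True, True, False, True]))  # [0, 1, 2]
```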
World of Disk
Of course, we have long since moved on from tape as the primary backup medium, but the concept of full and incremental backups remains. The transition to server virtualisation evolved most backups into snapshots, with APIs to extract incremental changes (for example, vSphere Storage APIs). Suddenly, an incrementals forever policy looks more attractive. The backup system can create a full backup when a virtual machine is instantiated, then do incrementals forever afterwards.
Whether in the tape or disk world, incrementals forever requires a lot of metadata management by the backup software. All the individual changed blocks must be managed by the backup system and aligned to a date and time of backup. The risk of data loss remains, so the criticality of the backup storage media (and the metadata) is all important.
In a restore scenario, data will be aggregated from the initial full copy, plus all the incrementals needed to bring the backup up to date (or to the point in time of the restore). This recovery generates random I/O on the backend storage media, made worse when multiple restores are in play. The issue is well known in the data protection industry, with de-duplicating appliances, especially those based on disk media, exhibiting exactly these restore performance problems. This is why many vendors are now offering data repositories based on all-flash systems.
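Conceptually, the aggregation works like a layered merge: start from the full image, then overlay each incremental in time order so the newest version of every block wins. The sketch below is a simplified model under assumed data structures, not any vendor's implementation:

```python
# Illustrative model of restore aggregation from a full plus incrementals.

def rebuild_image(full_blocks, incrementals, point_in_time):
    """full_blocks: dict of block_id -> data for the initial full.
    incrementals: list of (timestamp, {block_id: data}), oldest first.
    Overlays each incremental up to point_in_time; every block fetched
    from a different incremental is a potential random read on disk."""
    image = dict(full_blocks)
    for ts, changed in incrementals:
        if ts > point_in_time:
            break
        image.update(changed)  # newer block versions overwrite older ones
    return image

full = {0: "A0", 1: "B0", 2: "C0"}
incs = [(1, {1: "B1"}), (2, {0: "A2", 2: "C2"}), (3, {1: "B3"})]
print(rebuild_image(full, incs, point_in_time=2))
# {0: 'A2', 1: 'B1', 2: 'C2'}
```

The longer the incremental chain, the more scattered the block reads become, which is exactly why disk-based deduplication appliances struggle and all-flash repositories help.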
One solution to the risk of incrementals forever is the concept of the “synthetic full” backup. Rather than take repeated full backups (which result in large amounts of duplicate data), a synthetic full creates a new full backup from the last full backup and subsequent incrementals. There are several reasons why this process has benefits over an incrementals forever strategy.
- Data loss impact is minimised. The loss of any incremental copy can be limited to only the set of incrementals taken after the latest synthetic copy. Although the synthetic full could be composed of the same data as an incrementals forever image, a synthetic copy can be cloned to physically separate storage by the backup software rather than simply consolidating metadata pointers. This offers a chance to add more resiliency to the backup copy. At the back end, the data doesn’t have to be duplicated because the storage system adds in data protection for the new synthetic full.
- Metadata traversal is quicker. The metadata pointers that describe the latest backup image can be consolidated into “one pointer per block” rather than a series of chained pointers that could create a considerable metadata path. This makes multi-restore scenarios easier to manage (and potentially quicker) because much less metadata needs to be read into memory to process access to the synthetic full.
- The synthetic full can be treated as a logical entity. A synthetic full may be managed as a single entity, for example, moving the copy between tiers of storage or to another storage system entirely. This has the benefit of enabling a restored synthetic copy to be “warmed up” if required, moving it quickly to a fast tier of storage. If the backup system can convey this level of awareness to the underlying storage, then recovery times shorten. If the secondary storage platform has global deduplication, then backups can be moved quickly offsite, effectively only shipping changed data to create a new full synthetic copy in another location.
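The construction of a synthetic full can be sketched as a consolidation step performed entirely on the backup side, with no re-read of the production system. This is a hedged, illustrative model (the function and block layout are assumptions, not a real product's design):

```python
# Sketch of "synthetic full" creation: merge the last full with all
# subsequent incrementals into a new, standalone full image.

def make_synthetic_full(last_full, incrementals):
    """last_full: dict of block_id -> data.
    incrementals: list of (timestamp, {block_id: data}), oldest first.
    Returns a new full; the consumed incrementals are no longer needed
    for restores at or after this point."""
    synthetic = dict(last_full)
    for _, changed in incrementals:
        synthetic.update(changed)
    return synthetic

full = {0: "A0", 1: "B0"}
incs = [(1, {0: "A1"}), (2, {1: "B2"})]
print(make_synthetic_full(full, incs))  # {0: 'A1', 1: 'B2'}
```

After the merge, the restore chain resets: a recovery needs only the synthetic full plus any incrementals taken since it, rather than the entire history back to the original full.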
As shown in figure 3, synthetic fulls are more resilient to the loss of incremental data, but there are still risks. If an incremental is lost before the synthetic full is created, then the issues are the same as incrementals forever. If the synthetic full has already been created, then restore points between the lost incremental and the synthetic full are unrecoverable, but the synthetic full and all subsequent recovery points remain available.
The best policy available is to create a daily synthetic full immediately after each incremental completes. In this way, every backup is effectively a full, with no dependency on other backups (from the backup software perspective).
The Architect’s View®
Most of the implementation we’ve discussed here is the responsibility of the backup software. However, an efficient secondary storage platform is essential to ensure synthetic backup creation works smoothly and efficiently.
First, as each synthetic full is created, the secondary storage needs the capability to deduplicate the data, eliminating shared blocks between successive synthetics. Second, the secondary storage should be capable of doing the deduplication inline, so that data already present in a previous backup never needs to hit the backend storage.
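A minimal sketch of inline, content-addressed deduplication is shown below. This assumes a simple fixed-block, hash-per-block store; real appliances use variable-length chunking and far more robust metadata, so treat this purely as an illustration of the principle:

```python
# Minimal sketch of an inline, content-addressed deduplicating store.
import hashlib

class DedupStore:
    def __init__(self):
        self.blocks = {}  # content hash -> data, each written only once

    def write(self, data):
        """Returns (block_reference, was_new). Duplicate blocks are
        detected inline by hash, so they never hit backend storage."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = data  # only unique data is stored
        return digest, is_new

store = DedupStore()
_, new1 = store.write(b"block-A")   # first copy: stored
_, new2 = store.write(b"block-A")   # same block in a later synthetic: skipped
print(new1, new2, len(store.blocks))  # True False 1
```

With a store like this behind it, each new synthetic full costs only metadata plus the genuinely changed blocks, which is what makes the daily synthetic full policy affordable.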
Although this process seems complicated, it’s effectively a metadata restructuring, optimised by the ability to instantly de-duplicate and discard the repeated data that forms the synthetic copy. The backup software treats every backup as a unique full, so recovery becomes much quicker and more reliable, while the secondary storage does all the work and reduces the data capacity stored.
We can envisage a scenario where the backup software and secondary storage platform work together, to ensure that synthetic backup images are created, stored, and restored efficiently. Synthetic backups could even make backup data portability much more practical. This is an area where little to no work has been done over the past 20 years (something we discussed three years ago), and innovation is sorely needed. However, without efficient, deduplicating secondary storage, the process simply will not work.
Copyright (c) 2007-2019 – Post #237b – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.