Looking forward to the future of modern backup storage

Chris Evans | Data Protection

In the last few weeks, three primary storage vendors have released new or improved solutions for data protection, specifically acting as targets for traditional data protection software.  We are experiencing an evolution in the requirements for secondary backup storage, driven by issues such as ransomware.  We look at the market for backup storage and briefly at each vendor, discussing whether the era of the legacy purpose-built backup appliance (PBBA) may be coming to an end.

Background

From the early days of computing, data protection used tape media as the target for backups.  Tape is relatively cheap and efficient but operates as a sequential medium.  The backup process works well with tape systems, and over the years, vendors built in automation (tape libraries), multi-threading (parallel streams) and other features to make the best use of the throughput and capacity of tape.

Unfortunately, the restore process from tape can present significant challenges.  A single restore may draw on a full backup plus incremental data, possibly under an “incremental forever” methodology.  If a tape is lost or damaged beyond use, this strategy can result in an inability to restore.  With many restores running in parallel, contention for drives and media creates significant bottlenecks.

VTL

In the early 2000s, Data Domain pioneered deduplicating storage systems, which ingest backup data and store it in a much smaller physical footprint than the logical size written to the appliance.  Depending on the data profile, data reduction can reach 95% or more.  Both NetApp and EMC wanted the technology, with EMC winning the bidding war and acquiring the company in 2009.
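
As a rough illustration of how these figures translate into the reduction ratios vendors often quote, the short sketch below converts a logical-versus-physical comparison into a percentage and a ratio.  The values are hypothetical, not measurements from any appliance:

```python
# Hypothetical illustration of how deduplication savings are expressed.
logical_bytes = 100 * 10**12   # 100TB written by the backup software
physical_bytes = 5 * 10**12    # 5TB actually stored after deduplication

reduction_pct = (1 - physical_bytes / logical_bytes) * 100
ratio = logical_bytes / physical_bytes

print(f"Data reduction: {reduction_pct:.0f}% (a {ratio:.0f}:1 ratio)")
# Data reduction: 95% (a 20:1 ratio)
```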

Deduplicating appliances initially offered VTL interfaces (which had already existed for some time), emulating traditional tape drives.  These interfaces have gradually been replaced by NAS protocols and, more recently, object storage interfaces.

Dedupe

As with tape, deduplicating appliances generally focused more on the speed of ingest than the performance of restore.  Deduplication scatters data across the physical media, so that data read back is accessed randomly, even if the requested I/O stream is logically sequential.  When data is stored on disk, recovery performance from deduplicating appliances can therefore be slow.  Vendors have looked to address this problem by caching metadata and by placing deduplication software on all-flash hardware, as IBM attempted with the IP from its acquisition of Diligent Technologies (pairing ProtecTIER with FlashSystem in 2014).
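
The sketch below, a minimal and deliberately simplified model of a deduplicating store (not any vendor’s implementation), shows why restores fragment.  The second backup’s recipe is logically sequential, but its one changed chunk lands at the far end of the physical log, forcing the restore to seek away from otherwise contiguous data:

```python
# Minimal sketch of why deduplicated restores become random I/O.
import hashlib

CHUNK = 4096
store = {}         # chunk hash -> offset in the append-only data log
log = bytearray()  # stand-in for the physical media

def backup(stream: bytes) -> list[str]:
    """Split a stream into chunks, store only new ones, return the recipe."""
    recipe = []
    for i in range(0, len(stream), CHUNK):
        chunk = stream[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:       # unseen data: append to the log
            store[digest] = len(log)
            log.extend(chunk)
        recipe.append(digest)
    return recipe

def restore_offsets(recipe: list[str]) -> list[int]:
    """Physical offsets touched by a logically sequential restore."""
    return [store[d] for d in recipe]

# A full backup with 64 distinct chunks, then an "incremental" that
# changes only the second chunk -- typical of backup workloads.
full = b"".join(i.to_bytes(2, "big") * (CHUNK // 2) for i in range(1, 65))
incr = full[:CHUNK] + b"\xff" * CHUNK + full[2 * CHUNK:]
backup(full)
recipe = backup(incr)

# Logically sequential, physically scattered: the changed chunk lives
# at the end of the log, far from its neighbours.
print(restore_offsets(recipe)[:4])   # [0, 262144, 8192, 12288]
```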

Object

The rise in adoption of the public cloud has resulted in backup software vendors offering object storage as a target location for data.  This design enables on-premises backups to be offloaded to the public cloud, or on-premises backup capacity to be built from object storage.

Object stores offer good scalability, good parallel access for both reads and writes, generally low cost (as the initial implementations were based on HDDs), good resiliency through erasure coding, and geo-distribution capability (useful for disaster recovery and mobility).
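
As a simple illustration of how backup software consumes such a target, the hedged sketch below writes a backup artifact to an S3-compatible object store using boto3.  The endpoint, bucket, key and file path are all illustrative:

```python
# A sketch of using an S3-compatible object store as a backup target;
# all names here are illustrative, not from any real deployment.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example.com",  # on-premises or public cloud
)

# Backup software typically writes large objects, keyed so that
# individual backup sets can be listed and expired independently.
s3.upload_file(
    Filename="/backups/db-full-20220415.bak",
    Bucket="backup-repository",
    Key="db/full/2022-04-15/db-full.bak",
)
```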

Object storage has faced two challenges: deduplication, which was generally not offered (and in many cases still isn’t available today), and the granularity of file sizes.  The deduplication issue can be solved with gateway software (as Pure Storage did with the acquisition of StorReduce) or by deduplicating within the data protection software itself.

Modern Requirements

The requirements of modern data protection solutions have evolved due to the changing landscape of IT.  Ransomware has introduced the risk of data loss or corruption, while governance and regulation require businesses to have the capability to recover historical data on demand.

Every backup counts, in contrast to the way data protection was treated a few decades ago.  The ransomware challenge means a business may have to recover its entire data footprint from backups rather than relying on snapshots or replicated copies.  This requirement puts tremendous strain on the ability to meet recovery time and recovery point objectives if every business owner wants their data back first.
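
Some back-of-the-envelope arithmetic makes the strain concrete.  The sketch below assumes a hypothetical 500TB data footprint and uses the appliance throughput figures quoted later in this post, treating the backup appliance as the sole bottleneck:

```python
# Back-of-the-envelope restore times for a full-estate recovery.
# The 500TB footprint is hypothetical; throughputs are the vendor
# figures quoted later in this post.
footprint_tb = 500

for name, tb_per_hour in [
    ("PowerProtect DD", 41),
    ("PowerProtect DD + DD Boost", 94),
    ("InfiniGuard", 180),
]:
    print(f"{name}: {footprint_tb / tb_per_hour:.1f} hours")
# PowerProtect DD: 12.2 hours
# PowerProtect DD + DD Boost: 5.3 hours
# InfiniGuard: 2.8 hours
```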

Modern secondary storage for backup must meet new requirements, including:

  • Immutability – the ability to time-lock content against deletion or change, irrespective of the security access of an administrator (see the sketch after this list).
  • Hardened systems – attackers target backup systems to expire snapshots, tamper with system clocks and otherwise delete historical backup data.
  • Cost efficiency – immutability requires additional backups to be retained, so systems must use media efficiently.
  • Data reduction efficiency – data reduction is essential to controlling costs, using compression, deduplication or both techniques.
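
As an example of the immutability requirement in practice, the sketch below uses Amazon S3 Object Lock in COMPLIANCE mode, under which retention cannot be shortened or removed, even by an administrator.  The bucket and object names are illustrative, and the bucket is assumed to be created in the default region:

```python
# A sketch of time-locked immutability using S3 Object Lock;
# bucket, key and file path are illustrative.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")

# Object Lock must be enabled when the bucket is created.
s3.create_bucket(Bucket="immutable-backups", ObjectLockEnabledForBucket=True)

# COMPLIANCE mode: nobody, including administrators, can delete or
# overwrite the object until the retention date passes.
s3.put_object(
    Bucket="immutable-backups",
    Key="db/full/2022-04-15/db-full.bak",
    Body=open("/backups/db-full-20220415.bak", "rb"),
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=90),
)
```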

Vendor Solutions

Three vendors (StorONE, VAST Data and Infinidat) have recently released solutions for data protection.  In each case, the products work with backup software and act as a target for backup data.

InfiniGuard

Infinidat has upgraded the InfiniGuard platform to deliver greater throughput and faster restore capability.  InfiniSafe software features include immutable snapshots, a “logical” air gap for both local and remotely replicated data, and the ability to perform fenced network restores.  We will cover InfiniGuard separately in another blog post and have a podcast coming up soon that discusses the new features.

StorONE

We briefly reviewed S1:Backup in December 2021 (see this post for details).  S1:Backup uses the same technology as StorONE’s primary storage, with a combination of persistent memory, flash and hard disks.  The architecture places data initially into persistent memory/flash before consolidating and striping it across cheaper media (in this case, high-capacity hard drives).  This design delivers fast reads and writes, although the system doesn’t implement deduplication.
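
The sketch below is a conceptual illustration of this general two-tier pattern (land writes on fast media, then consolidate and stripe to capacity media); it is not StorONE’s implementation:

```python
# Conceptual two-tier write path: fast-media landing zone, then
# consolidated, striped flushes to a capacity tier. Illustrative only.
class TieredWriter:
    def __init__(self, stripe_width: int = 4, flush_at: int = 8):
        self.fast_tier = []                            # persistent memory / flash
        self.hdds = [[] for _ in range(stripe_width)]  # capacity tier
        self.flush_at = flush_at

    def write(self, block: bytes) -> None:
        """Land writes on fast media first; acknowledge immediately."""
        self.fast_tier.append(block)
        if len(self.fast_tier) >= self.flush_at:
            self._flush()

    def _flush(self) -> None:
        """Consolidate buffered blocks and stripe them across the disks."""
        for i, block in enumerate(self.fast_tier):
            self.hdds[i % len(self.hdds)].append(block)
        self.fast_tier.clear()

w = TieredWriter()
for n in range(16):
    w.write(f"block-{n}".encode())
print([len(d) for d in w.hdds])   # blocks striped evenly: [4, 4, 4, 4]
```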

VAST Data

VAST Data has announced a partnership with Commvault (and others to come) to use Universal Storage as the secondary storage backend to existing data protection software.  The Universal Storage architecture offers high scalability, high throughput, high parallelism, and strong data deduplication capabilities.  This includes the “similarity” deduplication capability, which can increase data reduction savings on top of the reductions achieved by the backup software.  We recently published a podcast with VAST Data that discusses the rationale and benefits of using Universal Storage as a secondary data solution.
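
VAST has not published the internals of its similarity reduction, so the sketch below is only a toy illustration of the general idea: blocks that are similar but not identical share a reference copy and store a compressed delta, capturing savings that exact-match deduplication would miss.  The similarity key here is deliberately naive; real systems use content-defined sketches such as min-hashes over rolling windows:

```python
# Toy illustration of similarity-based reduction; NOT VAST's algorithm.
import os
import zlib

refs = {}  # similarity key -> reference block

def similarity_key(block: bytes) -> bytes:
    # Naive stand-in for a proper similarity sketch: blocks sharing a
    # prefix are treated as "similar".
    return block[:8]

def reduce_block(block: bytes) -> bytes:
    """Store a block whole (new reference) or as a delta to a similar one."""
    key = similarity_key(block)
    if key not in refs:
        refs[key] = block
        return zlib.compress(block)      # first of its kind: full copy
    ref = refs[key]
    delta = bytes(x ^ y for x, y in zip(block, ref))
    return zlib.compress(delta)          # mostly zeros: compresses away

ref_block = os.urandom(4096)             # incompressible data
near_dup = (ref_block[:100]              # near-identical: one byte flipped
            + bytes([ref_block[100] ^ 0xFF])
            + ref_block[101:])
print(len(reduce_block(ref_block)), "bytes for the reference")
print(len(reduce_block(near_dup)), "bytes for the near-duplicate")
```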

Whither PBBAs?

Data protection solutions are composed of three main components: schedules, which determine the backup requirements; metadata, which stores the state of backups; and the backed-up data itself.  Fully integrated solutions include all three core aspects, while simple deduplicating appliances generally store only data and their own internal metadata.  For smaller businesses, self-contained PBBAs are a good option where data protection can’t be moved to a SaaS/cloud model.  For enterprises with large volumes of growing and potentially dispersed data, the PBBA model is more challenging and can lead to “backup sprawl”.

All three vendors discussed here have the capability to centralise backup data, either as a direct target for software from vendors such as Commvault or HYCU, or as the backup archive layer for scale-out PBBAs.  One benefit of separating the data protection software from the backup repository is the ability to accept data from multiple platforms and sources, effectively creating a secondary storage data plane.

The Architect’s View™

This article is by no means an exhaustive analysis of the data protection market.  However, as we approach the refresh of our data protection report, we are highlighting the direction the backup market is taking.  Traditional PBBAs and deduplicating appliances are underperforming the new crop of products and vendors: Dell PowerProtect DD, for example, quotes a maximum throughput of 41TB/hour, or 94TB/hour with DD Boost (an additional chargeable software component), while Infinidat now quotes 180TB/hour for InfiniGuard.  The metrics of space, power, cooling, capacity, throughput, efficiency and cost will dictate the winners and losers in this quickly changing landscape.


Disclaimer: VAST Data, StorONE and Infinidat are all clients of Brookend Ltd and tracked vendors in our Data Storage practice.

Copyright (c) 2007-2022 – Post #e4ee – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.