The All-Flash Backup Fallacy

Chris Evans

Earlier this month, we briefly looked at the future of modern data protection solutions and highlighted cost efficiency as one requirement.  Can an all-flash secondary storage design meet that requirement, or is a hybrid system the better fit once the whole picture is considered?

Background

Back in 2013, I worked with IBM to promote the combination of ProtecTIER and FlashSystem as a high-performance data protection solution.  The premise at the time was simple: de-duplicating appliances create highly random workloads, so the only way to achieve fast restore performance was to use all-flash storage.

Fast-forward to 2022, and the storage world looks very different from the early flash days.  Modern flash systems almost exclusively use TLC or QLC NAND and offer large-capacity drives, while costs have dropped substantially.  So, the idea of an all-flash secondary storage platform makes even more sense today, right?  Perhaps not.

Secondary Storage

Let’s take a moment to look at how the storage market has actually evolved since those initial all-flash backup platforms of the early 2010s.  Flash SSDs have peaked at 30TB capacities, unchanged since Samsung announced the PM1643 four years ago.  This limit is probably due to the unit cost of top-end drives, which, based on UK street prices, is about $300/TB, or $9000 per drive for 30TB models.  At $9000 each, customers and vendors alike will want to ensure failed drives are replaced under warranty.  At the same time, seeding a system with a basic configuration of eight drives represents $72,000 in drive costs alone.  Some of the challenges associated with drive lifetime are being addressed (see this post on ZNS), but vendors have recently focused on PCIe 4.0, reliability and performance rather than on larger capacities.

In the hard drive market, 20TB drives are already available, with a street price of around $700, or roughly $35/TB.  At $300/TB against $35/TB, high-end SSDs are still approximately 8x the cost of capacity HDDs per unit of capacity.  As the chart from Intel in this post shows, any kind of device price parity isn’t coming soon (although the TCO calculation is more nuanced).  The capacity HDD market is also growing, with predictions of drives that have tens of terabytes of capacity only a few years away.  To manage performance and internal challenges, vendors have introduced new technologies like dual actuators, zoned recording/SMR and integrated NAND in the drive controller.  We’ve also seen an NVMe HDD in preview.
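The price comparison above is simple arithmetic, and a short sketch makes it easy to rerun as prices move.  The figures are the approximate 2022 street prices quoted in the text; the variable names are illustrative only.

```python
# Approximate 2022 UK street prices quoted in the text above.
SSD_PRICE_PER_TB = 300.0   # USD per TB for a 30TB SSD
SSD_CAPACITY_TB = 30
HDD_DRIVE_PRICE = 700.0    # USD for a 20TB capacity HDD
HDD_CAPACITY_TB = 20
SEED_DRIVE_COUNT = 8       # the "basic configuration" of eight drives

# Price of a single 30TB SSD.
ssd_drive_price = SSD_PRICE_PER_TB * SSD_CAPACITY_TB

# Per-TB price of the HDD, derived from the drive price.
hdd_price_per_tb = HDD_DRIVE_PRICE / HDD_CAPACITY_TB

# How many times more expensive flash is per TB.
price_ratio = SSD_PRICE_PER_TB / hdd_price_per_tb

# Drive cost of seeding an all-flash system.
seed_cost = SEED_DRIVE_COUNT * ssd_drive_price

print(f"SSD drive price:  ${ssd_drive_price:,.0f}")    # $9,000
print(f"HDD price per TB: ${hdd_price_per_tb:,.0f}")   # $35
print(f"SSD/HDD ratio:    {price_ratio:.1f}x")         # 8.6x
print(f"Seed cost:        ${seed_cost:,.0f}")          # $72,000
```

This compares raw media prices only; it says nothing about TCO factors such as power, density or data reduction.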

Tiering

In the secondary storage market, the concept of tiering continues to have value, as the ratio of price between SSD and HDD remains substantial on a $/GB basis.  At scale (by which we mean around the petabyte mark), some all-flash solutions could be cost-competitive (and we’ve talked about these in the past).  However, for many organisations, tiering will remain a logical approach.  The reason is straightforward: secondary data becomes more inactive over time.  Restores generally occur from the most recent backups because data currency is important.  As backup data ages out and isn’t accessed, tiering down to low-cost storage is a cost-efficient design.  However, the scourge of ransomware has modified that strategy somewhat.
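The age-based tiering described above can be sketched as a simple placement rule.  This is a minimal illustration, not any vendor's policy engine: the function name, the tier labels and the 30-day threshold are all assumptions chosen for the example.

```python
from datetime import date

def choose_tier(backup_date, today, hot_days=30):
    """Place a backup on flash while it is recent, HDD once it has aged.

    hot_days is a hypothetical threshold; a real policy would be tuned
    to observed restore patterns.
    """
    age_days = (today - backup_date).days
    if age_days < 0:
        raise ValueError("backup_date is in the future")
    return "flash" if age_days <= hot_days else "hdd"

today = date(2022, 6, 10)
print(choose_tier(date(2022, 6, 1), today))   # flash (9 days old)
print(choose_tier(date(2022, 1, 1), today))   # hdd (160 days old)
```

A rule based purely on age is the traditional model; it assumes old backups are rarely needed quickly.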

Ransomware

Prior to the challenges introduced by ransomware attacks, secondary data in the form of backups followed a well-understood ageing pattern.  Backup data moves from active to inactive over time, with the relative importance of a restore reducing in line with activity.  Generally speaking, historic restores are not time-dependent for production workloads, although data recovery for legal and compliance reasons is still important. 

The impact of ransomware has changed the data protection landscape.  The time between infection and activation of a ransomware attack could be as long as 300 days, which increases the importance of long-term backup data.  Furthermore, if a ransomware attack requires recovery from backup, the recovery process may need access to large amounts of backup data to do a full restore of all applications.  This requirement means that restoring from archived backups, such as those held in the public cloud or moved to tape, could lengthen recovery and put recovery time objectives and internal service level agreements at risk.
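To see why a long dwell time pushes recovery towards older backups, consider a sketch that picks the most recent backup taken before the infection date.  It assumes the infection date can be estimated as the activation date minus the dwell time; the function and parameter names are hypothetical.

```python
from datetime import date, timedelta

def last_clean_backup(backup_dates, activation_date, dwell_days):
    """Return the newest backup taken before the estimated infection date.

    Returns None when every retained backup post-dates the infection,
    which is the case when retention is shorter than the dwell time.
    """
    infection_date = activation_date - timedelta(days=dwell_days)
    clean = [d for d in backup_dates if d < infection_date]
    return max(clean) if clean else None

backups = [date(2021, 7, 1), date(2021, 8, 1), date(2021, 9, 1), date(2022, 5, 1)]

# Attack activates on 1 June 2022 after a 300-day dwell time.
print(last_clean_backup(backups, date(2022, 6, 1), 300))        # 2021-08-01

# With only recent backups retained, nothing clean is left.
print(last_clean_backup(backups[-1:], date(2022, 6, 1), 300))   # None
```

In the first case the usable backup is ten months old, exactly the kind of data an age-based policy would have moved to the slowest tier.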

Challenges

So, here are the challenges for today’s IT organisations.

  • Maintain cost efficiency across secondary data.
  • Enable greater retention of data for recovery purposes.
  • Make recovery possible from any historical backup with consistent RTO.

While these requirements could be met with an all-flash solution, most businesses will find it hard to justify placing massive amounts of inactive data onto all-flash systems.  Instead, cost-effective hybrid systems that combine flash with hard disk will be the answer.
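A rough sketch of the media cost gap between the two designs, using the $300/TB and $35/TB street prices quoted earlier.  The 1PB capacity and the 10% flash share are assumptions for illustration, and the calculation ignores data reduction, protection overhead and every other TCO factor.

```python
def media_cost(total_tb, flash_fraction, ssd_per_tb=300.0, hdd_per_tb=35.0):
    """Raw media cost in USD for a system with the given share of flash.

    Default prices are the approximate 2022 street prices from the text;
    flash_fraction is a hypothetical design parameter between 0 and 1.
    """
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    flash_tb = total_tb * flash_fraction
    hdd_tb = total_tb - flash_tb
    return flash_tb * ssd_per_tb + hdd_tb * hdd_per_tb

capacity_tb = 1000  # roughly 1PB, an assumed system size

all_flash = media_cost(capacity_tb, 1.0)
hybrid = media_cost(capacity_tb, 0.1)

print(f"All-flash media cost:  ${all_flash:,.0f}")
print(f"Hybrid (10% flash):    ${hybrid:,.0f}")
```

The hybrid figure only holds its value if the software can deliver acceptable restore performance from the HDD tier, which is the point of the requirements listed above.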

As a side note, while tape is a great low-cost medium and has the advantage of a physical air gap, tape restores probably can’t meet the service level requirements needed to achieve full ransomware restores. 

Of course, a hybrid system needs to understand the demands of data restore and must not fall back to the slow restore performance of the early-2000s backup appliances, the problem IBM was looking to solve with ProtecTIER.

This means having the capability to restore any historical backup at the same performance as a backup taken last week.

The Architect’s View™

We like solutions such as StorONE S1:Backup for a number of reasons.  First, the technology abstracts storage pools across multiple media types.  Second, systems can be expanded with a mix of flash, HDD and storage-class memory (SCM).  Third, the features offered in software are separated from hardware, so over the long timescale of backup retention, the hardware can expand and change without affecting the data being stored.

At scale, all-flash might work for you, but for most IT organisations, the hybrid model for secondary data will continue to have value.  However, the depth of that value comes from software-enabled features and how they exploit the hardware capabilities. 

For more details on the topics raised here, check out our dedicated Data Protection Microsite and reports from our Data Protection Practice.  More details on Purpose-Built Backup Appliance futures can be found in this post.    


Copyright (c) 2007-2022 – Post #f4e3 – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission. StorONE is a client of Brookend Limited