We’re currently about ten years into the transition to all-flash storage that started with the introduction of SLC media into the EMC DMX in 2008. Vendors have been keen to promote all-flash systems as the ultimate replacement for traditional storage. However, as we watch new media enter the data centre, it’s time to look again at the idea of hybrid storage systems.
The concept of tiered or hybrid storage has existed for almost as long as we’ve had shared storage platforms. This blog post gives some idea of the stages of that evolution, as vendors have refined and adjusted their storage solutions.
Ultimately, tiered storage is a cost/benefit equation. Fast media is expensive, while only a fraction of data is ever active at one time. For enterprises running traditional applications like databases or LAMP stacks, most data will be inactive. This situation is even more apparent when we look at unstructured content, except perhaps for the processing of data for ML/AI purposes.
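The cost/benefit equation can be made concrete with a simple blended-cost calculation. The prices and the active-data fraction below are assumed, illustrative figures only, not vendor pricing:

```python
# Illustrative blended-cost comparison: all-flash vs hybrid.
# All $/GB prices and the 10% active-data fraction are assumptions
# for demonstration, not real vendor figures.

def blended_cost_per_gb(tiers):
    """tiers: list of (fraction_of_capacity, dollars_per_gb) pairs."""
    assert abs(sum(f for f, _ in tiers) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(f * price for f, price in tiers)

FLASH, HDD = 0.20, 0.03  # assumed media prices in $/GB

all_flash = blended_cost_per_gb([(1.0, FLASH)])
# Hybrid: 10% active data on flash, 90% inactive data on HDD
hybrid = blended_cost_per_gb([(0.10, FLASH), (0.90, HDD)])

print(f"all-flash: ${all_flash:.3f}/GB, hybrid: ${hybrid:.3f}/GB")
```

With these assumed numbers, the hybrid layout costs roughly a quarter of the all-flash one per gigabyte, which is the whole economic argument for tiering when most data is inactive.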
The idea of all-flash storage is a great marketing story. Gone are the days of having to worry about the issues of uneven I/O response times. All of the data sits on a single, uniform tier, offering consistent performance. This concept is great – if you can afford it.
- The Four Stages of All-Flash Storage
- Enterprise Computing – Death of Tiering?
- #54 – Are we at All-flash and HDD Array Price Parity?
When the choice for the enterprise was between all-flash and a hybrid solution based on SLC/MLC and hard drives, the chasm in I/O latency between HDD and SSD was so great that hybrid platforms traded performance for cost. Tiering within all-flash arrays didn’t make sense either, as the move from SLC to MLC delivered only marginal cost savings. We’ll come back to that aspect in a moment.
Legacy hybrid solutions have many challenges.
I/O consistency. We’ve discussed this already, but it’s worth re-emphasising the point. Enterprise-grade 15K hard drives offered perhaps 200 fully random IOPS and 250MB/s of sequential throughput, compared to flash drives delivering hundreds of thousands of IOPS and maybe 500-600MB/s of throughput under any workload. If your data is in the flash tier, you’re golden. Find yourself in the HDD tier, and things aren’t so great. It’s even worse when your data spans both tiers, because performance becomes unpredictable.
Retro-balancing. Almost all tiering or hybrid solutions rebalance data based on historical I/O profile information. This means storage systems are always playing catch-up with the application and never quite deliver the full benefit of the underlying media. When data is moved between tiers simply to rebalance workload, I/O capacity that could serve the host is potentially lost, and the media’s capabilities are again not fully exploited.
Ratio management. Getting the percentage of each tier right is a big problem. Most legacy storage appliances (and oddly, some new ones) build tiers or pools on a fixed RAID size. Expanding pools can be expensive, as many platforms don’t offer the ability to add individual drives.
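The I/O consistency problem above is easy to see with a little arithmetic on mean latency. The latency figures here are rough assumed values (around 0.2ms for a SAS SSD, around 8ms of combined seek and rotation for a 15K HDD), used only to illustrate the effect:

```python
# Sketch of why data spanning tiers gives unpredictable performance.
# Latency figures are assumed, illustrative values: ~0.2ms for SSD,
# ~8ms average seek + rotational delay for a 15K HDD.

def effective_latency_ms(flash_hit_rate, flash_ms=0.2, hdd_ms=8.0):
    """Mean I/O latency when a fraction of requests is served from flash."""
    return flash_hit_rate * flash_ms + (1 - flash_hit_rate) * hdd_ms

for hit in (1.0, 0.9, 0.5):
    print(f"{hit:.0%} flash hits -> {effective_latency_ms(hit):.2f} ms mean latency")
```

Even a 10% miss rate to HDD multiplies mean latency several times over, and the tail latency of individual HDD-served I/Os is worse still, which is exactly the inconsistency hybrid HDD/flash platforms suffered from.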
The initial wave of all-flash systems was based on expensive SLC storage. This quickly evolved to cheaper MLC, and today we see vendors introducing TLC and QLC flash into their products. The interesting aspect of the transition from SLC to QLC media is the reduction in unit cost ($/GB), but we also see a parallel decrease in endurance and an increase in I/O latency.
As NAND flash technology matures, we have a set of offerings analogous to the HDD market, where performance, capacity and cost are all factors. The storage hierarchy has expanded to meet the needs of multiple application profiles, but at the expense of endurance.
Endurance is the Achilles’ heel of NAND flash. Writing data to NAND media wears it out, and high-capacity QLC drives have much lower endurance than the original SLC products. Fortunately, we’ve seen some amazing work by NAND and flash drive vendors that use error correction and other algorithms to extend the endurance of SSDs.
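Drive endurance is usually quoted as Terabytes Written (TBW) or drive writes per day (DWPD) over the warranty period, and the two are related by simple arithmetic. The capacities and DWPD figures below are assumed, illustrative values, not specifications for any particular drive:

```python
# Rough endurance comparison via Terabytes Written (TBW):
#   TBW = capacity_TB * DWPD * 365 * warranty_years
# Capacity and DWPD values below are assumed, illustrative figures.

def tbw(capacity_tb, dwpd, warranty_years=5):
    """Total terabytes writable over the warranty period."""
    return capacity_tb * dwpd * 365 * warranty_years

# Assumed: a small write-intensive drive vs a large read-optimised QLC drive
write_intensive = tbw(capacity_tb=1.0, dwpd=10)    # hypothetical 10 DWPD drive
qlc_capacity    = tbw(capacity_tb=15.36, dwpd=0.3) # hypothetical 0.3 DWPD drive

print(f"write-intensive: {write_intensive:.0f} TBW, QLC-class: {qlc_capacity:.0f} TBW")
```

Note how the much larger QLC drive still offers less total write endurance than the small write-intensive one; this is why placing write-heavy workloads on the right tier matters so much in a flash hybrid.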
- Using SSDs for Write-Intensive Workloads
- The Expanding Storage Hierarchy
- What is Intel Optane?
- Persistent Memory in the Data Centre
The latest media option on the market is, of course, Intel Optane, or 3D XPoint. Optane has effectively no endurance worries compared to NAND flash. The technology is also fast, sitting between flash and DRAM in the storage hierarchy.
If we’re to believe the rumours, PLC or five bits per cell is just on the horizon and will have an even lower endurance level, albeit with a modest gain in capacity.
Taking all of these factors into consideration, we’re going to see a renaissance in the hybrid storage platform. The first will be the Solid-State Hybrid storage platform, which uses Intel Optane and multiple tiers of NAND flash. We’ve highlighted these solutions before, in the architectures implemented by StorONE and VAST Data and no doubt other vendors will follow suit.
Tiering makes sense where there’s enough differentiation in cost between multiple media types to make a hybrid solution financially viable. Solid-State Hybrids remove or mitigate most of the issues of traditional hybrid solutions by using fast media across all tiers. The big winners here will be the architectures that can exploit the throughput of cheap flash without being compromised by its limited endurance.
Another option we’re likely to see (and to an extent have seen already) is the multi-media hybrid that uses any and all of the available media types. As unstructured data grows, low-cost hard drives will still have a part to play. On-premises object stores offer a better cost profile than the public cloud, making it financially viable to continue to store large volumes of data on-premises. Of course, there are many scenarios where the cost of flash media isn’t justified, such as replication of data for test/development or partial processing of content such as in Media and Entertainment. So, we can expect to see the hard drive in place for quite a few years to come.
The Architect’s View
There’s an inevitable ebb and flow in the design of storage systems. All-flash was a great solution when we didn’t have multiple tiers of flash products available. Looking back at a post from 2010, we see NetApp’s CEO predicting the death of tiering entirely. Now the picture looks very different.
Centralised storage offers the ability to optimise costs by using many media types in a shared platform, in a way that can’t be achieved with distributed solutions. However, the platform architecture has to support new media efficiently. At scale, this delivers a design that locally attached storage can’t beat, keeping both shared and hybrid arrays relevant in today’s market.
The future is still definitely hybrid because the cost of storage media is always a consideration. The winning solutions will exploit that cost profile in the most efficient way.
Copyright (c) 2007-2020 – Post #027a – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.