Dude, Here’s Your 100TB Flash Drive!

Chris Evans | All-Flash Storage, Storage Media

It’s been only a month since I wrote about Samsung’s 30TB solid state disk and already the bar has been set higher.  This week Nimbus Data announced a 100TB 3.5″ SSD (the DC100) that blows past the proof of concept 60TB drive demonstrated by Seagate only a few years ago.  How real is 100TB in a single drive, what could it be used for and who will pay for it?

Nimbus Data

Nimbus has been in business for a number of years, focusing on all-flash storage arrays.  My first introduction to the company was at Storage Field Day 1 (part of the Tech Field Day event) in 2012.  Check out the videos and you’ll see my balding pate in front of the presenter, Tom Isakovich.  I’ll be polite and say the company has had a chequered history (a term I’ve used before).  At that presentation in 2012, Nimbus was showing off flash drives of its own design and build.

Build or Buy

It’s an eternal question: should you build systems or buy them?  The same question applies to solid-state storage – is it better to build systems from commodity SSDs, or to develop custom flash modules that bypass some of the deliberate constraints built into the SSD form factor?  I wrote about the problem a couple of years ago, after seeing presentations from Pure Storage and Violin Memory.

Pure told us that their strategy had been to use commodity SSDs in order to get products to market quickly, after which they moved to the custom model.  This makes sense in their design of FlashBlade where everything is deployed on a custom hardware module.  Violin has been using custom VIMMs since the inception of the product, but has perhaps had issues with the logical layout of the platform as VIMM capacities increased.

To my knowledge, different sized VIMMs can’t be mixed in the same Violin chassis.  This certainly becomes an issue as new technologies like TLC and QLC come to market – an array with unevenly distributed capacity and performance would either need significantly more management or force compromises.

Nimbus SSD

So, Nimbus chose to build SSDs that fit the standard drive form factors.  Back in 2012, we were shown 100GB, 200GB and 400GB models.  Six years later, capacity has multiplied somewhat, to 50TB and 100TB drives.  The ExaDrive isn’t a new product, though.  I wrote about the drives last August, when the 25TB and 50TB models were released through Viking and SMART Modular.  At the time, the performance figures were 472MB/s read and 325MB/s write, with 58K and 15K random read/write IOPS.  The latest drives improve on this, with 500MB/s throughput (read and write) and 100,000 IOPS (read and write).  Again, no latency figures are quoted.  Endurance is unlimited (more on that in a moment) and reliability is an industry-standard 2.5 million hours MTBF.

MAID for SSD

Assuming 512GB chips are being used in the ExaDrive, each unit must contain 200+ chips to reach 100TB of capacity, plus over-provisioning.  The most obvious question here is how this can be achieved within standard power and cooling specifications.  The drives are rated at just 0.1W per TB.  The answer seems to lie in a patent-pending multi-controller architecture, which I’m guessing powers the NAND chips up when they’re needed and powers them down when they’re not.  This is a SATA drive, so at 500MB/s, writing the entire drive would take some 55 hours.  With some fancy caching, it would be possible to power down entire sections of the NAND, although I don’t know what impact that has on the endurance of the media.  Perhaps we should call the technology Massive Array of Inactive Dies (MAID), as an homage to tech that has gone before.
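For anyone who likes napkin maths, here’s a quick Python sketch of the full-drive write time and the power budget, using only the numbers quoted above (100TB capacity, 500MB/s throughput, 0.1W per TB):

```python
# Back-of-the-envelope figures for the ExaDrive DC100, using the numbers
# quoted above: 100TB capacity, 500MB/s SATA-limited throughput, 0.1W/TB.
CAPACITY_TB = 100
THROUGHPUT_MB_S = 500
POWER_W_PER_TB = 0.1

full_write_seconds = (CAPACITY_TB * 1_000_000) / THROUGHPUT_MB_S  # TB -> MB
full_write_hours = full_write_seconds / 3600
drive_power_w = CAPACITY_TB * POWER_W_PER_TB

print(f"Full-drive write time: {full_write_hours:.1f} hours")  # ~55.6 hours
print(f"Power budget: {drive_power_w:.0f} W")                  # ~10 W
```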

Unlimited Endurance

Let’s touch back on the endurance claim.  Typically, vendors quote either DWPD (Drive Writes Per Day), the number of times the entire drive can be written each day, or TBW (Terabytes Written), an absolute amount of write capacity.

The ExaDrive is being sold with no endurance restrictions over a 5-year warranty period.  Now, based on the throughput capability, it would take around 55 hours to write a 100TB drive in its entirety, so implicitly the ExaDrive DC100 has a de facto rating of around 0.43 DWPD.  It would be interesting to know what the TBW figure is, as this would indicate how much longer the drive could survive once the warranty period is reached.
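As a rough sketch, here’s what the interface speed implies over the warranty period.  These are upper bounds derived from the figures above, not vendor numbers:

```python
# Implied endurance if the DC100 were written flat-out for the whole
# 5-year warranty. Interface-limited upper bounds, not vendor specs.
CAPACITY_TB = 100
FULL_WRITE_HOURS = 55.6        # from the calculation above
WARRANTY_YEARS = 5

dwpd = 24 / FULL_WRITE_HOURS                        # drive writes per day
tbw = dwpd * CAPACITY_TB * 365 * WARRANTY_YEARS     # terabytes written

print(f"Implied DWPD: {dwpd:.2f}")                  # ~0.43
print(f"Implied TBW over warranty: {tbw:,.0f} TB (~{tbw / 1000:.0f} PB)")
```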

Use Cases & TCO

OK, so we have a 100TB drive which, looking at the figures, is biased towards large-scale flash capacity rather than ultra-high performance.  We’ve seen high density before: SanDisk’s InfiniFlash offered 512TB in 3U three years ago.  More recently, Intel announced the ruler form factor to deliver (eventually) 1PB of capacity in 1U.  I haven’t seen any environmental specifications for the ruler format, so it remains to be seen how it stacks up against the DC100.  However, Nimbus are suggesting that a single 45U rack could hold 100PB using the DC100, which is over 2PB per rack unit.  With only 990 drives per rack, the administrative overhead of using the DC100 could be much lower than with comparable solutions.  As usual, TCO starts to play a big part here.
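Quick napkin maths on that density claim, using the drive count and rack size quoted above:

```python
# Rack-density sketch based on the Nimbus claim above: 990 DC100 drives
# in a 45U rack. All inputs are figures quoted in the article.
DRIVES_PER_RACK = 990
DRIVE_CAPACITY_TB = 100
RACK_UNITS = 45

rack_capacity_pb = DRIVES_PER_RACK * DRIVE_CAPACITY_TB / 1000
density_pb_per_u = rack_capacity_pb / RACK_UNITS

print(f"Rack capacity: {rack_capacity_pb:.0f} PB")          # ~99 PB
print(f"Density: {density_pb_per_u:.1f} PB per rack unit")  # ~2.2 PB/U
```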

The Architect’s View®

Packaging 100TB into a single 3.5″ drive isn’t that remarkable.  Keeping the power consumption low, while still implementing standard SSD functions like garbage collection and wear levelling, is however very interesting.  The value here is clearly the multi-processor design and the patent-pending technology that distributes I/O across multiple controllers and dies.  Today DC100 is still based on MLC technology (from SK Hynix and another undisclosed partner), so there is room for significant capacity improvement with TLC, 3D-NAND and eventually QLC.  Previously Nimbus stated that a 500TB drive wasn’t far away.  Six months ago that seemed crazy, not so much now.

This product isn’t an enterprise drive replacement and I don’t think we’ll see it in standard array deployments.  Even though the NAND makes up most of the BOM, implementing multiple controllers adds extra cost, so the design only makes sense for very large capacity drives.  This hardware will see adoption with hyperscalers and the likes of Facebook, Apple and Google, where the specific demands around TCO are so important.

Remember in Disks for Data Centers how Google highlighted desirable features in HDDs that would improve TCO and eliminate some performance issues (like tail latency)?  DC100 could well be the product to solve Google’s problem – if the price is right.  I’ve been told that pricing is likely to be competitive with existing flash, perhaps in the 50-60¢/GB range.  So, anyone want to place a bet on when the 500TB drive will arrive?
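For the betting types, here’s what that per-gigabyte range would mean per drive and per rack.  The 50-60¢/GB figure is the only input from the article; the rest is straight multiplication:

```python
# What 50-60 cents per GB would mean for a 100TB drive and a 990-drive rack.
DRIVE_CAPACITY_GB = 100_000
DRIVES_PER_RACK = 990

for price_per_gb in (0.50, 0.60):
    per_drive = DRIVE_CAPACITY_GB * price_per_gb
    per_rack = per_drive * DRIVES_PER_RACK
    print(f"${price_per_gb:.2f}/GB -> ${per_drive:,.0f} per drive, "
          f"${per_rack / 1e6:.1f}M per rack")
# ~$50-60K per drive, ~$50-59M for a fully populated rack
```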

Copyright (c) 2007-2019 – Post #B05A – Brookend Ltd, first published on http://www.architecting.it/blog, do not reproduce without permission.