2019 is the year of NVMe

Chris Evans

We’re on the eve of Flash Memory Summit 2019, an event dedicated to flash storage products and applications.  At the core of the discussion is NVMe, or Non-Volatile Memory Express.  Without sounding too clichéd, 2019 looks to be the year of NVMe, as the technology moves to become the leading storage protocol for the enterprise.

What is NVMe?

NVMe, or Non-Volatile Memory Express, is a protocol that allows servers to talk to storage devices such as solid-state disks (SSDs).  Within a server, NVMe SSDs connect to the PCIe bus using a range of different form factors.  Across a storage network, NVMe-enabled devices communicate with a server using NVMe over Fabrics, or NVMe-oF.
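As a quick illustration of the first point, the short sketch below lists the NVMe controllers a Linux host can see and reports whether each one is attached over PCIe or a fabric transport.  It is only a sketch: it assumes a Linux system with the standard kernel NVMe driver and the usual /sys/class/nvme sysfs layout.

```python
# List NVMe controllers visible to a Linux host and show how each is attached.
# Illustrative sketch only - assumes the standard /sys/class/nvme sysfs layout.
from pathlib import Path

def list_nvme_controllers():
    base = Path("/sys/class/nvme")
    if not base.exists():
        print("No NVMe controllers found (or not a Linux host)")
        return
    for ctrl in sorted(base.glob("nvme*")):
        model = (ctrl / "model").read_text().strip()
        transport = (ctrl / "transport").read_text().strip()  # "pcie", "tcp", "rdma" or "fc"
        address = (ctrl / "address").read_text().strip()
        print(f"{ctrl.name}: {model} attached via {transport} ({address})")

if __name__ == "__main__":
    list_nvme_controllers()
```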

We can compare NVMe to the older SAS, SCSI and SATA storage protocols.  Within a server, these protocols require a controller, implemented either directly on the motherboard or through an add-in card.  Across the network, Fibre Channel and iSCSI allow a server to communicate with shared storage.  This aligns with the capabilities of NVMe-oF.

The standards for NVMe are developed by NVM Express Inc, formerly the NVM Express Work Group.  NVMe specifications cover NVM Express (currently at release 1.4), NVMe over Fabrics (release 1.0a) and the NVMe Management Interface (release 1.1).  You can listen to founder Amber Huffman talking about the establishment of NVM Express Inc in this Storage Unpacked podcast.

Why Do We Need NVMe?

The need for NVMe has been driven by the performance of new solid-state media.  Storage has historically been (and remains) the slowest component in computer architecture.  Hard drive response times are measured in milliseconds, whereas solid-state disks have latencies of around 30-80 microseconds (roughly two orders of magnitude faster).

Because hard drives were so slow, the efficiency of the software stack (SAS/SATA) wasn’t important.  Flash device latencies are so much lower than those of HDDs that a significant portion of an I/O request on a SAS/SATA drive is spent in the protocol stack.  To get the best out of NAND flash and future solid-state technologies, a more efficient storage protocol was needed.  Enter NVMe.
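To put some rough numbers on that claim (the figures below are illustrative assumptions, not measurements), consider how much of a single I/O is consumed by a fixed software and protocol overhead for a hard drive versus a flash drive:

```python
# Illustrative arithmetic only - assumed latencies, not measured values.
STACK_OVERHEAD_US = 25    # assumed software/protocol stack time per I/O (microseconds)
HDD_MEDIA_US = 5_000      # assumed hard drive seek plus rotation (~5 ms)
SSD_MEDIA_US = 80         # assumed NAND flash read latency (microseconds)

for name, media_us in (("HDD", HDD_MEDIA_US), ("SSD", SSD_MEDIA_US)):
    total_us = media_us + STACK_OVERHEAD_US
    share = 100 * STACK_OVERHEAD_US / total_us
    print(f"{name}: stack overhead is {share:.1f}% of a {total_us} us I/O")

# HDD: roughly 0.5% of the I/O; SSD: roughly 24% - hence the need for a leaner protocol.
```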

Benefits

NVMe provides much more than a faster I/O stack.  By connecting directly to the PCIe bus, NVMe eliminates the overhead of a separate storage controller.  The SAS/SATA protocols allow only a single queue for I/O (with spinning media, multiple queues didn’t make much sense).  NVMe provides for up to 65,535 queues, each capable of holding up to 65,535 requests.

It’s unlikely any single system will use the full 64K capability.  However, with new features such as Open Channel and Zoned Namespaces being added to the NVMe specification, NVMe SSDs will be able to handle far greater numbers of parallel, and typically disparate, workloads.  These features are essential to fully optimise large-capacity drives of 32TB and higher.
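For a sense of how many queues a host actually uses today, the Linux block layer exposes one directory per allocated hardware queue under /sys/block/<device>/mq.  The sketch below simply counts them per NVMe namespace; it assumes a Linux host with the in-kernel NVMe (blk-mq) driver.

```python
# Count the hardware queues the Linux NVMe driver has allocated per namespace.
# Illustrative only - relies on the blk-mq sysfs layout (/sys/block/<dev>/mq).
from pathlib import Path

for dev in sorted(Path("/sys/block").glob("nvme*")):
    mq_dir = dev / "mq"
    if mq_dir.is_dir():
        queues = [q for q in mq_dir.iterdir() if q.is_dir()]
        print(f"{dev.name}: {len(queues)} hardware queues")

# Typically the answer is one queue per CPU core - far fewer than the
# 65,535 queues the specification allows for.
```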

Today we see a mix of NAND flash drives and some Storage Class Memory devices using Intel Optane (3D XPoint).  Vendors are starting to bring “fast flash” products to market, as well as other SCM products built on technologies such as ReRAM (a topic for another post).

Storage Networking

So far, we’ve talked about the benefits that apply to NVMe drives installed into servers or storage appliances.  NVMe technology is also being applied to storage networking, as an evolution of existing storage protocols like Fibre Channel, iSCSI and FCoE. 

NVMe over Fabrics (NVMe-oF) uses multiple transport layers to move NVMe requests between a host and a storage target.  These include RDMA over Ethernet (RoCE), Fibre Channel and InfiniBand.  The NVMe-oF specification was recently extended to include TCP/IP as a transport layer, allowing generic Ethernet NICs to carry NVMe requests.
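To illustrate what the TCP transport means in practice, connecting a Linux host to an NVMe/TCP target is typically a two-step discover-then-connect operation with the nvme-cli utility.  The sketch below simply wraps those commands from Python; the address, port and NQN are placeholder values, and it assumes nvme-cli and the nvme-tcp kernel module are installed.

```python
# Discover and connect to an NVMe/TCP target using the nvme-cli utility.
# Sketch only: the target address, port and NQN below are placeholders.
import subprocess

TARGET_ADDR = "192.0.2.10"                      # placeholder target IP (TEST-NET range)
TARGET_PORT = "4420"                            # conventional NVMe-oF port
TARGET_NQN = "nqn.2019-08.example.com:array1"   # hypothetical subsystem NQN

# Ask the discovery controller which subsystems it offers.
subprocess.run(
    ["nvme", "discover", "-t", "tcp", "-a", TARGET_ADDR, "-s", TARGET_PORT],
    check=True,
)

# Connect to the subsystem; a new /dev/nvmeX controller should then appear.
subprocess.run(
    ["nvme", "connect", "-t", "tcp", "-a", TARGET_ADDR, "-s", TARGET_PORT, "-n", TARGET_NQN],
    check=True,
)
```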

Compared to Fibre Channel and iSCSI, NVMe-oF allows for a more direct communication channel to an NVMe device.  This has created some interesting new shared storage designs that operate more efficiently than the traditional protocols (more on this later).  This podcast, recorded with Mark Jones from FCIA, provides some more insight into Fibre Channel and NVMe.

Standardisation

The performance benefits of NVMe are clear.  Adopting NVMe-oF creates an environment where the underlying transport can change while the point-to-point protocol remains the same.  For many years we’ve seen attempts to dislodge Fibre Channel as a networking protocol.  iSCSI brought us cheap storage networking that was great for small and medium businesses but lacked the security and robustness the enterprise demands.  Fibre Channel over Ethernet (FCoE) tried to standardise the data centre on Ethernet but, for many reasons, failed to take hold.

Western Digital NVMe drives (AIC & U.2)

NVMe may well be the uniting technology that finally standardises storage networking, especially in public and private clouds that implement disaggregated and composable architectures.  This is because it provides a standard protocol, irrespective of the transport mechanism.  IT organisations can transition to NVMe using, for example, their existing Fibre Channel technology, then choose whether to remain on Fibre Channel or migrate to Ethernet at the next hardware refresh.

Adoption

NVMe is being adopted across the entire data centre, as well as in the consumer market.  Laptops have been migrating to SSDs for many years, and now we’re seeing the transition to NVMe-connected devices.  SSDs in the M.2 format (gum stick) retail for as little as $0.20/GB, or even less.

The default for server installations is rapidly moving to NVMe SSDs as a boot device.  These can be M.2 or U.2 devices (U.2 drives look like traditional SSDs and are hot-swappable).  Manufacturers are already moving towards the U.3 format, which will provide backwards compatibility for SAS and SATA drives in the same chassis.  You can learn more about new form factors in this podcast with Jonathan Hinkle.

Systems

At the systems level, vendors are implementing a dual-path strategy.  Within storage systems, NVMe is gradually being adopted as the internal protocol for storage.  Scalability of these systems is typically limited to around 24 drives per server, due to the capabilities of current PCIe switches and the available PCIe bandwidth.
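A rough lane count (using assumed, typical figures rather than any specific vendor’s design) shows why numbers around 24 drives appear:

```python
# Back-of-the-envelope PCIe lane arithmetic - assumed figures, not a vendor specification.
LANES_PER_DRIVE = 4        # a typical U.2 NVMe SSD uses a PCIe x4 connection
DRIVE_COUNT = 24
SERVER_LANES = 128         # assumed lanes exposed by a dual-socket server

lanes_needed = LANES_PER_DRIVE * DRIVE_COUNT
print(f"{DRIVE_COUNT} drives need {lanes_needed} of ~{SERVER_LANES} available lanes")

# 96 of ~128 lanes - and the remainder still has to serve NICs and other peripherals,
# which is why PCIe switches (and their bandwidth limits) enter the design.
```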

Front-end connectivity is moving to a mix of high-performance Fibre Channel and NVMe over Fabrics.  Fibre Channel remains much more mature than NVMe-oF; however, NVMe-oF offers the potential for lower latency than Fibre Channel and for simplified architectures.

Systems Development

The transition to using NVMe internally isn’t a trivial task for storage systems vendors.  We last saw a similar evolution in architectures with the adoption of NAND flash to create hybrid and all-flash systems.  SSDs couldn’t simply be slotted into existing architectures, because the internal software stacks of those storage platforms weren’t designed for, or efficient enough to manage, the new media.  Vendors had to rewrite their storage operating systems to be much more efficient in the I/O path and extract the best value from SSDs.

NVMe presents the same challenge, but also an opportunity.  Eliminating HDDs from the storage system architecture allows vendors to deliver more efficient code in the I/O path.  Moving from SAS/SATA to NVMe also brings efficiencies, as device connectivity and recovery can be vastly simplified.

Adoption Strategies

How should you be looking to adopt NVMe?  At a server level, moving to NVMe improves boot times for bare-metal servers and allows NVMe to be used within HCI solutions, either fully for active data or as a cache.

At a systems level, vendors are starting to introduce products that are fully NVMe-capable or use NVMe tactically as a caching layer for active data or metadata.  The decision to move to NVMe-capable systems becomes a cost/benefit analysis: will the increase in performance justify any additional cost?

NVMe over Fabrics

The move to adopt NVMe over Fabrics perhaps needs a little more consideration.  NVMe-oF isn’t widely supported (yet) and there are still some gaps in the protocol that need addressing.  Like all storage networking solutions, the adoption of new technology invariably requires new hardware.  Some Gen5 (16Gb) and Gen6 (32Gb) Fibre Channel hardware can use NVMe-oF today, but many businesses may not yet be running this generation of equipment.

Moving to a new protocol naturally requires new hardware, but it also demands new skills and can change existing processes.  Until widespread NVMe-oF support is available, Fibre Channel remains the best solution for storage networking.  This is, in part, because FC-NVMe can be supported on the same infrastructure as, and alongside, existing SCSI-based Fibre Channel traffic.

However, it makes sense to consider a move to NVMe-oF for any project where the lower latency and higher performance justify the investment.  This offers the chance to gain experience with NVMe-oF and learn how it will integrate into existing processes and procedures (including automated deployments).

The Architect’s View

NVMe is here to stay and is the undoubted successor to SAS, and eventually SATA, in the data centre.  The decline of SAS/SATA in consumer devices will likely occur even more quickly than in the enterprise.

With the ability to be network-agnostic, perhaps the biggest benefit of moving to NVMe will be to drive the adoption of composable infrastructure.  Over the past 20 years we’ve benefited massively from storage networking; however, traditional Fibre Channel now represents a performance overhead and still needs manual configuration (bar some exceptions).

NVMe and NVMe-oF will allow the adoption of more flexible and dynamic infrastructure deployments that take the human out of the configuration equation.  This is a topic we’ll cover in more detail in another post.

In the meantime, don’t forget to download NVMe in the Data Centre 2.0, which is still free (registration required) and goes into more detail on the topics covered here.  This report will be revised and released as version 3.0 (which will then be paid premium content) at the end of August, after updates from Flash Memory Summit 2019.

Also check out our NVMe Microsite, with links to content covering everything about NVMe.


Post #dd24. Copyright (c) 2019 Brookend Ltd. No reproduction in whole or part without permission.