Persistent Memory in the Data Centre

Chris Evans

The traditional model of computing in use for the past 70 years is based on an architecture developed by John von Neumann in the 1940s.  Computers execute instructions and operate on data in memory, while programs and inactive data are stored on a peripheral device or external storage.  New storage devices have blurred this boundary, offering benefits for applications and storage array vendors alike.

External Storage

Peripheral devices such as hard drives and SSDs initially connected to a server or computer through a storage controller.  The controller itself connects directly to the CPU, today generally using the PCIe bus.  The job of the storage controller is to convert I/O requests into reading and writing blocks of data to/from physical media.  As hard drives are relatively slow compared to main memory, the storage controller also acts as a caching device and implements data protection features such as hardware RAID. 

External media attaches to the storage controller through traditional storage protocols such as SCSI, SAS (Serial Attached SCSI), ATA and SATA (Serial ATA).  The use of a storage bus provides for expansion capabilities and advanced functionality like dual pathing and hot-swap.

NVMe

Today’s modern media operates at higher throughput and lower latency than ever before.  Hard drive performance is measured in milliseconds, whereas NAND flash storage operates around the 100 microseconds mark.  Intel Optane (which we will discuss in a moment) can reach as low as 10 microseconds. 

At this level of performance, inefficiencies in the storage I/O protocols start to appear.  The storage industry has addressed these issues with the introduction of NVMe.  NVMe devices attach directly to the PCIe bus, removing the need for a separate storage controller.  The NVMe protocol is optimised for high-performance devices, introducing much greater parallelism and significantly lower latency than SAS or SATA.  You can read about the improvements in performance with NVMe in the following related blog posts.
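To make the parallelism point more concrete, the sketch below (in C, using the Linux liburing library) queues a batch of reads and submits them with a single call.  This is a userspace illustration of deep, asynchronous queueing rather than the NVMe protocol itself; the device path, block size and queue depth are assumptions chosen purely for illustration.

```c
/* Minimal sketch: batch-submitting reads with io_uring, a pattern that
 * maps well onto NVMe's multiple deep submission/completion queues.
 * Assumes liburing is installed and /dev/nvme0n1 exists and is readable;
 * illustrative only. */
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define QUEUE_DEPTH 32
#define BLOCK_SIZE  4096

int main(void)
{
    struct io_uring ring;
    int fd = open("/dev/nvme0n1", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    io_uring_queue_init(QUEUE_DEPTH, &ring, 0);

    /* Queue QUEUE_DEPTH reads before making a single submit call -
     * far deeper than the effective queueing of legacy SATA. */
    char *buffers[QUEUE_DEPTH];
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        buffers[i] = malloc(BLOCK_SIZE);
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buffers[i], BLOCK_SIZE,
                           (off_t)i * BLOCK_SIZE);
    }
    io_uring_submit(&ring);

    /* Reap the completions as they arrive. */
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        io_uring_cqe_seen(&ring, cqe);
    }

    for (int i = 0; i < QUEUE_DEPTH; i++) free(buffers[i]);
    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}
```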

New Media

In parallel to the development of NVMe, we’ve seen new forms of persistent storage media come to market.  3D-XPoint, jointly developed by Intel and Micron and sold by Intel under the brand name Optane, is a form of persistent memory that stores state using changes in electrical resistance.  Neither Intel nor Micron has provided any detail on the specifics of 3D-XPoint.  However, the technology is thought to be a form of Resistive RAM.

Persistent Memory

Optane differs from other forms of solid-state storage (such as NAND flash) in the way data is accessed and in improvements to specific physical characteristics.  NAND storage reads and writes data in blocks similar to hard drives, typically around 4KB.  Individual I/O operations occur at the block level.  In contrast, Optane can read and write at the byte level, similar to the way load and store operations occur in memory.    

The memory-like characteristic of Optane enables two implementation modes: either as a traditional NVMe SSD or as a persistent memory DIMM.  The DIMM use case requires a specific Intel Xeon chipset and a modified BIOS, so it is not a simple rip-and-replace upgrade.  More details are available in the following blog posts.
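As an illustration of what byte-level persistence looks like to an application, here is a minimal sketch of App Direct-style access through a DAX-mounted filesystem.  The mount point /mnt/pmem and the file name are assumptions, and MAP_SYNC requires a DAX-capable filesystem and a recent Linux kernel; this is a sketch, not a recipe.

```c
/* Minimal sketch: byte-addressable persistence via a DAX mapping.
 * Ordinary CPU loads and stores touch the media directly - no block I/O,
 * no page cache.  Paths and sizes are assumptions; illustrative only. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/pmem/counter", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

    /* MAP_SYNC asks for a synchronous DAX mapping: once flushed, stores
     * are durable without any write() or fsync() of block data. */
    void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned long *counter = addr;
    *counter += 1;                          /* ordinary, byte-granular store */
    msync(addr, sizeof(*counter), MS_SYNC); /* flush the update to media     */
    printf("persistent counter: %lu\n", *counter);

    munmap(addr, 4096);
    close(fd);
    return 0;
}
```

On real persistent memory hardware, libraries such as PMDK’s libpmem typically replace the msync() call with user-space cache-line flush instructions, avoiding the kernel round trip altogether.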

MRAM

Another parallel technology is MRAM or Magnetoresistive RAM.  MRAM devices use magnetic polarity to store state, in contrast to Optane with electrical resistance and NAND flash with an electrical charge.  At present, MRAM devices haven’t encroached too much into the enterprise data centre, with limited deployments in other adjacent products.  You can listen to more details on MRAM in this recent Storage Unpacked podcast. 

Characteristics

Going back to Optane for a moment, we need to consider the specific characteristics of 3D-XPoint media as opportunities for deployment in the data centre develop. 

  • Endurance – all media wears out over time and eventually becomes unreliable.  The ability to withstand wear is known as endurance and is typically governed by write I/O.
  • Performance – the triumvirate of latency, throughput and bandwidth.  Latency represents individual transaction time (how long it takes to execute a single I/O operation), while throughput and bandwidth measure raw “horsepower”.
  • Capacity – now measured in gigabytes to terabytes.
  • Cost – both $/GB and $/IOPS offer comparison metrics for different media types (the sketch after this list shows a simple worked comparison).
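To show how these characteristics combine into comparison metrics, here is a small back-of-envelope sketch.  Every price, capacity and endurance rating below is an invented assumption for illustration, not a vendor figure.

```c
/* Back-of-envelope media comparison: $/GB and total bytes written (TBW).
 * Every number below is an illustrative assumption, not a quote. */
#include <stdio.h>

struct media {
    const char *name;
    double capacity_gb;
    double price_usd;
    double dwpd;            /* rated drive writes per day */
    double warranty_years;
};

int main(void)
{
    struct media devices[] = {
        { "Optane SSD (example)",    750.0, 1200.0, 60.0, 5.0 },
        { "TLC NAND SSD (example)", 3840.0,  600.0,  1.0, 5.0 },
        { "QLC NAND SSD (example)", 7680.0,  800.0,  0.3, 5.0 },
    };

    for (size_t i = 0; i < sizeof(devices) / sizeof(devices[0]); i++) {
        struct media *d = &devices[i];
        double cost_per_gb = d->price_usd / d->capacity_gb;
        /* Endurance over the warranty: DWPD x capacity x days, in TB. */
        double tbw = d->dwpd * d->capacity_gb * d->warranty_years * 365.0
                     / 1000.0;
        printf("%-26s $%.3f/GB  ~%.0f TBW\n", d->name, cost_per_gb, tbw);
    }
    return 0;
}
```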

Historically, we’ve used the “storage hierarchy” as a way to measure the relative position of media, from DRAM downwards.  DRAM, for example, is expensive but has high endurance and very high performance.  Optane sits below this with high endurance and high performance, but relatively high cost. 

The Storage Media Hierarchy

After that, we see a multitude of NAND flash technologies from SLC to QLC.  Each of these offers increasingly lower endurance with lower cost and lower performance, but higher capacities.  Economics is driving the Flash industry towards TLC and QLC as the standard NAND product for the enterprise.  As vendors have learned more about I/O profiles and access patterns for enterprise data, these solutions have become more acceptable but still need careful management.

Exploiting the Hierarchy

With so much media choice available, how do vendors design products to optimise for cost and performance?  Traditional techniques include using faster media for caching, either at the host level or within storage appliances.  An alternative is to tier data across multiple media types, placing data on the most appropriate media for the cost/performance profile.  We will cover the technical merits of caching vs tiering in a separate post; however, for now we can say:

  • Caching is a compromise as it assumes a “working set” of active data that is a small subset of the overall content.  The cache can be overwhelmed if the working set is exceeded, at which point more effort is expended in managing cache occupancy – or performance drops to the level of the “backing store” (the sketch after this list illustrates the effect).
  • Tiering is more efficient but incurs an I/O overhead to move data between the tiers.  Traditional solutions have had to make a trade-off between frequent data movement and incorrectly placed data. 
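The “working set” compromise is easy to demonstrate.  The sketch below implements a tiny LRU-style read cache: while the requested blocks fit in the cache, most reads are hits, but once the working set exceeds the cache size every access becomes a miss and performance falls back to the backing store.  The cache size and access patterns are arbitrary assumptions.

```c
/* Minimal read-cache sketch illustrating the "working set" compromise.
 * Cache size and block streams are illustrative assumptions. */
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 4

struct slot { long block; long last_used; int valid; };
static struct slot cache[CACHE_SLOTS];
static long clock_tick;

/* Returns 1 on a cache hit, 0 on a miss (which fills the cache). */
static int cache_read(long block)
{
    clock_tick++;
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].block == block) {
            cache[i].last_used = clock_tick;   /* hit */
            return 1;
        }
    /* Miss: evict the least recently used slot, fetch from backing store. */
    int victim = 0;
    for (int i = 1; i < CACHE_SLOTS; i++)
        if (!cache[i].valid || cache[i].last_used < cache[victim].last_used)
            victim = i;
    cache[victim] = (struct slot){ block, clock_tick, 1 };
    return 0;
}

int main(void)
{
    long small_set[] = { 1, 2, 3, 1, 2, 3 };       /* fits in the cache   */
    long large_set[] = { 1, 2, 3, 4, 5, 1, 2, 3 }; /* exceeds the cache   */
    int hits = 0;

    for (size_t i = 0; i < 6; i++) hits += cache_read(small_set[i]);
    printf("small working set: %d/6 hits\n", hits);

    memset(cache, 0, sizeof(cache));
    hits = 0;
    for (size_t i = 0; i < 8; i++) hits += cache_read(large_set[i]);
    printf("large working set: %d/8 hits\n", hits);
    return 0;
}
```

Tiering avoids this performance cliff by placing whole data sets on the appropriate media, at the price of the data-movement overhead described above.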

New Algorithms

The characteristics of new media demand new ways to exploit the hardware as effectively as possible.  In servers, tactical use of Optane makes sense where an application can benefit from the ultra-low latency.  Write-intensive applications that need data to persist across reboots are a good example.

In appliances, Optane is still too expensive for practical use as the only tier of storage.  Most existing storage arrays would fail to make effective use of the low latency without significant code rewrites.  Optane as a cache seems like a good move, but using the technology purely as a read cache wastes the benefit of 3D-XPoint’s write endurance.

The best solution is to use a hybrid of both tiering and caching.  Two solutions on the market do just this.  VAST Data uses QLC and Optane to build a highly scalable unstructured data store.  StorONE has extended its S1 platform to support Optane as a tier of storage that complements another tier of QLC media.  All write I/O hits Optane first.  Over time, inactive data is cascaded down to QLC based on a high/low watermark process, with demotion taking hours or even days.
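The sketch below illustrates the general high/low watermark mechanism described above: writes land on a fast persistent tier, and once occupancy crosses a high watermark, cold data cascades down to a QLC tier until a low watermark is reached.  It is not StorONE’s implementation; the tier sizes, thresholds and ingest rate are assumptions.

```c
/* Illustrative high/low watermark demotion loop: write to the fast tier
 * first, cascade cold data to QLC once occupancy crosses the high mark.
 * All sizes, thresholds and rates are assumptions; illustrative only. */
#include <stdio.h>

#define HIGH_WATERMARK 0.80   /* start demoting above 80% full */
#define LOW_WATERMARK  0.50   /* stop demoting at 50% full     */

struct tier { const char *name; double capacity_gb; double used_gb; };

static void demote_coldest(struct tier *fast, struct tier *slow, double gb)
{
    /* A real system would pick the coldest extents by access history;
     * here we simply move capacity to show the mechanism. */
    fast->used_gb -= gb;
    slow->used_gb += gb;
    printf("demoted %.0f GB from %s to %s (%s now %.0f%% full)\n",
           gb, fast->name, slow->name, fast->name,
           100.0 * fast->used_gb / fast->capacity_gb);
}

int main(void)
{
    struct tier optane = { "optane-tier",  1500.0, 0.0 };
    struct tier qlc    = { "qlc-tier",    60000.0, 0.0 };

    /* All writes land on the fast tier first. */
    for (int hour = 1; hour <= 10; hour++) {
        optane.used_gb += 200.0;   /* assumed ingest rate: 200 GB/hour */

        /* Once the high watermark is crossed, cascade cold data down
         * until occupancy falls back to the low watermark. */
        if (optane.used_gb / optane.capacity_gb > HIGH_WATERMARK) {
            double target = LOW_WATERMARK * optane.capacity_gb;
            demote_coldest(&optane, &qlc, optane.used_gb - target);
        }
    }
    return 0;
}
```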

We will touch on the StorONE S1 architecture in more detail in a separate post.  However, for now, here’s a podcast recently recorded with StorONE CMO, George Crump. 

The Architect’s View

The introduction of NAND flash around 2010 required vendors to think hard about how best to use new media, balancing cost, performance and endurance characteristics.  We’re at another inflection point with the introduction of persistent memory, as vendors start to take advantage of that media’s unique properties.  Expect to see the same challenges as before – solutions that can’t fully exploit the value of this hardware, leaving IOPS on the table.  There will be a new set of solutions taking advantage of new media.  Over time these solutions will win out, simply because they will use expensive resources in the most cost-efficient manner possible.


Copyright (c) 2007-2020 – Post #1A49 – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.