In a recent press release, Western Digital announced an interesting, if not curious, solution for extending DRAM with fast storage. The Ultrastar DC ME200 extends server DRAM using proprietary caching software in an attempt to fully exploit the capabilities of modern multi-core processors. In an age when in-memory computing is increasing in popularity, is this a smart move, or simply a counterpoint to the lack of storage-class memory in the WDC portfolio?
- The Fall of STEC and Rise of Western Digital
- HDD Capacity Threshold Reaches 15TB
- Cache or Tier – Does it Matter?
Ultrastar DC ME200
First of all, let’s start with what it is. The Ultrastar DC ME200 (ME200 from now on) is an “optimised” NAND drive in either U.2 or AIC (add-in card) form factor. Internally, the technology is based on 15nm planar MLC NAND and comes in 1TiB, 2TiB or 4TiB capacities. Note the emphasis on binary terabytes (tebibytes) here, to match system DRAM specifications. The drive interfaces with system memory using what a WDC blog post describes as “virtual memory” pools. WDC recommends an 8:1 ratio of ME200 capacity to real DRAM, claiming a standard 1U server can extend memory capacity to 24TiB. WDC supports current standard Linux distributions without changes to application code.
The specific process by which the WDC drive extends caching isn’t actually explained, although it is hinted at in the blog post. Conceptually (see the diagram shown), WDC is implying that the ME200 is effectively part of the memory pool, with multiple algorithms, tools and tricks used to predict and prefetch data from NAND into memory (and back again). Exactly how the caching process is implemented is important and we’ll come back to the subject again in a moment.
Look back to the IBM mainframe days and you’ll see that paging was heavily used as a way to extend virtual address spaces. This was done to save cost and was possible because some processes (like batch jobs) can afford to timeshare with online users and OLTP systems. When I/O occurs, processes have to wait. Long-waiting processes can be swapped out to virtual memory on disk, allowing other tasks to run. In this way, it was (and still is) possible to push mainframe systems to run at 100% utilisation or greater.
Linux (like most operating systems) already has memory management capabilities to extend the amount of virtual memory in a system beyond the physical capacity available. With the advent of cheap servers and DRAM, it’s questionable whether disk-based swapping has any merit these days. This is even more relevant with such a high level of server virtualisation in use across enterprises. Persistent storage is so slow compared to DRAM that even a small amount of paging will induce significant application performance problems.
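To put a rough number on that last point, here is a minimal sketch of the usual effective-access-time calculation. The latencies are assumptions chosen as round figures for illustration (100ns for a DRAM access, 100µs to service a page fault from flash); they are not measurements of the ME200 or of any particular server.

```python
def effective_access_ns(fault_rate, dram_ns=100.0, fault_service_ns=100_000.0):
    """Average memory access time when a fraction of accesses page-fault.

    dram_ns and fault_service_ns are assumed, illustrative latencies.
    """
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * dram_ns + fault_rate * fault_service_ns


if __name__ == "__main__":
    for rate in (0.0, 0.0001, 0.001, 0.01):
        eat = effective_access_ns(rate)
        print(f"fault rate {rate:.2%}: {eat:,.1f} ns ({eat / 100.0:.1f}x DRAM)")
```

With these assumed figures, one fault in every thousand accesses is enough to roughly double the average access time, which is why paging has to be rare for any DRAM-extension scheme to be usable.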
So, how can WDC implement a system that efficiently uses NAND flash to give near-DRAM performance? The question can be answered by understanding how an application accesses data in memory. Rarely is data access entirely random. If it were, then caching would have no benefit. Caching techniques rely on locality of access and uneven access distributions. In analytics, for example, processing could focus on small sets of files that are repeatedly accessed. This data sits in real DRAM until the next set of files is read, then gets swapped to NAND as the focus of processing moves to another group of files.
Any performance hit will come from exchanging real and virtual DRAM pages, so the fewer times that needs to occur, the better. Logically then, this technology would work well with bare-metal deployed analytics-type applications.
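The locality argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: a plain LRU policy stands in for whatever WDC actually uses (which isn’t documented), the DRAM-to-total page ratio of 1:8 is picked for illustration, and the “90% of accesses go to 10% of pages” workload is hypothetical.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay page numbers through an LRU cache and return the hit fraction."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for page in accesses:
        total += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = True
    return hits / total if total else 0.0


def uniform_trace(n_pages, length, rng):
    """Entirely random access: every page is equally likely."""
    return [rng.randrange(n_pages) for _ in range(length)]


def skewed_trace(n_pages, length, rng, hot_fraction=0.1, hot_probability=0.9):
    """Hypothetical workload: most accesses land on a small 'hot' set of pages."""
    hot_pages = max(1, int(n_pages * hot_fraction))
    trace = []
    for _ in range(length):
        if rng.random() < hot_probability:
            trace.append(rng.randrange(hot_pages))
        else:
            trace.append(rng.randrange(hot_pages, n_pages))
    return trace


rng = random.Random(42)
N_PAGES, CAPACITY, LENGTH = 8000, 1000, 100_000  # assumed sizes, 1:8 ratio

uniform = lru_hit_rate(uniform_trace(N_PAGES, LENGTH, rng), CAPACITY)
skewed = lru_hit_rate(skewed_trace(N_PAGES, LENGTH, rng), CAPACITY)
print(f"uniform access hit rate: {uniform:.1%}")
print(f"skewed access hit rate:  {skewed:.1%}")
```

With uniform access the hit rate stays close to the fraction of pages that fit in the cache, whereas the skewed workload keeps most of its hot set resident and hits far more often. That gap is the whole basis for extending DRAM with a slower tier.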
Another option for WDC is to optimise the ME200 internally, to be more “DRAM-friendly”. This could mean having more DRAM on the internal drive controller and aligning memory page sizes with NAND page sizes. This would help to reduce write amplification. We can already see that the device is using MLC NAND for better performance and resiliency compared to TLC or QLC. If the drive can serve more data from internal DRAM, then performance would be improved overall. This leads to the next question – how is DRAM being extended?
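As a rough illustration of why page-size alignment matters for write amplification, the sketch below uses a deliberately naive model in which every NAND page touched by a write is reprogrammed in full. The 16KiB NAND page size is an assumption for the example, not a published ME200 figure, and real controllers buffer and coalesce writes, so treat the numbers as a worst case.

```python
def nand_bytes_programmed(offset, length, nand_page=16 * 1024):
    """Bytes programmed if each touched NAND page is rewritten in full (naive model)."""
    if length <= 0:
        return 0
    first = offset // nand_page
    last = (offset + length - 1) // nand_page
    return (last - first + 1) * nand_page


def write_amplification(offset, length, nand_page=16 * 1024):
    """Ratio of bytes programmed to bytes the host asked to write."""
    return nand_bytes_programmed(offset, length, nand_page) / length


if __name__ == "__main__":
    KIB = 1024
    print(write_amplification(0, 4 * KIB))         # 4KiB memory page, 16KiB NAND page
    print(write_amplification(0, 16 * KIB))        # write matches and aligns to NAND page
    print(write_amplification(4 * KIB, 16 * KIB))  # same size, but misaligned
```

In this model a 4KiB memory page written to a 16KiB NAND page is amplified fourfold, an aligned 16KiB write is not amplified at all, and a misaligned 16KiB write straddles two NAND pages and doubles the work.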
The ME200 is an NVMe drive. NVM Express introduces significant performance improvements by increasing parallel processing, reducing the overhead of I/O command execution and of course, putting storage on the PCIe bus. So some of the benefits come from simply having a fast device on a fast internal bus. The real question though, is how does the operating system see this device? It could be that the ME200 is simply a swap device, although that could be achieved already today. WDC claims to be using algorithms and ML/AI to optimise caching, so this indicates some additional software.
This post on The Register implies that the implementation relies on deploying a hypervisor on the application server. This seems excessive as a way to get better performance and introduces questions of compatibility and support, as would any VMM replacement or plugin.
It’s Been Done Before
This isn’t the first time we’ve seen attempts to extend system DRAM to optimise costs. The ill-fated Diablo Technologies developed a solution called Memory1 that used byte-addressable NAND flash. Memory1 plugged into traditional DRAM sockets and extended the memory capacity, using the assumptions on access patterns we’ve already discussed. The Memory1 solution also relied on the overhead of QPI (QuickPath Interconnect), which impacts memory access times in NUMA multi-socket servers. You can watch some additional background on the technology in the Diablo Tech Field Day presentations.
- Tech Field Day 10 Preview: Diablo Technologies
- What are Storage Class Memory (SCM) and Persistent Memory (PM)?
- Has NVMe Killed off NVDIMM?
The Architect’s View
With continued improvements in multi-core processors, there will always be a balance between fully utilising processor cores, system memory and external storage. I can see a niche use for the ME200, but the “hypervisor” and additional software requirements need to be addressed. The proverbial elephant in the room here is WDC’s lack of an Optane competitor. This article from 2017 implies that WDC is looking to ReRAM as its SCM solution; however, I don’t think there are any scalable solutions available yet. Also, WDC was in the NVDIMM game previously, when SanDisk resold Diablo devices.
In the interim then, WDC appears to be using some of their software assets from previous acquisitions to at least have a foot in this market. It’s going to be fun to see how this plays out. While the benefits of multi-tiered DRAM seem to be apparent, will the idea ever gain enough traction with end users – and more importantly, public cloud hyperscalers?
Copyright (c) 2009-2019 – Post #F9DD – Chris M Evans, first published on http://blog.architecting.it, do not reproduce without permission.