Solidigm solves the DRAM SSD issue with new D7-P5810 SLC SSD and CSAL software

Chris Evans | All-Flash Storage, Data Practice: Data Storage, Enterprise, Solidigm, Storage, Storage Management

Solidigm has announced a new SLC-based SSD, the D7-P5810, with extreme endurance and write performance.  In conjunction with CSAL, is this combination of hardware and software the answer to issues of DRAM sprawl in high-capacity SSDs?

Background

Solidigm has announced a new solid-state drive based on SLC technology.  The PCIe Gen 4.0 device offers a capacity of 800GB in a U.2 15mm form factor, with a 1.6TB model promised for the first half of 2024.  The D7-P5810 can deliver 865,000 4K random read IOPS (queue depth 256) and 495,000 4K random write IOPS (queue depth 256) at latencies of 53µs and 15µs, respectively.  Sequential read and write latencies are quoted as 10µs and 13µs, respectively.  With 6400MB/s sequential read bandwidth and 4000MB/s sequential write bandwidth, this is a capable device for the enterprise market.

Endurance

These are impressive numbers, but possibly more important is the endurance rating of 50 DWPD or 73 PBW (see this post for an explanation of those terms).  At this level, the entire 800GB of a single drive can be overwritten 50 times a day or approximately once every 30 minutes.  This makes it ideal as a cache or for write-intensive applications.
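As a quick sanity check, the two endurance figures line up, assuming the five-year warranty period that usually underpins DWPD ratings (a rough sketch, not a vendor formula):

```python
# Rough cross-check of the D7-P5810 endurance rating.
# Assumes a 5-year warranty period, the usual basis for DWPD figures.
capacity_tb = 0.8          # 800GB model
dwpd = 50                  # drive writes per day
warranty_years = 5

pbw = capacity_tb * dwpd * 365 * warranty_years / 1000
print(f"~{pbw:.0f} PBW")   # ~73 PBW, matching the quoted rating

minutes_per_overwrite = 24 * 60 / dwpd
print(f"one full overwrite every ~{minutes_per_overwrite:.0f} minutes")  # ~29 minutes
```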

Compare the D7-P5810 to the D5-P5336 we covered in July 2023.  That SSD is a high-capacity device, with models up to 61.44TB, but its endurance is roughly one-hundredth that of the D7-P5810, at between 0.42 and 0.58 DWPD.

QLC vs SLC

This, of course, is the trade-off with QLC media compared to SLC, MLC and TLC (see this post for an explanation of QLC compared to previous NAND generations).  Endurance and I/O performance are much lower than SLC, but density is much higher, and that equates to a lower cost per TB.
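The trade-off can be sketched with some commonly quoted ballpark figures (illustrative only; actual values vary by NAND generation and vendor):

```python
# Illustrative comparison of NAND cell types (order-of-magnitude figures,
# not vendor specifications).  More bits per cell means higher density and
# lower cost per TB, but fewer program/erase cycles and slower writes.
nand_types = {
    "SLC": {"bits": 1, "pe_cycles": 100_000},
    "MLC": {"bits": 2, "pe_cycles": 10_000},
    "TLC": {"bits": 3, "pe_cycles": 3_000},
    "QLC": {"bits": 4, "pe_cycles": 1_000},
}

for name, t in nand_types.items():
    print(f"{name}: {t['bits']} bit(s)/cell, ~{t['bits']}x SLC density, "
          f"~{t['pe_cycles']:,} P/E cycles")
```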

But there’s another issue with large-capacity SSDs, and that’s the DRAM used per drive, which has cost and power implications.  The typical ratio of DRAM to NAND in SSDs has generally been around 1GB per 1TB of capacity.  In a 61.44TB SSD, that would mean deploying over 61GB of DRAM per drive.  Imagine a system built from 24 D5-P5336 SSDs – the DRAM in the drives (almost 1.5TB) could exceed that in the server itself.
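Putting that example into numbers (a back-of-the-envelope sketch using the 1GB-per-1TB rule of thumb quoted above):

```python
# Back-of-the-envelope DRAM footprint, assuming ~1GB DRAM per 1TB of NAND.
dram_per_tb_gb = 1
drive_capacity_tb = 61.44       # D5-P5336 top capacity
drives_in_system = 24

dram_per_drive_gb = drive_capacity_tb * dram_per_tb_gb
total_drive_dram_tb = dram_per_drive_gb * drives_in_system / 1024

print(f"DRAM per drive: ~{dram_per_drive_gb:.0f}GB")           # ~61GB
print(f"DRAM across 24 drives: ~{total_drive_dram_tb:.2f}TB")  # ~1.44TB
```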

Write Amplification

As we noted in the blog post on the D5-P5336, the indirection unit (IU) of that device has increased from 4K to 16K.  That means writing blocks four times the size used by a typical SSD, but with less DRAM required.  Increasing the IU reduces the volume of metadata being stored and, with it, the DRAM requirement.  However, this technique also increases write amplification.  Where a host aligned to a 4K block size would see roughly a 1:1 ratio of host writes to device writes, on the D5-P5336 a 4K write only partially fills an IU, creating write amplification.  The ideal scenario is to write larger blocks, for example, sequential I/O.
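A crude way to see the effect is a worst-case model in which every small host write forces a full IU to be rewritten (real controllers coalesce writes, so treat this as an upper bound):

```python
# Simplified model of write amplification from a larger indirection unit (IU).
# Worst case: each small host write triggers a read-modify-write of a whole IU.
def worst_case_waf(host_write_bytes: int, iu_bytes: int) -> float:
    """Device bytes written per host byte written, if every write rewrites a full IU."""
    return iu_bytes / min(host_write_bytes, iu_bytes)

print(worst_case_waf(4096, 4096))     # 1.0 -> 4K host writes on a 4K IU drive
print(worst_case_waf(4096, 16384))    # 4.0 -> 4K host writes on a 16K IU drive
print(worst_case_waf(131072, 16384))  # 1.0 -> large sequential writes align naturally
```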

Host FTL

How will vendors solve the write amplification issue for increasingly larger SSDs?  Solidigm is proposing using CSAL, an open-source software solution that creates a host flash translation layer.  CSAL is part of the SPDK (Storage Performance Development Kit), developed and first released by Intel in 2014. 

CSAL introduces a virtual device driver that exposes aggregated devices to the host server while managing the placement of data on either SLC or QLC media.  The concept is similar to the Synergy storage driver we discussed in this post on the Solidigm P41 from August 2022.

CSAL mitigates write amplification by using an SLC buffer, such as the D7-P5810, to absorb write I/O before placing data onto the QLC layer.  When data eventually reaches QLC, the write I/O size can be better aligned to the IU of the QLC device, in turn reducing write amplification on that media.
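Conceptually, the write-shaping behaviour looks something like the sketch below.  This is purely illustrative, assumes a 16K IU, and is not the actual CSAL implementation (which lives in SPDK):

```python
# Conceptual sketch of CSAL-style write shaping (not the real CSAL/SPDK code):
# small host writes land in an SLC buffer, and data is flushed to QLC only in
# chunks aligned to the QLC drive's indirection unit.
QLC_IU_BYTES = 16 * 1024

class WriteShapingCache:
    def __init__(self):
        self.slc_buffer = bytearray()   # stands in for the SLC drive (e.g. D7-P5810)
        self.qlc_writes = []            # stands in for IU-aligned writes to QLC

    def write(self, data: bytes):
        """Absorb a host write of any size into the SLC buffer."""
        self.slc_buffer.extend(data)
        # Flush only full, IU-sized chunks so QLC never sees a partial IU write.
        while len(self.slc_buffer) >= QLC_IU_BYTES:
            chunk = bytes(self.slc_buffer[:QLC_IU_BYTES])
            del self.slc_buffer[:QLC_IU_BYTES]
            self.qlc_writes.append(chunk)

cache = WriteShapingCache()
for _ in range(10):
    cache.write(b"x" * 4096)    # ten 4K host writes
print(len(cache.qlc_writes))    # 2 full 16K writes reached QLC
print(len(cache.slc_buffer))    # 8K still buffered in SLC
```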

StorONE & VAST

The concepts used in CSAL are not new.  Caching has been around forever, while cascaded tiering existed in products from Compellent Technologies in the early 2000s.  VAST Data and StorONE are two current storage companies using a similar approach to CSAL, where I/O is accumulated in a fast storage tier (either SLC or persistent memory) and then structured to write efficiently to lower-endurance media.

You can read about StorONE in this post from 2020 that discusses the AFAn.

You can read about VAST Data in this post from 2019 that introduces the technology. 

Retrofit

So, if you’re building storage into your server, then CSAL and a combination of SLC/QLC could be for you.  How will the use of large-capacity SSD media affect storage array vendors?

Pure Storage already uses a custom SSD solution called DirectFlash.  We’ve been discussing this technology for many years and have included some blog posts and podcast references here.  As Pure Storage manages the FTL, its systems can optimise the amount of DRAM in use, mitigating the impact of DRAM sprawl.

IBM uses FlashCore Modules (FCMs), which could potentially work together to mitigate the DRAM issue, although we’ve not seen any evidence of IBM taking this route.

For the remaining enterprise storage vendors, will their product architectures easily accommodate the CSAL tiering model, or will they have other techniques to solve the problem? 

The Architect’s View®

It’s clear that large-capacity SSDs represent a conundrum for enterprise storage vendors.  While some may claim the demand for 32TB+ drives is not there yet, these capacities will eventually be adopted, and the SSD vendors’ route to mitigating the DRAM issue is to tier.

CSAL, or whatever vendors choose to implement, needs to support redundancy and the ability to size the SLC and QLC tiers independently.  We haven’t seen any vendor roadmap suggesting the IU size is being addressed, so we can’t say where the industry stands at this point.

Two alternatives could appear.  First, vendors could provide DRAM-less drives and let the host do the FTL management.  These types of SSDs already exist.  Second, large-capacity SSDs could do more to manage an SLC cache internally, implementing CSAL-like functionality within a single SSD.  We haven’t seen this being discussed (yet).

We will be watching with interest to see how storage array vendors choose to support large-capacity drives.  This feature could be a differentiator for vendors in the next evolution of all-flash storage. 


Copyright (c) 2007-2023 – Post #c541 – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.