Modern Storage Architectures: Datrium

This is one of a series of posts that will cover new storage architectures for the enterprise data centre and form part of an Architecting IT white paper later this year.

Storage platforms for primary workloads have generally fallen into a few standard categories, such as dual-controller or scale-out. With spinning media, the overhead of managing all I/O through a centralised funnel wasn’t a particular problem. Centralisation allows data services to be implemented as I/O requests are handled by the infrastructure. As we move to flash, the issues of centralisation become more apparent. Flash devices such as SSDs have far more bandwidth than hard drives and can easily cope with random workloads. Dual-controller designs are at risk of not exploiting the full capability of SSDs and flash.

Hyper-converged infrastructure (HCI) has been one solution to removing the central controller bottleneck. Rather than having a storage area network and shared storage array, HCI distributes computing and storage across many nodes. Data is more local to the application and in theory these solutions should scale. However, even hyper-converged solutions have issues. Data protection is implemented through distribution of data (and I/O) across many nodes. This means every node is involved in I/O traffic going “east-west”, which potentially limits scalability. Redundancy is at the disk and node level. Lose a node (or even take one down for maintenance) and redundancy has to be recreated. This can represent a risk or performance overhead. It also means deploying additional server nodes, just for data resiliency.

Datrium DVX

Datrium is a start-up developing another solution to the issue of running scalable applications with shared storage. The founders of the company have a history of working with data platforms, such as Data Domain, 3PAR, NetApp and VMware. This heritage is clear in the company’s main product offering, DVX. DVX is a platform that implements what Datrium calls Open Convergence. In the DVX model, persistent data is stored on a shared storage array, while active data is also cached at the host server layer. This means I/O can be served locally within the host (reducing latency) and removes the need to replicate data between host nodes, as happens in the traditional HCI model.

Compared to either SAN or HCI designs, DVX has some benefits and disadvantages. Like SAN, data is centrally stored, so can be de-duplicated and compressed centrally. If one host server dies, applications can quickly be restarted on the remaining servers without recreating any missing data. However, the cache on the new target server would need to be warmed up. Reads are served at the speed of the cache in the host (which could be very low latency, with technology like Optane or DRAM), but writes are written through the cache to the shared storage, making them more like a traditional SAN. All writes are initially stored in NVRAM on the data/storage nodes, so additional write capacity can be achieved by adding more data nodes to a cluster.
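To make the read and write paths concrete, here is a minimal Python sketch of a write-through host cache in front of a shared store. It only illustrates the general pattern described above and is not Datrium's implementation; the class names (SharedStore, HostCache) and the in-memory dictionaries are assumptions made for the example.

```python
class SharedStore:
    """Hypothetical stand-in for the shared data nodes (in-memory only)."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.blocks[key]

    def write(self, key, value):
        self.writes += 1
        self.blocks[key] = value


class HostCache:
    """Hypothetical per-host cache: reads served locally, writes passed through."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def read(self, key):
        # A miss goes to shared storage once, then the block is held locally.
        if key not in self.cache:
            self.cache[key] = self.store.read(key)
        return self.cache[key]

    def write(self, key, value):
        # Write-through: persist to shared storage first, then update the cache.
        self.store.write(key, value)
        self.cache[key] = value


store = SharedStore()
host_a = HostCache(store)
host_a.write("vm1-block0", b"data")
host_a.read("vm1-block0")           # served from host A's cache

# Host A fails: the VM restarts on host B, which starts with a cold cache.
host_b = HostCache(store)
first = host_b.read("vm1-block0")   # cache miss, fetched from shared storage
second = host_b.read("vm1-block0")  # cache hit, no shared storage read
```

Because the host holds no data that isn't already on the shared store, losing host A costs nothing but a cold cache on host B, which is the warm-up penalty mentioned above.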

Datrium quotes some significant performance figures for their platform. For instance, in a test with Dell and IOMark last year, a single system supported 8000 IOMark VMs at around 18 million IOPS and 200GB/s of bandwidth. Naturally the disaggregation of performance and capacity allows host servers to scale with more flash and achieve these results, something that would be hard in a centralised system.
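As a rough sanity check, the quoted figures can be broken down per VM. The short Python calculation below uses only the numbers above, treats "around 18 million" as exactly 18 million and assumes decimal gigabytes, so the results are approximations rather than published values.

```python
# Figures as quoted in the text (approximate).
vms = 8000
total_iops = 18_000_000
bandwidth_bytes_per_s = 200 * 10**9   # 200GB/s, assuming decimal GB

iops_per_vm = total_iops / vms                          # about 2,250 IOPS per VM
avg_io_bytes = bandwidth_bytes_per_s / total_iops       # about 11,000 bytes per I/O
bandwidth_per_vm_mb = bandwidth_bytes_per_s / vms / 10**6  # about 25 MB/s per VM

print(f"{iops_per_vm:.0f} IOPS per VM")
print(f"{avg_io_bytes / 1000:.1f} KB average I/O size")
print(f"{bandwidth_per_vm_mb:.0f} MB/s per VM")
```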

Stateless Servers

One big advantage of the DVX architecture is the ability to use any server technology. Customers can re-use existing (brownfield) servers, blades or deploy new infrastructure. Servers are stateless, as the cached data is only used for reads. Cores aren’t shared but are dedicated to applications in host (compute) nodes or for storage. However, there is a small overhead in managing the local cache of each host. Currently Datrium has to supply the storage (data) nodes. System specifications are available on the Datrium website. User compute nodes need a minimum of 2 cores dedicated to DVX, with DRAM based on the amount of local SSD (7.5GB, plus 2.5GB per TB). Cache can be any flash between 800GB and 16TB of capacity.
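The DRAM rule above is easy to turn into a quick sizing check. The Python sketch below assumes my reading of the rule (a 7.5GB base plus 2.5GB for every TB of local flash) and decimal units (1TB = 1000GB); the function name dvx_host_dram_gb is hypothetical and not part of any Datrium tooling.

```python
MIN_CACHE_GB = 800       # smallest supported flash cache, per the text
MAX_CACHE_GB = 16_000    # 16TB, assuming 1TB = 1000GB
BASE_DRAM_GB = 7.5
DRAM_GB_PER_TB = 2.5
MIN_DVX_CORES = 2        # cores dedicated to DVX on each compute node


def dvx_host_dram_gb(cache_gb):
    """Estimate the DRAM (in GB) a compute node reserves for a given flash cache size."""
    if not MIN_CACHE_GB <= cache_gb <= MAX_CACHE_GB:
        raise ValueError("cache must be between 800GB and 16TB")
    return BASE_DRAM_GB + DRAM_GB_PER_TB * cache_gb / 1000


smallest = dvx_host_dram_gb(800)      # 7.5 + 2.0 = 9.5GB
largest = dvx_host_dram_gb(16_000)    # 7.5 + 40.0 = 47.5GB
```

On this reading, a host reserves somewhere between roughly 9.5GB and 47.5GB of DRAM for DVX, depending on how much flash it carries.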

One question I have here is exactly how the DVX software is implemented on each host node. DVX supports a range of hypervisors and bare metal deployments. The DVX software could be implemented (for example) on vSphere as a VM or as an ESXi plugin. Similarly, the software could be a driver on RHEL. It’s not really clear. The storage layer itself presents as an NFS share, but it appears that the specifics of the implementation aren’t being disclosed at the moment.

The Architect’s View

The split architecture is an evolution of both HCI and SAN that picks the best of both designs, but more importantly, implements a solution that is positioned to benefit from the low latency and accelerated performance NVMe flash promises to bring. We’ve seen this kind of solution already in the technology implemented by E8 Storage and Excelero. What’s not initially clear are the potential benefits of having active data already in the host. VMs can be quickly snapped or replicated to public cloud through Cloud DVX. Because storage isn’t delivered within the hypervisor, I/O overhead can be lower than some HCI solutions and there’s the option to mix hypervisors across a large estate.

This post is meant as an introduction to Datrium DVX and we’ll hopefully dig deeper over time. As with any new design, it’s worth having a high-level starting point that goes deeper and wider in subsequent posts. Look out for further updates and a more comprehensive summary of the technology in the Architecting IT White Paper on Storage Architectures due later this year.
