Stellus delivers scale-out storage with NVMe & Key-Value technology


Chris Evans

Enterprise, Storage Hardware, Storage Media, Tech Field Day

As we move into the era of machine-generated content, storage systems are increasingly challenged to support larger volumes and sizes of unstructured files, with highly parallel access profiles and significant throughput demands.  As we highlighted just this week on Storage Unpacked, this demand is driving the need for new storage platforms that can take advantage of new media technology.  Stellus Technologies emerged from stealth this week and released a new storage platform that hopes to address Big Data challenges, through the use of NVMe and key-value store technology.

Background

Much of the data created today comes from machine-generated sources.  Increasingly, we see the creation of more complex data sets from industries like Life Sciences, Media & Entertainment and Industrial IoT.  The access profiles for this data have become more parallel in nature.  Medical equipment, transportation and creative industries produce parallel streams of data that are, in turn, accessed in parallel through analytics or post-production work.

Modern storage systems are required to manage data across a single logical namespace, cater for small and large files alike, with high degrees of parallel access and throughput. 

New Media

NVMe SSD and SCM drives deliver high capacity (up to 32TB today) with a high degree of parallelism through the NVMe protocol.  New features such as Zoned Namespaces (ZNS) will make it possible to logically divide a single high-capacity drive into sections that are addressed independently.  SCM, or Storage-Class Memory, provides a persistent layer, addressable either as memory or through block-based NVMe protocols.

System Architecture

Massively parallel storage architectures are both the result of new technology and the demand from modern data-generating applications.  However, spreading data widely across the available media has always been a part of storage system design.  We only have to look at RAID groups, wide striping, erasure coding and HCI as examples of this technique.

With modern flash NAND and NVMe, the efficient distribution of back-end I/O across all available media is more critical than ever.  Intel’s DC P4500 8TB NVMe drive, for example, is capable of 640,000 random read IOPS (65,000 write) and up to 3.2GB/s of read throughput at 82µs (1.9GB/s at 30µs for writes).  With a box-full of these devices, inefficient storage software leaves unused performance on the table – at a considerable cost.
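To put a rough number on "performance left on the table", here is a back-of-the-envelope sketch using the per-drive figures quoted above.  The 24-drive shelf and the 50% software efficiency are assumptions chosen purely for illustration; they are not figures from Intel or Stellus.

```python
# Per-drive figures quoted above for the Intel DC P4500 8TB.
READ_IOPS = 640_000
WRITE_IOPS = 65_000
READ_GBPS = 3.2
WRITE_GBPS = 1.9


def aggregate(drives, efficiency=1.0):
    """Return the performance a shelf of identical drives could deliver.

    `efficiency` is the fraction of raw media performance that the
    storage software actually exposes (1.0 means nothing is wasted).
    """
    return {
        "read_iops": round(drives * READ_IOPS * efficiency),
        "write_iops": round(drives * WRITE_IOPS * efficiency),
        "read_gbps": round(drives * READ_GBPS * efficiency, 1),
        "write_gbps": round(drives * WRITE_GBPS * efficiency, 1),
    }


if __name__ == "__main__":
    # Hypothetical shelf: 24 drives, software that exposes half the media.
    raw = aggregate(24)
    delivered = aggregate(24, efficiency=0.5)
    unused = raw["read_iops"] - delivered["read_iops"]
    print(f"raw read IOPS:       {raw['read_iops']:,}")
    print(f"delivered read IOPS: {delivered['read_iops']:,}")
    print(f"unused read IOPS:    {unused:,}")
```

Under those assumptions, the unused read capability is equivalent to twelve whole drives that have been paid for but never exercised.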

Indirection

The challenge in storage system design is taking the logical view of storage from a host perspective and transforming that data through layers of indirection to something that can be stored efficiently on SSD.  After all, SSDs, HDDs and block-based SCM are simply collections of data blocks, organised in a logical order, from offset zero upwards. 

Storage platform designers must layer metadata and other structures on top of this raw storage.  This means creating processes that distribute and protect data from media failure, cater for de-duplication and compression, deliver efficient garbage collection and allow for dynamic scaling.

This challenge is no mean feat when we remember that NAND flash has to be managed for endurance and to level out any performance spikes during garbage collection.  Most, if not all, of the all-flash storage systems in the market today will have features to mitigate these issues. 

Square Peg

Part of the problem in managing solid-state media is to align storage objects (metadata and file fragments) to block storage structures.  De-duplication, for example, creates variable-sized pieces of data that don’t necessarily align with 4KB blocks on media.  So how do we address this ongoing dichotomy and make efficient use of persistent media while creating efficient metadata and data structures for data storage?
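The mismatch is easy to quantify.  The sketch below assumes each variable-sized fragment is naively rounded up to whole 4KB blocks and counts the padding that results; the fragment sizes are invented for illustration, and real systems pack fragments far more cleverly than this.

```python
BLOCK_SIZE = 4096  # the 4KB media block mentioned above


def blocks_needed(length):
    """Whole blocks required to hold `length` bytes (ceiling division)."""
    return -(-length // BLOCK_SIZE)


def padding_waste(fragment_lengths):
    """Bytes lost to padding if every fragment starts on a block boundary."""
    allocated = sum(blocks_needed(n) * BLOCK_SIZE for n in fragment_lengths)
    return allocated - sum(fragment_lengths)


if __name__ == "__main__":
    # Hypothetical fragment sizes, in bytes, from a de-duplication pass.
    fragments = [1500, 4096, 5000, 700]
    waste = padding_waste(fragments)
    print(f"{waste} bytes of padding for {sum(fragments)} bytes of data")
```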

Key-Value

One solution is to use a key-value store as the underlying method for reading and writing object fragments and metadata.  A key-value store is simply a form of database that efficiently stores and retrieves data through the use of a key/value pair.  The key is the part that describes the data; the value is the actual content.  An example could be a formatted piece of information, such as a date, a bank account number, a name or part of an address.  We discuss key/value stores on this recent Storage Unpacked podcast looking at document databases.

[Note: a quick call-out for the mainframe here, key-value stores can be traced back to VSAM and KSDS (key-sequenced datasets) from the 1970s, and to ISAM from even earlier]

A key-value (or KV) store is useful because it allows us to store arbitrary pieces of data of any length and format.  The KV platform takes care of managing the actual storage of data, providing a simple interface to the user.
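As a minimal sketch of that "simple interface", the class below wraps an in-memory dictionary behind put, get and delete calls.  The class and method names are hypothetical and are not taken from any particular product; a real KV platform would persist values to media behind the same kind of interface.

```python
from typing import Dict, Optional


class KVStore:
    """Toy in-memory key-value store with a three-call interface."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        """Store a value of any length under `key`, replacing any old value."""
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        """Return the value for `key`, or None if the key is unknown."""
        return self._data.get(key)

    def delete(self, key: bytes) -> bool:
        """Remove `key`; return True if it existed."""
        return self._data.pop(key, None) is not None


if __name__ == "__main__":
    store = KVStore()
    store.put(b"account:1234", b"J. Smith")     # short value
    store.put(b"frame:0001", b"\x00" * 10_000)  # much longer value
    print(store.get(b"account:1234"))
    print(len(store.get(b"frame:0001")))
```

The caller never sees where or how a value is laid out, which is the property the rest of this discussion relies on.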

KV & Storage

How does that help us with storage?  Rather than writing data to physical media, one solution is to add a layer of indirection and store and retrieve all data as KV pairs.  Now, both data and metadata can be stored in a structure that exists as a vast collection of entries in a key-value database.  The KV store manages the processes of variable-length records, space reclamation and physical media placement.
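To make the idea concrete, here is a small sketch in which a file becomes a set of data entries plus one metadata entry, all held in the same key-value collection.  A plain dictionary stands in for the KV store, and the key naming, the tiny chunk size and the use of a content hash as the chunk key are all assumptions made for illustration rather than a description of any vendor's format.

```python
import hashlib
import json

CHUNK_SIZE = 4  # unrealistically small, so the example is easy to follow


def write_file(store, name, data):
    """Split `data` into chunks and record them, plus metadata, as KV pairs."""
    chunk_keys = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        key = "chunk:" + hashlib.sha256(chunk).hexdigest()
        store[key] = chunk  # identical chunks collapse into one entry
        chunk_keys.append(key)
    meta = {"size": len(data), "chunks": chunk_keys}
    store["meta:" + name] = json.dumps(meta).encode()


def read_file(store, name):
    """Rebuild a file by following the chunk keys held in its metadata."""
    meta = json.loads(store["meta:" + name])
    return b"".join(store[key] for key in meta["chunks"])


if __name__ == "__main__":
    kv = {}
    write_file(kv, "a.txt", b"abcdabcdxy")
    print(read_file(kv, "a.txt"))
    print(len(kv), "entries in the store")
```

Because the repeated "abcd" chunk hashes to the same key, the ten-byte file occupies two data entries and one metadata entry.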

Stellus Data Platform

Stellus Technologies has designed a storage system that uses a KV store for the storage of data on persistent media.  The first release of the Stellus Data Platform is based on a scale-out architecture of multiple Data Manager nodes and Key-value Store nodes, connected through a shared network fabric. 

Each KV Store node houses multiple NVMe SSDs, exposing many logical KV stores across a single cluster of servers.  Host data access is delivered through Data Manager nodes.  To increase front-end performance, add more Data Manager nodes; to increase capacity, add more KV Store nodes.

The specifics of how data is split into objects and stored across a Stellus cluster are relatively complex.  Essentially, each KV node provides read/write access to KV data, managing the lifetime of keys, depending on their requirements.
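Stellus has not been quoted here on how keys are placed, so the following is only a generic sketch of one well-known technique, rendezvous (highest-random-weight) hashing, to show how a Data Manager could choose a KV Store node for each key.  The node names and the function are hypothetical.

```python
import hashlib
from typing import List


def pick_node(key: str, nodes: List[str]) -> str:
    """Choose the node whose hash score for this key is highest."""
    def score(node: str) -> bytes:
        return hashlib.sha256(f"{node}/{key}".encode()).digest()
    return max(nodes, key=score)


if __name__ == "__main__":
    nodes = ["kv1", "kv2", "kv3"]  # hypothetical KV Store node names
    keys = [f"object-{i}" for i in range(1000)]

    before = {k: pick_node(k, nodes) for k in keys}
    after = {k: pick_node(k, nodes + ["kv4"]) for k in keys}

    moved = [k for k in keys if before[k] != after[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding kv4")
```

The appeal of this approach for a scale-out design is that adding a node only relocates the keys that now belong to the new node; every other key stays where it was.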

KV SSDs

Why bother building a solution with another layer of indirection through a KV store?  From an efficiency perspective, the KV nodes take care of any specific tasks that need to be performed in managing physical media.  This abstraction allows any type of media to be dropped into the solution in the future.  The KV store just needs to know how to use it efficiently. 

As we start to see the emergence of much larger SSDs (32TB and greater), technologies like Zoned Namespaces (ZNS) allow large physical drives to be accessed as many logical drives.  The standard, as developed by NVM Express, results in a longer lifetime for media and increased throughput.  The KV implementation, as used by Stellus, can take advantage of ZNS by creating many logical KV stores as separate partitions across flash media.
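The sketch below models the general idea only: each zone is an append-only region with its own write pointer, and each logical KV store keeps an index of where its values sit inside the zone it owns.  The class names and the one-store-per-zone layout are assumptions for illustration; they do not describe the Stellus implementation or the ZNS command set.

```python
class Zone:
    """Append-only region of a drive; written sequentially, reset as a whole."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._buffer = bytearray()

    @property
    def write_pointer(self):
        return len(self._buffer)

    def append(self, data):
        """Write at the current write pointer and return the offset used."""
        if self.write_pointer + len(data) > self.capacity:
            raise ValueError("zone full")
        offset = self.write_pointer
        self._buffer += data
        return offset

    def read(self, offset, length):
        return bytes(self._buffer[offset:offset + length])

    def reset(self):
        """Discard the whole zone so it can be rewritten from the start."""
        self._buffer.clear()


class ZonedKVStore:
    """One logical KV store that owns exactly one zone."""

    def __init__(self, zone):
        self.zone = zone
        self.index = {}  # key -> (offset, length) of the latest value

    def put(self, key, value):
        offset = self.zone.append(value)  # old versions are left behind
        self.index[key] = (offset, len(value))

    def get(self, key):
        offset, length = self.index[key]
        return self.zone.read(offset, length)


if __name__ == "__main__":
    # One hypothetical drive exposed as four independent logical KV stores.
    stores = [ZonedKVStore(Zone(capacity=1024)) for _ in range(4)]
    stores[0].put("a", b"hello")
    stores[0].put("a", b"hi")  # an overwrite becomes a new append
    print(stores[0].get("a"), stores[0].zone.write_pointer)
```

Stale versions left behind by overwrites are reclaimed in bulk when a zone is reset, which is one reason sequential, zone-sized writes are kinder to flash than scattered in-place updates.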

However, solutions like Samsung’s KV SSDs could simplify much of the operation of the Stellus KV Store nodes by pushing the key-value code itself down onto the drive.  The drive then exposes KV functions directly across an interface like Ethernet.  In fact, Stellus has developed a protocol called KV over Fabrics that implements this process today, albeit between KV Store nodes and Data Manager nodes.

Computational Storage

Another solution is to use computational storage drives.  We’ve discussed this kind of technology before.  NGD Systems, for example, enables code to be pushed to individual SSDs that then perform processing on the stored data.  I’m sure it’s possible to configure computational storage SSDs to act as individual KV platforms.

It’s possible that SmartNICs could also be used to deliver this functionality.  Now, we just need to build storage arrays from racks of NVMe SSDs and SmartNICs in enclosures. 

Offload

What is the long-term gain from moving processing down into the media in this way?  First, there’s the obvious benefit of offloading some processing to CPUs on the drives themselves.  NGD Systems’ Newport drives consume negligible additional power, so there are efficiency gains to be made. 

Some more advanced functionality could also be offloaded, such as encryption and de-duplication.  Drives could autonomously self-repair when individual media fails, by re-creating redundant copies of data (today, the Stellus Data Platform uses erasure coding protection). 
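For readers unfamiliar with how redundancy allows lost data to be re-created, here is the simplest possible case: a single XOR parity fragment protecting a set of equal-length data fragments.  This is a teaching sketch, not the erasure coding scheme Stellus uses, and the function names are invented for the example.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments):
    """Compute one parity fragment over equal-length data fragments."""
    parity = bytes(len(fragments[0]))  # start from all zeros
    for fragment in fragments:
        parity = xor_bytes(parity, fragment)
    return parity


def recover(surviving, parity):
    """Rebuild the single missing fragment from the survivors and parity."""
    missing = parity
    for fragment in surviving:
        missing = xor_bytes(missing, fragment)
    return missing


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]  # fragments on three drives
    parity = make_parity(data)          # stored on a fourth drive

    # Pretend the drive holding the second fragment has failed.
    rebuilt = recover([data[0], data[2]], parity)
    print(rebuilt)
```

Production erasure codes tolerate multiple simultaneous failures, but the rebuild step follows the same pattern of combining surviving fragments with redundant ones.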

All of these enhancements aim to reduce latency and improve throughput, essentially to exploit the full capabilities of individual solid-state and SCM drives.

The Architect’s View

There are other solutions on the market today (such as VAST Data and Weka) that are also looking to make the best use of new media.  The high levels of scale and performance offered by these platforms aren’t for every company, but increasingly we’re going to see traditional solutions fall behind as the use cases highlighted here become more prevalent.  Scale-out, high-performance object and file stores are a fast-developing new market segment that will only increase in popularity in the coming decade. 


Post #42e1. Copyright (c) 2007-2020 Brookend Ltd. No reproduction in whole or part without permission.