Accelerating Workloads with NetApp Plexistor

I hadn’t realised that NetApp acquired Plexistor earlier this year. Depending on which site you look at, the acquisition price was a pretty cheap $20m or $32m, based on an A-round investment of around $4.5m in September 2014. NetApp demonstrated application acceleration using the Plexistor technology at Insight in Berlin last week (see the day 2 keynote at around 33:50). Latency figures as low as 3μs were quoted, so how was this achieved? Let’s first look at Plexistor in a little more detail.

Plexistor

Plexistor was an Israeli startup. I first saw the company at Storage Field Day 9 in San Jose (March 2016). Let’s just say the presentation wasn’t the best we’ve seen, with lots of confusion around exactly what the Plexistor SDM offering did and how it worked. Plexistor came back in June the same year and presented again, with better results. You can find links to the presentations at the Tech Field Day website or at the end of this post. In simplest terms, Plexistor SDM uses storage-class memory (NVDIMM or NVMe devices like Optane) and a Linux kernel driver to present a local host file system called m1fs. Because m1fs sits on byte-addressable media, local latency is extremely low, which is how NetApp was able to demonstrate 3μs in the on-stage presentation.

Why Storage Class Memory?

SCM (storage-class memory) describes byte-addressable memory that is persistent. As we all know, DRAM has no persistence, with the contents lost when the server is power-cycled. Storage such as hard drives and SSDs is persistent but is written in blocks. If you want to update a single byte of a 4KB block, you have to read the whole block, modify it and write it back. Caching in DRAM helps to eliminate some of the latency issues with external storage but introduces consistency challenges. Writes held in a volatile cache like DRAM must be replicated elsewhere to protect against unexpected server failure.
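To make the read-modify-write penalty concrete, here is a minimal Python sketch. It is a toy model, not Plexistor or any real device driver: `BlockDevice`, `update_byte_on_block_device` and `update_byte_in_memory` are hypothetical names, and a plain `bytearray` stands in for byte-addressable memory. It simply counts how many bytes have to move to change one byte in each case.

```python
BLOCK_SIZE = 4096  # the 4KB block size used in the text


class BlockDevice:
    """Toy block device: all I/O happens in whole BLOCK_SIZE blocks."""

    def __init__(self, num_blocks):
        self.data = bytearray(num_blocks * BLOCK_SIZE)
        self.bytes_transferred = 0  # total bytes moved to or from the device

    def read_block(self, block_no):
        start = block_no * BLOCK_SIZE
        self.bytes_transferred += BLOCK_SIZE
        return bytearray(self.data[start:start + BLOCK_SIZE])

    def write_block(self, block_no, block):
        assert len(block) == BLOCK_SIZE
        start = block_no * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = block
        self.bytes_transferred += BLOCK_SIZE


def update_byte_on_block_device(dev, offset, value):
    """Read-modify-write: change one byte by rewriting its whole block."""
    block_no, within = divmod(offset, BLOCK_SIZE)
    block = dev.read_block(block_no)   # read 4096 bytes
    block[within] = value              # modify 1 byte
    dev.write_block(block_no, block)   # write 4096 bytes back


def update_byte_in_memory(mem, offset, value):
    """Byte-addressable store: touch only the byte that changes."""
    mem[offset] = value
    return 1  # bytes transferred


dev = BlockDevice(num_blocks=4)
update_byte_on_block_device(dev, offset=5000, value=0xAB)

scm = bytearray(4 * BLOCK_SIZE)  # stand-in for byte-addressable memory
moved = update_byte_in_memory(scm, 5000, 0xAB)

print(dev.bytes_transferred, moved)  # 8192 1
```

The block path moves 8,192 bytes to change one; the byte-addressable path moves one. Real devices add queueing and protocol overhead on top, which this model deliberately ignores.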

Storage Class Memory offers a chance to resolve some of the consistency issues. SCM device contents are persistent across reboots, which removes the need to constantly copy updates to multiple nodes (with caveats). In addition, SCM operates at speeds approaching DRAM, therefore removing the need to implement an additional layer of caching.
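As a rough illustration of what byte-granular, persistent access looks like from an application, the sketch below memory-maps a file, stores a single byte and then reads it back after the mapping is closed. This is an assumption-heavy stand-in: it uses an ordinary temporary file (the path is hypothetical), not m1fs, and nothing here is taken from Plexistor documentation. On a conventional file system the store still lands in the page cache and is written out in blocks; the point is only the programming model of a load/store against a file-backed mapping.

```python
import mmap
import os
import tempfile

SIZE = 4096

# Hypothetical location: a temp file standing in for a file on SCM-backed storage.
path = os.path.join(tempfile.mkdtemp(), "scm_demo.bin")

# Create a zero-filled file to map.
with open(path, "wb") as f:
    f.write(b"\x00" * SIZE)

# Map the file and update exactly one byte through the mapping.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), SIZE) as m:
        m[100] = 0x42   # single-byte store, no explicit block I/O in the application
        m.flush()       # ask the OS to push the change to the backing file

# Reopen the file with ordinary file I/O to confirm the byte was persisted.
with open(path, "rb") as f:
    f.seek(100)
    stored = f.read(1)

print(stored)  # b'B'
```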

Application Considerations

So with SCM and Plexistor, we have a fast local file system. We’re back to the old days of DAS, but with much lower latency and much higher performance. When considering applications that work well with DAS, we see solutions like MongoDB that manage consistency at the application level, rather than relying on shared storage. This is the perfect scenario for Plexistor SDM and it’s not a surprise that the NetApp on-stage presentation was using a MongoDB database.

What isn’t yet clear to me is the management of back-end persistence. The NetApp on-stage demo showed Plexistor fronting a FAS ONTAP system. The application and Plexistor run on a server, with connectivity to a FAS system. The assumption here is that the FAS provides the shared storage consistency, while the Linux server with Plexistor provides the acceleration.

In a read-heavy I/O environment, this would be fine, but how are writes synchronised back to the ONTAP system? We would now have two levels of eventual consistency: one between Plexistor and ONTAP and one between MongoDB nodes. Of course, I’m assuming that writes are cached in SCM and asynchronously written back to ONTAP, but this may not be the case. It could be that only reads are cached. This is still good, but is not going to deliver high performance for all applications.
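The two possibilities can be sketched in a few lines of Python. To be clear, this is not how Plexistor and ONTAP actually interact (that is exactly what isn’t known); `Backend`, `ReadCache` and `WriteBackCache` are hypothetical names used to contrast a read-only (write-through) cache with an asynchronous write-back cache.

```python
class Backend:
    """Stand-in for the shared storage system behind the cache."""

    def __init__(self):
        self.store = {}
        self.writes = 0  # number of writes that actually reached the backend

    def read(self, key):
        return self.store.get(key)

    def write(self, key, value):
        self.store[key] = value
        self.writes += 1


class ReadCache:
    """Caches reads only: every write goes to the backend before returning."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.backend.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.backend.write(key, value)  # pay backend latency on every write
        self.cache[key] = value


class WriteBackCache(ReadCache):
    """Acknowledges writes locally and pushes them to the backend later."""

    def __init__(self, backend):
        super().__init__(backend)
        self.dirty = set()  # keys changed locally but not yet on the backend

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in sorted(self.dirty):
            self.backend.write(key, self.cache[key])
        self.dirty.clear()


backend_a = Backend()
ro = ReadCache(backend_a)
ro.write("doc", "v1")            # backend sees the write immediately

backend_b = Backend()
wb = WriteBackCache(backend_b)
wb.write("doc", "v1")
stale = backend_b.read("doc")    # backend has not seen the write yet
wb.flush()
fresh = backend_b.read("doc")

print(stale, fresh)  # None v1
```

The window between `write` and `flush` in the write-back case is the extra layer of eventual consistency described above: fast for the application, but the backend lags until the flush completes.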

The Architect’s View

There’s been a lot of talk about storage-class memory and, in particular, technologies such as Optane. Collectively the industry expects programming changes to applications to take advantage of performance improvements. Plexistor SDM is exactly this. NetApp got a bargain for a few tens of millions of dollars and has potentially introduced a reason to use traditional shared storage with cloud-native applications. I’d like to see more examples of how SDM is being implemented and whether there are any shortcomings in what the technology can be used for. MongoDB may be a good example for an audience to understand, but I expect analytics will provide a much better use case.

If you want to read more about Plexistor before the acquisition, the corporate blog is still available behind the front page firewall of the Plexistor site. This also allows access to other (now hidden) pages.

Related Links