Data Mobility - Global/Scale-out Data Platforms

This is one of a series of posts looking at data mobility for the hybrid cloud. You can find a link to all of the published articles as they become available here. (link).

Probably the most elegant solution to data mobility is to make data appear to be in many places at the same time. Imagine you need to move an application between locations or in/out of the public cloud. If the data is simply there, all of the data mobility issues go away. Now, an application can be spun up wherever it is needed, to make more efficient use of compute resources or to take advantage of cloud-specific features. This model of working is the vision of two types of solutions – global NAS and distributed storage.

Global NAS

Global NAS could also be thought of as a distributed file system. There are many solutions already on the market today that effectively expose data to any geographic location. The back-end “secret sauce” focuses on keeping the data consistent and ensuring good performance. In order to maintain data consistency, these solutions need a global namespace. The namespace provides a directory and file hierarchy independent from the physical storage of the data. It also introduces a consistent (and hopefully single) security model.
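To make the namespace idea concrete, here is a minimal sketch of a logical path being resolved to a physical location. It is not how any of the products mentioned here are built; the class name, the site names and the file path are all hypothetical, and a real system would also track versions, permissions and cache state.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GlobalNamespace:
    """Toy namespace: logical path -> set of sites holding a copy (hypothetical)."""
    placements: Dict[str, Set[str]] = field(default_factory=dict)

    def add_copy(self, path: str, site: str) -> None:
        # Record that `site` physically stores a copy of `path`.
        self.placements.setdefault(path, set()).add(site)

    def locate(self, path: str, local_site: str) -> str:
        """Return the site to read from, preferring a local copy."""
        sites = self.placements.get(path)
        if not sites:
            raise FileNotFoundError(path)
        if local_site in sites:
            return local_site
        # No local copy: fall back to a deterministic choice of remote site.
        return sorted(sites)[0]


ns = GlobalNamespace()
ns.add_copy("/finance/q3/report.xlsx", "london")
ns.add_copy("/finance/q3/report.xlsx", "aws-eu-west-1")

# The application only ever uses the logical path; the site is resolved for it.
print(ns.locate("/finance/q3/report.xlsx", local_site="london"))
print(ns.locate("/finance/q3/report.xlsx", local_site="tokyo"))
```

The point of the sketch is that the application never names a storage system or a site, only a path in the hierarchy, which is what lets it be restarted somewhere else without reconfiguration.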

The global namespace allows data to persist beyond the lifetime of any application instance (e.g. a container or VM) and lets applications refer to their data using an abstracted naming system that is more business focused. File systems also provide locking and data integrity features to prevent conflicting concurrent updates. The obvious restriction to having data everywhere all the time is physics: the speed of light imposes latency between sites and so increases access times. Solutions such as Panzura and CTERA seek to mitigate the latency issue through clever use of data caching and placement. These solutions also introduce distributed lock management.
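As a rough illustration of what lock management across sites has to do, the sketch below grants a time-limited lease on a path to one site at a time. This is an assumption-laden toy, not the mechanism used by Panzura or CTERA: the `LockManager` class, the lease length and the site names are hypothetical, and it runs in a single process, whereas a real implementation must itself be distributed and fault tolerant.

```python
class LockManager:
    """Toy lease-based lock table standing in for a distributed lock service."""

    def __init__(self, lease_seconds=30.0):
        self.lease_seconds = lease_seconds
        self._locks = {}  # path -> (owner site, lease expiry time)

    def acquire(self, path, site, now):
        """Grant the lock unless another site holds an unexpired lease."""
        holder = self._locks.get(path)
        if holder is not None:
            owner, expiry = holder
            if owner != site and now < expiry:
                return False  # another site holds a live lease
        # Grant (or renew) the lease for this site.
        self._locks[path] = (site, now + self.lease_seconds)
        return True

    def release(self, path, site):
        """Release the lock, but only if `site` is the current owner."""
        holder = self._locks.get(path)
        if holder is not None and holder[0] == site:
            del self._locks[path]


locks = LockManager(lease_seconds=30.0)
print(locks.acquire("/finance/q3/report.xlsx", "london", now=0.0))   # granted
print(locks.acquire("/finance/q3/report.xlsx", "tokyo", now=10.0))   # refused
print(locks.acquire("/finance/q3/report.xlsx", "tokyo", now=31.0))   # lease expired
```

Leases are used here rather than indefinite locks so that a site that drops off the network cannot block writers elsewhere forever, which matters more as sites get further apart.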

Distributed Storage

Where NAS solutions have structure in a file system, other distributed storage solutions simply make data available wherever it is needed. These include object stores and block/volume-based storage. Hedvig Distributed Storage Platform, for example, exposes block devices across multiple locations, but doesn’t add any data consistency. If you need it, you manage it yourself. This is a bit like running a cluster without clustering software. However, it’s not as bad as it seems. In reality, if we’re dealing with movable applications, a lot of data might never be accessed in multiple places at the same time. So providing the ability to get to data is all that’s needed.

Object store solutions make data available via HTTP/REST protocols. Latency can be reduced using eventually consistent replication, or with solutions like Zenko that act as an object store “redirector”, separating the metadata from the physical storage layer.

Challenges

The most obvious challenge for either of these solution types is keeping data consistent. When data is generally read-only, this isn’t too hard; the overhead is simply in keeping multiple copies in many locations. Write traffic is harder to deal with. Every place where data is exposed needs to know if that data changes. If changes are made, either the updates need to be distributed to each location to maintain consistency, or each location needs to know where to get the latest copy from. If the latest copy isn’t local, we introduce latency. If enough locations (a quorum) don’t know about an update, we introduce inconsistency and risk data loss in a failure scenario.
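The trade-off between latency and consistency can be sketched with a simple quorum scheme. This is a generic textbook technique, not the design of any product mentioned in this post: with N copies, a write must reach W of them and a read must consult R of them, and if W is a majority and R + W > N, every read overlaps the most recent successful write. The `Replica` class and both function names are hypothetical.

```python
class Replica:
    """One site's copy of a single item, tagged with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None


def quorum_write(replicas, reachable, value, w):
    """Apply a write to the reachable replicas; fail if fewer than w are reachable."""
    targets = [replicas[i] for i in reachable]
    if len(targets) < w:
        return False  # too few sites would know about the update
    # Assumes w is a majority, so the targets include the newest version.
    new_version = max(rep.version for rep in targets) + 1
    for rep in targets:
        rep.version, rep.value = new_version, value
    return True


def quorum_read(replicas, reachable, r):
    """Read from at least r replicas and return the newest value seen."""
    targets = [replicas[i] for i in reachable]
    if len(targets) < r:
        raise RuntimeError("not enough replicas reachable")
    newest = max(targets, key=lambda rep: rep.version)
    return newest.value


sites = [Replica() for _ in range(3)]            # N = 3
print(quorum_write(sites, [0, 1], "v1", w=2))    # site 2 is unreachable
print(quorum_read(sites, [1, 2], r=2))           # overlaps the write at site 1
print(quorum_write(sites, [2], "v2", w=2))       # rejected: only one site reachable
```

Raising W makes writes slower (more sites must acknowledge across the WAN) but lets reads consult fewer sites, and vice versa, which is the latency-versus-consistency choice described above in miniature.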

Dealing with consensus and consistency issues is the core problem that distributed file systems have to solve. There are well-known protocols for reaching consensus, such as Raft and Paxos (used by Ceph). Elastifile decided that the existing offerings weren’t good enough and developed their own file-system focused solution called Bizur, which I believe translates from Hebrew as disaggregation.

The Architect’s View®

There are lots of other solutions on the market, like Nasuni (which uses cloud storage), Datera (block storage) and NooBaa (object). Red Hat has been an active acquirer in this area, buying up Ceph and Gluster. Not all solutions are capable of geographic dispersal of data. Qumulo, for example, allows replication between groups of clusters, which goes part of the way towards full dispersal. We may see this evolve over time to add that full geographic capability. WekaIO is very much a high-performance rather than a distributed solution, but the elements are there to implement geographic dispersal.

The question to ask when looking at these options is: what exactly is needed by the business/application? A lot of data could be pushed to an object store if it is mainly read-only and performance isn’t an issue. Containers and virtual instances (and encapsulated applications like databases) can all be run from file storage – if the performance is there. Distributed block storage looks the least valuable going forward.

Inevitably, many organisations might end up with a mix of these solutions. What’s not 100% settled is how many will integrate into the public cloud. Some solutions are already built on cloud storage (Nasuni) or can back their data off to public cloud (CTERA/Panzura). Many (NooBaa, Qumulo, Weka) run as cloud instances and can be run solely in public cloud or in a multi/hybrid model. However, none are “cloud native” in the sense of being a service of the cloud platform itself, and so they don’t directly expose APIs that can be consumed by other cloud applications. This is an untapped area, in my opinion. Public cloud providers could continue to offer their own services, but may need to compromise and either integrate with vendor solutions or run them natively.

Some of the companies mentioned in this post have presented at Tech Field Day events. I’ve included some links that can provide good additional background information and viewing.

Further Reading