Moving to Unstructured Data Stores

The storage industry continues to evolve, with new solutions appearing on the market each year. One trend I’ve noticed is the move away from the term “object store” and towards platforms being described as “unstructured data stores”. Is there any merit in this rebranding, and could the “legacy” object store name now have negative connotations?

Unstructured Data Management

Although they manage similar content, object stores and file servers have grown from different roots. File services developed from the desire to share files across a network. NFS came from work done at Sun Microsystems in the mid-1980s and SMB/CIFS from Microsoft. Arguably, both NFS and SMB focus on the sharing aspect of data, rather than the need for scale-out capabilities. Adding scalability (including geo-scalability) has proved to be one of the significant challenges for file storage platforms.

Object Stores

Object storage can trace its origins back to the mid-1990s, to the work done at FilePool and in several university projects. EMC acquired FilePool and sold the IP as Centera. Since then, we have seen many object storage platforms emerge, including Cloudian, Scality, Cleversafe (now within IBM), OpenIO and other solutions acquired by major vendors.

Object stores operate differently to file servers in the way data is stored and accessed by clients. NFS and SMB offer features such as file locking, data hierarchy and the ability to perform partial updates in place. Object stores focus more on immutability, replacing entire objects on update, high parallelism and geo-dispersed access. Of course, none of these features is by any means exclusive to one platform or another.
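
To make the difference in update semantics concrete, here is a minimal in-memory sketch. Both classes and all their method names are hypothetical and written only for illustration; they are not the API of any real file server or object store, and keeping every version of an object is my own assumption about how immutability might be modelled.

```python
class FileStyleStore:
    """Toy model of file semantics: byte ranges can be modified in place."""

    def __init__(self):
        self._files = {}  # path -> mutable bytearray

    def write_at(self, path, offset, data):
        buf = self._files.setdefault(path, bytearray())
        if len(buf) < offset:
            buf.extend(b"\x00" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data  # partial update in place

    def read(self, path):
        return bytes(self._files[path])


class ObjectStyleStore:
    """Toy model of object semantics: every put replaces the whole object."""

    def __init__(self):
        self._versions = {}  # key -> list of immutable bytes values

    def put(self, key, data):
        # Assumption: old versions are retained and never modified.
        self._versions.setdefault(key, []).append(bytes(data))
        return len(self._versions[key])  # 1-based version number

    def get(self, key, version=None):
        versions = self._versions[key]
        return versions[-1] if version is None else versions[version - 1]


files = FileStyleStore()
files.write_at("/docs/a.txt", 0, b"hello world")
files.write_at("/docs/a.txt", 6, b"there")  # changes only five bytes

objects = ObjectStyleStore()
objects.put("docs/a.txt", b"hello world")
objects.put("docs/a.txt", b"hello there")  # the client resends everything
```

After these calls both stores return the same current content, but the object store still holds the original as version 1, while the file store has overwritten it.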

Crossover

In recent years we’ve seen a degree of crossover between solutions. For example, Cloudian added file support with the acquisition of Infinity Storage. Scality RING offers file access through SOFS, and Caringo provided file capabilities with SwarmNFS.

At the same time, we’ve seen a change in requirements for unstructured storage with the rise in demand for high-performance platforms that can meet the needs of AI and machine learning.

Perception

After 20 years of development, does the term object storage have a particular image or reputation? I’d suggest some views might include:

Object storage is only suitable for archive or long-term cold storage.
Object storage isn’t cost-effective unless you have petabytes of data to store.
Object storage is complex to deploy and manage.
Object storage is complex to program.

How many of these statements are true today? Object stores are increasingly using flash storage and can be deployed in all-flash configurations. Object stores can be deployed for 100TB of data and upwards, and at even lower capacities when using solutions like MinIO. Object storage is not hard to implement (again, MinIO is a good example), and many programming languages have libraries for the S3 API, the de facto standard protocol that all platforms now support.
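
As a rough illustration of how little code the S3 API needs, here is a sketch using boto3, the AWS SDK for Python. The helper names `make_s3_client` and `store_and_fetch` are my own, and the endpoint, credentials, bucket and key are whatever your platform provides. Treat this as a sketch under those assumptions rather than a recommended integration.

```python
def make_s3_client(endpoint_url, access_key, secret_key):
    """Build an S3 client for an S3-compatible endpoint.

    boto3 is imported lazily, so it must be installed before calling this.
    """
    import boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def store_and_fetch(client, bucket, key, payload):
    """Write one object, then read it back in full.

    Assumes the bucket already exists and the credentials can access it.
    """
    client.put_object(Bucket=bucket, Key=key, Body=payload)
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()
```

With a client from `make_s3_client` pointed at any S3-compatible platform, `store_and_fetch(client, bucket, key, b"...")` should return the bytes that were written. The same two calls work unchanged against different back ends, which is the practical benefit of a shared protocol.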

Legacy

Despite the gap between perception and reality, object storage does still have an association with low cost and low performance. This legacy is why I believe that the latest entrants into the storage market are dropping the “object” terminology and instead simply describing their platforms as managing “unstructured data”.

Unstructured Data Management

There are some excellent reasons for this rebranding exercise.

Files are just objects. Files are objects, but with additional front-end protocol benefits. NFS/SMB offer security around file locking and data integrity. File servers assume objects will be edited and make it easy to modify them in place. However, inherently there’s no difference between a file and an object, only the way it is accessed.
Unstructured data, especially when viewed in light of analytics, is seen to have much more value than an inactive archive. Businesses are prepared to spend more money to store data with apparent value.
Unstructured data is more likely to have higher performance needs when viewed in light of ML/AI. In the storage industry, businesses pay more for performance than they do for capacity.
Using the term “unstructured” removes any indication of access protocols, specifically those that require programming, such as S3 (or other proprietary solutions). Object stores have historically been seen as difficult to access (the Centera platform being a perfect example).

Generally, the term “unstructured” is more aligned to the notions of Big Data and content mining than “object store”, which feels more technical in nature.

Merger of Equals

Many of the new solutions in the market today are built on architectures that work with either object or file data. FlashBlade from Pure Storage supported both NFS & S3 from day one. Stellus Technologies is building a platform on KV technology – essentially storing billions of individual objects that are metadata or fragments of client data. VAST Data is using a similar concept and spreading data across hundreds or thousands of NVMe drives.

But are these platforms offering a consolidated object and file solution? None of the Pure, VAST or Stellus platforms is designed for geo-dispersed deployment, for example. That’s not to say this feature couldn’t be integrated into the solutions.

The Architect’s View

The most critical aspect of data management for all businesses is how we store and protect our content, not the specifics of the platform on which it runs. Increasingly, it makes sense to harmonise file and object storage into a single platform, as many use-cases need multi-protocol access to the same data. The underlying process of storing files and objects can be aligned. The access method is just a protocol choice.
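
One way to picture “the access method is just a protocol choice” is a single backing store with two thin front-ends on top of it. The sketch below is purely illustrative: `BlobStore`, `ObjectFrontEnd` and `FileFrontEnd` are hypothetical names, the bucket-to-directory mapping is my own assumption, and no vendor’s architecture is being described.

```python
class BlobStore:
    """Hypothetical backing store: a flat map of identifiers to bytes."""

    def __init__(self):
        self._blobs = {}

    def write(self, blob_id, data):
        self._blobs[blob_id] = bytes(data)

    def read(self, blob_id):
        return self._blobs[blob_id]

    def ids(self):
        return sorted(self._blobs)


class ObjectFrontEnd:
    """S3-like view: whole objects addressed by bucket and key."""

    def __init__(self, store):
        self._store = store

    def put_object(self, bucket, key, data):
        self._store.write(f"{bucket}/{key}", data)

    def get_object(self, bucket, key):
        return self._store.read(f"{bucket}/{key}")


class FileFrontEnd:
    """NFS/SMB-like view: hierarchical paths and partial updates."""

    def __init__(self, store):
        self._store = store

    def read_file(self, path):
        return self._store.read(path.lstrip("/"))

    def write_at(self, path, offset, data):
        blob_id = path.lstrip("/")
        try:
            current = bytearray(self._store.read(blob_id))
        except KeyError:
            current = bytearray()  # new file
        if len(current) < offset:
            current.extend(b"\x00" * (offset - len(current)))
        current[offset:offset + len(data)] = data
        self._store.write(blob_id, current)

    def list_dir(self, path):
        stripped = path.strip("/")
        prefix = stripped + "/" if stripped else ""
        names = {
            blob_id[len(prefix):].split("/")[0]
            for blob_id in self._store.ids()
            if blob_id.startswith(prefix)
        }
        return sorted(names)


store = BlobStore()
objects = ObjectFrontEnd(store)
files = FileFrontEnd(store)

objects.put_object("projects", "reports/q1.txt", b"draft v1")
files.write_at("/projects/reports/q1.txt", 6, b"v2")  # edit via the file view
```

Here the object written through the S3-like view is edited through the file-like view, and both front-ends then read back the same bytes. A real platform would also have to reconcile locking, permissions and metadata between protocols, which this sketch ignores.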

The process of separating data storage from the access protocol provides the ability to use any storage medium – HDDs, SSDs, KV-SSDs or the public cloud. Most of the object storage solutions on the market today could move to that model. Perhaps all the object storage vendors should rebrand as “unstructured data managers” as we put more of an emphasis on the content we’re storing.