File vs Object Storage - do we need to make a binary choice?

Object storage has been a long time developing but is finally making inroads into the application development process. Some predict that object storage will subsume file storage, but in reality, these are two use cases, both of which should be treated on their merits and used accordingly.

Background

Object storage has existed in the public cloud and the data centre for over two decades. The origins can be traced back to work done in the late 1990s, including start-ups like Filepool, which was acquired by EMC and became the Centera platform. The object model uses REST-based APIs and web protocols to transfer data across the network, either locally or across a wide area that includes the Internet.

Historically, object storage has been used for large-scale data stores, including archives and secondary backup data. This approach has changed over recent years, as fast object stores and smaller footprints have been developed.

Protocol

One of the first considerations when comparing object and file storage is the protocol. Object storage operates around a set of primitives known as CRUD (create, read, update and delete), which apply to objects under management. Object stores write (create) objects as a single entity in one atomic operation and generally don’t update in place. Once an object is created, any update is a combination of delete and create, although some platforms do allow in-place modifications. Similarly, read operations generally read an entire object, although protocols like S3 do provide some degree of range-based retrieval.
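To make the CRUD model a little more concrete, here is a minimal sketch in Python. It is an in-memory toy, not the S3 API or any vendor's implementation; the class name ToyObjectStore and its method names are hypothetical and exist only for illustration. It shows whole-object writes, update as delete followed by create, and an optional byte-range read.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, illustration only)."""

    def __init__(self):
        self._objects = {}

    def create(self, key, data):
        # The whole object is written in one step and stored as immutable bytes.
        self._objects[key] = bytes(data)

    def read(self, key, start=0, end=None):
        # Reads return the whole object by default; start/end give a byte range,
        # loosely in the spirit of range-based retrieval.
        return self._objects[key][start:end]

    def update(self, key, data):
        # No in-place modification: an update is a delete followed by a create.
        self.delete(key)
        self.create(key, data)

    def delete(self, key):
        del self._objects[key]


store = ToyObjectStore()
store.create("report.txt", b"hello world")
first_five = store.read("report.txt", 0, 5)   # b"hello"

# Changing one word still means replacing the entire object.
current = store.read("report.txt")
store.update("report.txt", current.replace(b"world", b"there"))
```

Note that even a one-word change requires the caller to read the object, modify it locally and write the whole thing back, which is the behaviour the paragraph above describes.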

Now, we should point out that the CRUD model is a simplification of the commands available for managing objects. Most object storage solutions today follow the AWS S3 API, which offers a vast range of commands and sub-command options, many of which have been developed to support platform features like Glacier.

NAS

Network Attached Storage solutions implement protocols like NFS and SMB, enabling remote clients to access files in a file system stored on a central server. Although not specifically POSIX-compliant, protocols like NFS introduce the ability to update data in place, create and extend files, place files in a hierarchy (directory/folder structure) and implement file locking for data integrity.
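By way of contrast with the object model, the sketch below shows the file-style operations listed above: placing a file in a directory hierarchy, updating it in place and extending it. It runs against a local temporary directory using only the Python standard library; we are assuming that a mounted NFS or SMB share would be driven through the same calls, and file locking is left out for brevity. The function names are hypothetical.

```python
import os
import tempfile


def patch_in_place(path, offset, new_bytes):
    """Overwrite bytes at `offset` without rewriting the rest of the file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)


def extend(path, extra):
    """Append to the end of an existing file."""
    with open(path, "ab") as f:
        f.write(extra)


def demo():
    # A temporary directory stands in for a mounted file share (assumption).
    with tempfile.TemporaryDirectory() as root:
        folder = os.path.join(root, "projects", "reports")
        os.makedirs(folder)                      # directory/folder hierarchy
        path = os.path.join(folder, "q1.txt")
        with open(path, "wb") as f:
            f.write(b"hello world")              # create the file
        patch_in_place(path, 6, b"there")        # update five bytes in place
        extend(path, b"!")                       # extend the file
        with open(path, "rb") as f:
            return f.read()


result = demo()   # b"hello there!"
```

Here only the five changed bytes are written, whereas the object model would replace the whole entity.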

Implementation

The evolution of NAS and object storage has resulted in some protocol-specific implementation details. The first crop of object storage vendors arguably created scale-out multi-node systems to gain resiliency and scalability. File servers, in contrast, can be implemented on a single Linux, Unix or Windows machine with no special storage implementation.

However, as we discuss the use cases for both NAS and object, it is worth highlighting that neither protocol requires a specific architecture. Both NAS and object protocols could be implemented on scale-out or scale-up architectures – it really doesn’t matter.

Endpoint

This last point is probably the most important when considering how to use modern file and object storage. The public cloud has abstracted away the implementation details of storage platforms to the extent that the end user only needs to know the details of the endpoint to consume the service. The functionality offered by the endpoint becomes the important factor, not the hardware on which it sits.

Choices

So, how should we pick the right protocol? If strict consistency is important to your application, then file may be the best route. If your application needs parallel read access to millions of files as part of an analytics process, then object will probably be the right choice. If you’re building a traditional database application, then even block storage (god forbid) could be the right answer.

The Architect’s View™

As we discussed in this post six years ago, there are no protocol wars; instead, it’s a question of “horses for courses”. In the future, if the majority of data doesn’t need the features of file storage, then object storage is the logical choice, as the protocol is arguably much more efficient than NAS. In this scenario, perhaps object storage does become the dominant API for our data, simply through the volume of information stored.