Delivering File Protocols on Object Stores

When object stores first appeared on the market, they targeted a very specific niche. Unstructured data growth demanded better ways of storing data than file servers could provide. Object stores offered the ability to scale to billions of objects, without the issues of managing a file system. However, rewriting applications for REST-based APIs is time-consuming. Wouldn’t it be nice to have the scalability of object with the functionality of file? This is what many vendors now offer, as we see more file and object integration and more file protocols on object stores.

Object Scalability

As a physical storage medium, object stores have some great benefits. They are highly scalable, reaching petabytes of capacity and billions of objects. With erasure coding, object stores become more reliable as capacity increases. Data can be dispersed and made accessible across multiple geographies without having to create full replicas that need to be kept in sync. However, the flexibility of object stores comes at a cost. An object store offers only basic read/write operations on whole objects. Typically these are CRD (Create, Read, Delete), but sometimes CRUD (Create, Read, Update, Delete), where Update can simply be a combination of the other functions.
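The CRD/CRUD model can be illustrated with a minimal sketch. The class below is a hypothetical in-memory stand-in rather than any vendor's API; it assumes whole-object semantics and shows Update expressed purely as Delete followed by Create.

```python
class ObjectStore:
    """Hypothetical in-memory object store with whole-object operations only."""

    def __init__(self):
        self._objects = {}  # key -> immutable bytes

    def create(self, key, data):
        if key in self._objects:
            raise KeyError(f"object already exists: {key}")
        self._objects[key] = bytes(data)

    def read(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def update(self, key, data):
        # No in-place modification: Update is Delete followed by Create.
        self.delete(key)
        self.create(key, data)


store = ObjectStore()
store.create("videos/cam1.bin", b"frame-1")
store.update("videos/cam1.bin", b"frame-1 frame-2")
latest = store.read("videos/cam1.bin")
```

Note that the update replaces the entire object, even though only a few bytes were appended.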

Object stores don’t implement features like file locking, partial updates to an object, caching or complex security models. These are, in effect, the features many users and developers have come to expect from POSIX-compliant file systems.
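The partial-update gap is perhaps the easiest of these to see in code. The sketch below contrasts an in-place write to a file with the read-modify-write cycle needed on an object store. The dict is only a stand-in for an object store, and the function names are illustrative.

```python
import os
import tempfile


def patch_file(path, offset, patch):
    """File semantics: overwrite a byte range in place."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(patch)


def patch_object(objects, key, offset, patch):
    """Object semantics: read the whole object, change a copy, write it all back."""
    data = bytearray(objects[key])              # full read
    data[offset:offset + len(patch)] = patch    # modify in memory
    objects[key] = bytes(data)                  # full rewrite


original = b"hello world"

# File: only the patched bytes are written.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(original)
patch_file(path, 6, b"POSIX")
with open(path, "rb") as f:
    file_result = f.read()
os.remove(path)

# Object: the dict is a hypothetical stand-in for an object store.
objects = {"greeting": original}
patch_object(objects, "greeting", 6, b"POSIX")
object_result = objects["greeting"]
```

Both paths end with the same content, but the object path moves the entire object twice, which becomes expensive as objects grow.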

POSIX-compliant file system: a file system that meets the internationally agreed POSIX standards for file and directory operations. In effect, POSIX-compliant systems will operate and respond in a known and expected way.

The Benefits of File

Using a file system for storing data has been wildly successful, with companies like NetApp founding their business on this part of the market. File systems offer more structure than object stores or block devices, implementing directory hierarchies, complex security models (including LDAP and AD based) and data integrity. A file server adds the intelligence that block devices never had. As an aside, file was the primary protocol for data on the mainframe, although the physical storage was divided into volumes/LUNs. Looking at the two protocols, it seems obvious that combining the physical storage benefits of object with the logical accessibility benefits of file would be a great combination. Naturally, that’s what vendors have been doing.

Implementations

There are two main ways that vendors have been merging object and file:

File on object – file services are provided on a platform that uses object storage at the back-end. Data isn’t accessible interchangeably through both protocols and in some cases can only be reached through file.
File with object – file services allow data to be read/written interchangeably between file and object. Users can (for example) write with file and read with object.

Both solutions exploit the benefit of running as an object store, including features that can’t easily be implemented in a traditional file platform. For example, data can be distributed geographically in an object store without having to entirely replicate the data. Placing a file system on top allows the file system to be made accessible in multiple locations much more cost-effectively and with reduced complexity.
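The cost argument comes down to simple arithmetic. As a rough sketch, the functions below compare the raw capacity consumed by full replicas with that of a k+m erasure code. The parameters (three copies, a 10+6 code) are illustrative assumptions, not figures from any product.

```python
def replication_overhead(copies):
    """Raw capacity consumed per unit of user data with full replicas."""
    return float(copies)


def erasure_overhead(data_fragments, parity_fragments):
    """Raw capacity consumed per unit of user data with a k+m erasure code."""
    return (data_fragments + parity_fragments) / data_fragments


# Illustrative parameters only: three full replicas versus a 10+6 code
# whose 16 fragments are spread across sites.
three_copies = replication_overhead(3)
dispersed = erasure_overhead(10, 6)
```

With these numbers, replication consumes 3.0 times the user data and the dispersed layout 1.6 times. A maximum-distance-separable code such as Reed-Solomon can also rebuild the data after losing any 6 of the 16 fragments.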

File On Object

Running a file server on an object store provides the back-end scaling features of an object store, with the normal usability features of file. In many cases the end user doesn’t know or need to care that the physical storage medium is object. OneBlox from Exablox is one example of this kind of solution. A OneBlox implementation consists of a cluster or ring of nodes that together operate as a large object and key-value store. Data is split, de-duplicated and compressed as it is stored across the ring.
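To give a feel for what "split, de-duplicated and compressed" means in practice, here is a generic sketch of content-addressed chunk storage. It is not Exablox's implementation; the chunk size, the SHA-256 hash and zlib compression are all assumptions chosen for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 4  # tiny, for illustration; real systems use far larger chunks


def store_file(chunks, data):
    """Split data into fixed-size chunks, store each unique chunk once
    (compressed) under its SHA-256 digest, and return the list of digests
    needed to reassemble the file."""
    recipe = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunks:  # de-duplicate: keep one copy per digest
            chunks[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe


def load_file(chunks, recipe):
    """Reassemble a file from its ordered list of chunk digests."""
    return b"".join(zlib.decompress(chunks[d]) for d in recipe)


chunks = {}  # hypothetical stand-in for the key-value store on the ring
recipe = store_file(chunks, b"ABCDABCDABCDWXYZ")
restored = load_file(chunks, recipe)
unique_chunks = len(chunks)
```

The file is described by four digests, but only two distinct chunks are actually stored.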

The open source Ceph file system platform is another example where POSIX-compliant file systems are created and stored on top of an object store (RADOS). The RADOS component focuses on managing the physical resources, while the file system layer manages logical access, security and data integrity.

With solutions like OneBlox and Ceph, data on the object store is held in an internal format specific to the platform, so can’t be accessed directly as objects. Cloudian recently announced HyperFile, an appliance that integrates with HyperStore to provide file services and uses HyperStore as the secure repository. CTERA partners with companies like IBM Cleversafe and DDN to offer global file storage, backed by either public or on-premises object stores.

File With Object

File with object is the scenario where data can be stored and retrieved through either object or file protocols interchangeably. Data could, for example, be written by a traditional application with NFS or SMB and then analysed using the object store interface. Why is this useful? Well, it means applications that already store data on a file system don’t have to be rewritten. Object interfaces can be used to perform tasks that would otherwise slow down a traditional file system, such as high-performance scanning of data. Processing data for analytics purposes is usually a read-only process, so the file locking and security issues are either simplified or not relevant.
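One way to picture this is a read-only object view over a file share. The sketch below assumes the simplest possible mapping, where a file's path relative to the share root becomes its object key. The class and method names are hypothetical, and real products map files to objects in their own ways.

```python
import tempfile
from pathlib import Path


class FileBackedBucket:
    """Hypothetical read-only object view over a directory tree.
    Each file's path relative to the root becomes its object key."""

    def __init__(self, root):
        self.root = Path(root)

    def list_keys(self, prefix=""):
        keys = (p.relative_to(self.root).as_posix()
                for p in self.root.rglob("*") if p.is_file())
        return sorted(k for k in keys if k.startswith(prefix))

    def get_object(self, key):
        return (self.root / key).read_bytes()


with tempfile.TemporaryDirectory() as share:
    # A traditional application writes through the file interface...
    camera_dir = Path(share) / "cctv" / "cam1"
    camera_dir.mkdir(parents=True)
    (camera_dir / "clip-0001.bin").write_bytes(b"video-bytes")

    # ...and an analytics job reads the same data by object key.
    bucket = FileBackedBucket(share)
    keys = bucket.list_keys(prefix="cctv/")
    payload = bucket.get_object(keys[0])
```

Because the object side never writes, the sketch needs no locking at all, which is the simplification the read-only analytics case relies on.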

Vendor File With Object Solutions

Support for File With Object is now quite widespread in the industry. SwiftStack recently announced SwiftStack 6, which includes bi-directional file/object support. OpenIO provides a FUSE connector that maps files to objects. Data can be read from and written to the file system, with object access restricted to read-only for data integrity purposes. Scality RING allows data stored on SOFS (Scale-out File System Connector) to be accessed using SMB, NFS or FUSE protocols alongside the S3 API. Hitachi Vantara’s HCP platform can read content stored with object protocols through standard file APIs, including NFSv3, SMB 3 and WebDAV. Caringo provides multi-protocol access to data in Swarm through SwarmNFS, implemented as a lightweight stateless Linux process.

The Architect’s View

Blurring the lines between object and file provides scalability and efficiency benefits to the enterprise. Where POSIX compliance isn’t required, data can be accessed quickly and efficiently as objects, making analytics easier to integrate, especially with public cloud. Imagine today’s networks of CCTV devices writing video to a file share, with that footage then analysed in the public cloud as object content. There are some issues to consider, such as how file locking and data integrity are managed on a global basis. These are non-trivial problems, although they have already been solved.

Again, we’re seeing the abstraction of the underlying storage media and a greater focus on the data and what can be done with it.