Object Storage: Standardising on the S3 API

The public cloud and Amazon Web Services in particular have seen massive growth over the last few years. In April 2015, Amazon broke out the revenue figures of AWS for the first time, showing that the subsidiary was a $7.3 billion business with over 1 million active customers, accounting for 8% of Amazon’s total revenue. At the heart of AWS is S3, the Simple Storage Service, an online object store that is now ten years old and stores trillions of objects (latest figures published in 2013 showed 2 trillion objects, with the amount stored doubling each year).

S3 has been remarkably successful and is the foundation for many well-known services such as Dropbox and Pinterest. Part of the reason for this success has been the flexibility of object stores compared to standard block and file protocols. From a user perspective, these protocols offer little control over how the data is stored and managed (they only support basic I/O commands like read, write, open and close).

S3, on the other hand, is all about object-level management and manipulation. With S3 you can describe how you want to store objects, encrypt them, present them (even as a website) and much more. Each object is validated during I/O operations, unlike a file system (NFS/SMB), which does data integrity checking only at the entire file system level.

In addition to the management capabilities, there is the relative ease with which data can be stored in the system. The underlying storage infrastructure isn’t exposed to the customer. Instead, access is provided through a set of programming interfaces, commonly called the S3 API. It’s through the combination of features, simplicity and ubiquity of this API that S3 has been so successful.

S3 Described

The S3 API is an application programming interface that provides the capability to store, retrieve, list and delete objects (or binary files) in S3. When first released in 2006, the S3 API supported REST, SOAP and BitTorrent protocols as well as development through an SDK for common programming languages such as Java, .NET, PHP and Ruby. Storing and retrieving data is remarkably simple: objects are grouped into logical containers called buckets and accessed through a flat hierarchy that simply references the object name, bucket name and the AWS region holding the data. When using the REST protocol, these pieces combine into a URL that provides a unique reference for the object. Actions on the object are executed with simple PUT and GET commands that carry the data and the response in the HTTP headers and body.
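As a rough illustration of how bucket, object name and region combine into a REST URL, here is a minimal Python sketch. The bucket and key names are invented for the example, and the requests are only constructed, not sent; a real request would also need authentication headers.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(bucket, key, region, path_style=False):
    """Build the REST URL for an object from its bucket, key and region."""
    encoded_key = quote(key, safe="/")  # keep "/" so key prefixes stay readable
    if path_style:
        # Path-style: the bucket is the first path segment.
        return f"https://s3.{region}.amazonaws.com/{bucket}/{encoded_key}"
    # Virtual-hosted style: the bucket is part of the host name.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"


# Hypothetical bucket and key, used purely for illustration.
url = object_url("example-bucket", "reports/2015 q1.csv", "eu-west-1")

# Storing and retrieving map onto plain HTTP verbs against the same URL.
put = Request(url, data=b"col1,col2\n1,2\n", method="PUT")
get = Request(url, method="GET")

print(put.get_method(), put.full_url)
print(get.get_method(), get.full_url)
```

Both addressing styles identify the same object; which one a given on-premises store accepts is something to check against that vendor's documentation.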

S3 features are reflected in the API and have matured over time to include:

Metadata – this includes system metadata and additional information created by the user when the object is stored.
Multi-tenancy – S3 is divided into many customers, each of which sees an isolated, secure view of their data.
Security & Policy – access is controlled at the account, bucket and object level.
Lifecycle Management – objects can be both versioned and managed across multiple tiers of storage over the object lifetime.
Atomic Updates – objects are uploaded, updated or copied in a single transaction/instruction.
Search – accounts and buckets can be searched with object-level granularity.
Logging – all transactions can be logged within S3 itself.
Notifications – changes to data in S3 can be used to generate alerts.
Replication – data can be replicated between AWS locations.
Encryption – data is encrypted in flight and can be optionally encrypted at rest using either system or user generated keys.
Billing – service charges are based on capacity of data stored and data accessed.
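Several of these features surface in the REST interface as request headers. The sketch below assumes a PUT request and shows how user metadata and a request for server-side encryption could be expressed; the helper name and the metadata values are made up for the example.

```python
def put_headers(user_metadata, encrypt_at_rest=False):
    """Build the headers for a PUT that carries user metadata.

    User-defined metadata travels as headers prefixed with "x-amz-meta-".
    """
    headers = {
        f"x-amz-meta-{name.lower()}": str(value)
        for name, value in user_metadata.items()
    }
    if encrypt_at_rest:
        # Ask S3 to encrypt the object at rest with S3-managed keys.
        headers["x-amz-server-side-encryption"] = "AES256"
    return headers


# Hypothetical metadata for illustration only.
headers = put_headers({"Author": "jsmith", "department": "finance"}, encrypt_at_rest=True)
for name, value in sorted(headers.items()):
    print(f"{name}: {value}")
```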

Due to its longevity in the market and maturity of features, the S3 API has become the ‘de facto’ standard for object-based storage interfaces. In addition to their own proprietary APIs, pretty much every object storage vendor in the marketplace supports S3 in some form or other. Having support for S3 provides a number of benefits:

Standardisation – users/customers that have already written for S3 can use an on-premises object store simply by changing the object location in the URL (assuming security configurations are consistent). All of their existing code should work with little or no modification.
Maturity – S3 offers a wealth of features (as already discussed) that cover pretty much every feature needed in an object store. Obviously there are some gaps (including object locking, full consistency and bucket nesting), which could be implemented as a superset by object storage vendors.
Knowledge – end users who are looking to deploy object stores don’t have to go to the market and acquire specific platform skills. Instead they can use resources that are already familiar with S3, whether they are individuals or companies.
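The standardisation point can be sketched by treating the endpoint as configuration. In the Python example below the on-premises host name is hypothetical, and path-style addressing is assumed because it works without per-bucket DNS entries; only the endpoint differs between the two targets.

```python
from urllib.parse import quote

AWS_ENDPOINT = "https://s3.eu-west-1.amazonaws.com"
ON_PREM_ENDPOINT = "https://objects.example.internal"  # hypothetical on-premises store


def object_url(endpoint, bucket, key):
    """Path-style object URL; the endpoint is the only deployment-specific part."""
    return f"{endpoint.rstrip('/')}/{bucket}/{quote(key, safe='/')}"


# The calling code is identical for both targets.
for endpoint in (AWS_ENDPOINT, ON_PREM_ENDPOINT):
    print(object_url(endpoint, "example-bucket", "backups/db.dump"))
```

Whether the rest of an application keeps working unchanged still depends on how much of the API the target store actually implements.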

S3 Compatibility

The current S3 API Developer Guide runs to 625 pages and is updated monthly, so vendors’ claims of compatibility could mean many things. Both Eucalyptus and SwiftStack claim S3 API support; however, looking at the specific feature support we see many gaps, especially around bucket-related features and object-based ACLs (rather an important security requirement). When establishing security credentials, AWS currently uses two versions for signing (v2 and v4), each of which provides slightly different functionality (such as being able to verify the identity of the requestor). We will go into the specifics of support in future posts.
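To give a feel for how the two signing versions differ, the following Python sketch shows only the final signing step of each. Building the string to sign (the canonical request, headers and timestamps) is deliberately left out, so the value used here is a placeholder, and the secret key is the documentation-style example key rather than a real credential.

```python
import base64
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_v2(secret_key: str, string_to_sign: str) -> str:
    """Signature Version 2: base64 of an HMAC-SHA1 keyed with the secret key itself."""
    digest = hmac.new(
        secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def sign_v4(secret_key: str, date: str, region: str, string_to_sign: str,
            service: str = "s3") -> str:
    """Signature Version 4: hex HMAC-SHA256 using a key derived per date, region and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)  # date as YYYYMMDD
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    k_signing = _hmac_sha256(k_service, "aws4_request")
    return hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


SECRET = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"  # example key, not a real credential
STRING_TO_SIGN = "placeholder-string-to-sign"  # construction omitted in this sketch

print("v2:", sign_v2(SECRET, STRING_TO_SIGN))
print("v4:", sign_v4(SECRET, "20150401", "eu-west-1", STRING_TO_SIGN))
```

Because the v4 signing key is scoped to a date, region and service, an object store that only implements v2 will reject requests from clients configured for v4, and vice versa.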

As well as features/functionality, there are questions of compatibility in terms of performance and the way in which the S3 interface is implemented. Some vendors will translate S3 API calls into their own native API, rather than processing them directly. This can lead to performance issues, and to on-premises object stores that don’t behave like S3 or respond with the same error codes or response levels expected when using S3 directly.
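One way to probe this kind of behavioural gap is to compare the error a store returns with the one S3 would return for the same request. The Python sketch below assumes a GET on a missing key, for which S3 returns HTTP 404 with the error code NoSuchKey in an XML body; the second response is an invented example of a translated native error.

```python
import xml.etree.ElementTree as ET

# Status and error code S3 returns for a GET on a key that does not exist.
EXPECTED = (404, "NoSuchKey")


def parse_error(status, body):
    """Extract (HTTP status, error code) from an S3-style XML error body."""
    code = ET.fromstring(body).findtext("Code")
    return status, code


def matches_s3(status, body, expected=EXPECTED):
    """True if the response carries the status and error code S3 would return."""
    return parse_error(status, body) == expected


s3_like = (
    "<Error><Code>NoSuchKey</Code>"
    "<Message>The specified key does not exist.</Message></Error>"
)
# Hypothetical response from a store that leaks its native error name.
translated = "<Error><Code>ObjectNotFound</Code><Message>no such object</Message></Error>"

print(matches_s3(404, s3_like))
print(matches_s3(404, translated))
```

Client code that branches on the error code would treat the second response as an unknown failure, even though the HTTP status is the same.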

Summary

The S3 API is the standard way in which data is stored and retrieved by object stores. Mature S3 support provides end users with significant benefits around compatibility and simplicity. In this series of posts we will dig deeper into the S3 API, including a look at the security and policy features, some of the advanced functionality and how S3 is supported across the wider industry.
