Has S3 Become the De-Facto API Standard?

“The wonderful thing about standards is that there are so many of them to choose from”

This quote is variously attributed to Grace Hopper and others in the computer industry, and it’s somewhat ironic that a statement about multiple standards is attributed to multiple authors! That said, technology does seem to thrive on multiple, competing standards in both the enterprise and consumer sectors. Witness the land grab we’ve seen with media formats (MP3, WMA, AIFF, WMV, Blu-ray, HD-DVD), document formats, HTML (which, although supposed to be a standard, appears to be interpreted differently by each browser vendor), television (PAL, SECAM, NTSC) and many other parts of our industry. Thankfully, standards that everyone can agree on do get put in place; otherwise the Internet, for instance, wouldn’t be what it is today.

When it comes to object storage, the industry seems to have pretty much decided that, whatever proprietary protocols a vendor offers, the one common interface everyone has to support is Amazon Web Services’ S3 API. S3 is the Simple Storage Service, a public cloud platform for storing binary or unstructured data. The API is based on representational state transfer (REST) and is delivered almost exclusively over HTTP, the Hypertext Transfer Protocol that underpins the World Wide Web. So why has the S3 API become so ubiquitous? I suspect there are a number of reasons. These include:

First to Market – When S3 was launched in 2006, most enterprises knew object storage only as “content addressable storage” through EMC’s Centera platform. Beyond that, object storage was a niche technology, not widely adopted outside specific industries such as High Performance Computing, whose users were used to coding to and for the hardware. S3 quickly became a platform everyone could use with very little investment, which made it easy to consume and experiment with. By comparison, even today the leaders in object storage (as ranked by the major analysts) still don’t make it easy (or even possible) to download and evaluate their products, even though most are software-only implementations.
Documentation – Following on from the previous point, S3 has always been well documented, with examples of how to run API commands. There’s a document history listing changes over the past 6-7 years that shows exactly how the API has evolved.
A Single Agenda – The S3 API was designed to fit a single agenda: storing and retrieving objects from S3. As such, Amazon didn’t have to design by committee and could implement the features they required and evolve from there. Contrast that with the CDMI (Cloud Data Management Interface) from SNIA. The SNIA website is difficult to navigate, the standard itself is only on its fourth published iteration in six years, and the documentation runs to 264 pages! (Note that the S3 API documentation runs to more pages, but is far more consumable, with simple examples from page 11 onwards.)
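
Part of what makes the API easy to experiment with is how directly it maps onto HTTP. As a rough illustration of the REST style described earlier, here is a minimal Python sketch that maps four object operations to an HTTP verb and a path-style URL. The endpoint s3.example.com, the function name and the operation labels are placeholders of my own, and a real request would also need signed authentication headers, which are left out here.

```python
from urllib.parse import quote

# Object-level operations and the HTTP verb each one uses.
# The dictionary keys are illustrative labels, not official identifiers.
VERBS = {
    "put_object": "PUT",
    "get_object": "GET",
    "head_object": "HEAD",
    "delete_object": "DELETE",
}


def object_request(operation, bucket, key, endpoint="https://s3.example.com"):
    """Return (verb, url) for a path-style request to a single object.

    The endpoint is a placeholder. Authentication is deliberately omitted,
    so this only shows the shape of the request, not a usable one.
    """
    verb = VERBS[operation]
    # quote() leaves "/" alone, so keys that look like paths stay readable.
    url = f"{endpoint}/{quote(bucket)}/{quote(key)}"
    return verb, url
```

Calling object_request("get_object", "backups", "2016/db dump.sql") gives a GET against https://s3.example.com/backups/2016/db%20dump.sql, which is about as much as a newcomer needs to grasp before trying the API out.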

Of course, for the sake of balance we should look at the risks involved in supporting a de-facto, rather than a de-jure, standard. Amazon are within their rights to expand and change the API in any way they choose. While taking features away may upset existing customers, there’s no requirement for the company to support features in the API that would help their competitors. This means we see scenarios developing like that of BlackPearl from Spectra Logic, where the BlackPearl API (known as DS3) is a superset of S3, with commands added specifically to support writing to tape media. Although this doesn’t seem like an ideal solution, we can see why Spectra Logic have done it; the downside is that customers have to be very aware of what each API does and does not support, and have to carry out regression and version testing as each new API release comes out.
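
One way to keep track of what each API does and does not support is to record the operations an application relies on and compare that list against each vendor release. The following is only a sketch of that idea: the function name is mine, the operation names are illustrative strings, and "BulkPutToTape" is a made-up extension rather than a real DS3 command.

```python
def compare_api_support(baseline, vendor):
    """Compare two sets of operation names.

    Returns the baseline operations the vendor lacks ("missing") and the
    operations the vendor adds beyond the baseline ("extensions").
    """
    return {
        "missing": sorted(baseline - vendor),
        "extensions": sorted(vendor - baseline),
    }


# Operations our hypothetical application depends on.
baseline_ops = {"GetObject", "PutObject", "DeleteObject", "ListObjects"}

# What a hypothetical vendor release claims to support.
# "BulkPutToTape" is invented for illustration only.
vendor_ops = {"GetObject", "PutObject", "ListObjects", "BulkPutToTape"}

report = compare_api_support(baseline_ops, vendor_ops)
```

Re-running a check like this against every new release is a cheap first pass before the real regression tests, since it flags both gaps and vendor-specific extensions that code may quietly have come to depend on.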

The Architect’s View

Although it isn’t ideal, having a standard that all vendors can support is a good starting point. Perhaps at some stage Amazon will choose to “donate” the S3 API to the community and hand its management to a recognised standards body. However, there is nothing to indicate that Amazon have any intention of doing so. Even if this doesn’t happen, it would be useful to have a test S3 API site that could be used to validate code without having to use production S3 (and incur the associated costs). Solutions like Fake S3 might be useful, but they don’t provide the depth of testing that complex applications require. For now we’ll just have to accept that S3 and the various supported variants are partial implementations of multiple standards.
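
To give a feel for what a local test endpoint involves, and why a simple one falls short, here is a minimal in-memory stub built on Python’s standard library. It is a sketch under heavy assumptions: the class and helper names are my own, it handles only PUT, GET and DELETE on a path, and it does no authentication, bucket listing, XML responses or multipart uploads, so it is nowhere near a faithful S3 implementation.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StubObjectHandler(BaseHTTPRequestHandler):
    """Hypothetical stub: stores request bodies in memory, keyed by path."""

    store = {}  # shared by all requests: path -> bytes

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.store[self.path] = self.rfile.read(length)
        self._reply(200)

    def do_GET(self):
        body = self.store.get(self.path)
        if body is None:
            self._reply(404)
        else:
            self._reply(200, body)

    def do_DELETE(self):
        self.store.pop(self.path, None)
        self._reply(204)

    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging


def start_stub():
    """Start the stub on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def stub_request(server, method, path, body=None):
    """Send one request to the stub and return (status, response_body)."""
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    conn.request(method, path, body=body)
    response = conn.getresponse()
    data = response.read()
    conn.close()
    return response.status, data
```

A stub like this is enough to check that application code issues the right verbs against the right paths, but everything it leaves out is exactly the depth of testing that complex applications need.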
