Object storage performance is becoming increasingly crucial as unstructured data stores offer a new way to address the processing requirements of modern applications like AI and analytics. However, throughput figures achieved in real-world scenarios may not always match the claims made by your storage vendor. We look at why this is the case and how IT organisations can evaluate vendor-provided data.
Object storage has long been seen as a solution for the long-term retention of large volumes of archive or inactive data. Object stores represent a cost-conscious platform for data that is generally streamed or accessed sequentially. Backup data is an excellent example of this kind of workload.
Increasingly, we see object stores used for high-performance workloads. New solutions such as AI and analytics require access to data with high throughput, a degree of randomness, and high parallelism. This profile can also be seen with more traditional applications such as CDNs (Content Delivery Networks), where multiple parallel streams are the norm.
For many applications, scale-out file systems have been the typical way to address performance needs for unstructured data. As IT organisations have become more comfortable with REST API programming, the benefits of a more straightforward interface and fewer constraints on system scaling have made object stores a much more attractive solution.
Real World vs Lab
Performance benchmarks have been around for decades. Storage performance metrics generally cover block-based storage and file systems, with little coverage for object stores (more on this later). It’s easy to see why this scenario has developed, as block-based storage platforms are generally used for managing traditional OLTP and transactional applications that are latency-sensitive.
As we’ve discussed before, benchmark figures have to be taken with a pinch of salt. The process of undertaking an industry recognised benchmark is both lengthy and costly (potentially $500,000 or more). As a result, vendors want to see their products displayed in the best light, and if possible, be close to the top of the benchmarking chart. This leads to a degree of gaming the system and can cause misleading results.
Inevitably, the results seen in the lab won’t reflect what the customer experiences in the data centre. Why is that? We can use a motoring analogy to explain how the differences occur.
Your Mileage May Vary
Drivers typically look at performance (speed, acceleration), emissions (CO2) and efficiency (miles per gallon or litres per 100 km) when comparing car models. Emissions testing in Europe is governed by strict testing and minimum standards, currently documented under “Euro 6”, the sixth and most stringent in a series of standards introduced since 1992. The lab procedure used to certify vehicles is gradually being replaced by the more comprehensive WLTP (Worldwide Harmonised Light Vehicle Test Procedure) and an accompanying “real-world” test, RDE (Real Driving Emissions). All new vehicles must provide emissions and fuel economy data, whereas acceleration and top speed are not scientifically measured.
Lab testing for emissions and fuel economy is necessary because it establishes a consistent environmental baseline under which the testing is carried out.
Each vehicle is configured based on the manufacturer’s specification and tested under identical temperature-controlled conditions. This process makes it possible for prospective purchasers to make “like-for-like” comparisons when buying a new vehicle.
The introduction of RDE is important as it provides a more real-world view of emissions testing, as a vehicle is driven through different environments (inclines, varying temperature, altitude, in urban settings, on motorways and with changing traffic conditions). RDE is less likely to be practical when comparing vehicles with each other, as the conditions on the day will vary from test to test. However, RDE enables a comparison of lab-based to real-world results for each vehicle.
When evaluating storage performance, we can see how the motoring analogy works. Current “industry standard” tests should provide a like-for-like benchmark comparison. Unlike cars, however, simply taking an appliance-based storage solution and running a series of performance benchmarks wouldn’t offer a fair comparison, because vendors build systems from varying component types and deliver solutions in a range of configurations that are rarely the same.
As a result, the storage testing regimes available today attempt to level the playing field by normalising the use of technology with a financial metric that results in a “price/performance” calculation. Without this additional factor, storage performance testing in some scenarios would be like comparing a Citroen C1 with a Bugatti Chiron. With normalisation, the highest-performing systems often turn out to be the most expensive, and not always the best value for money.
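As a rough illustration of how this normalisation works, the sketch below computes a price/performance ratio for two hypothetical systems; the system names, throughput figures, and prices are all invented for illustration:

```python
# Sketch of the "price/performance" normalisation used by storage benchmarks.
# All systems and figures here are hypothetical, for illustration only.

def price_performance(throughput_gbps: float, list_price_usd: float) -> float:
    """Cost per GB/s of delivered throughput; lower is better."""
    return list_price_usd / throughput_gbps

systems = {
    "budget-cluster":   {"throughput_gbps": 20.0,  "list_price_usd": 250_000},
    "flagship-cluster": {"throughput_gbps": 120.0, "list_price_usd": 3_000_000},
}

for name, s in systems.items():
    ratio = price_performance(s["throughput_gbps"], s["list_price_usd"])
    print(f"{name}: {ratio:,.0f} USD per GB/s")
```

In this invented example, the flagship system delivers six times the throughput but costs twice as much per GB/s, which is exactly the distinction the price/performance metric is designed to expose.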
As we deploy object storage solutions into data centres, there are several factors that influence the difference between lab performance and real-world observations.
Synthetic Data – Performance testing generally uses load generators and synthetic testing algorithms to generate data. This results in a regular and predictable load on the test environment. In the real world, I/O profiles will be more unpredictable, with bursts of workload and spikes in demand. The peaks and troughs of I/O will also consist of a wide variety of read/write ratios and object sizes, which is not always emulated in testing. Object storage solutions will differ in performance when attempting to absorb these peaks and variations.
Pre-seeded Data – some testing processes won’t pre-load data into a test system before running a test, or will load an object store with only limited test data. Unlike block-based storage, in an object store the performance of components such as the metadata layer has a direct impact on read/write throughput and object create/delete times. In the data centre, metadata management is an essential consideration, as performance can degrade over time as the object count grows.
Failure Modes – most testing doesn’t take into account failure scenarios from both media and nodes/servers. In large-scale object stores, there will always be some degree of failure recovery taking place, whether that’s scrubbing and rebuilding data or recovering from media failure.
Variable Configurations – deployed configurations won’t always match the lab environment. This scenario is even more likely with software-defined solutions where the hardware and software are sold/licensed separately, and customers can vary their configurations. End-user configurations may also be asymmetric, grown over time as an object store is expanded. This type of platform evolution will not reflect lab performance numbers.
Multi-tenancy – vendors may choose not to simulate multi-tenant workloads; however, many customers will run a mix of workload types on a single object storage cluster. This type of configuration can result in a “bathtub” I/O performance profile.
Infrastructure Constraints – the customer environment may be constrained by network bandwidth or speed compared to lab testing. This obviously has a direct impact on performance results.
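The “Synthetic Data” point above can be sketched with a toy load generator, comparing the steady profile a benchmark tool typically produces against a bursty real-world mix of object sizes. The size mix, spike probability, and helper names below are all invented for illustration (a minimal sketch, assuming Python):

```python
# Contrast a steady synthetic load with a bursty "real-world" I/O profile.
# Object sizes, rates, and the 10% spike probability are invented examples.
import random

random.seed(42)  # deterministic for illustration

def synthetic_load(n):
    # Constant object size at a constant rate: the regular, predictable
    # profile a load generator typically produces.
    return [1.0] * n  # 1 MB objects, one per tick

def real_world_load(n):
    # Mixed object sizes with occasional demand spikes.
    ticks = []
    for _ in range(n):
        size = random.choice([0.01, 0.1, 1.0, 10.0])  # 10 KB .. 10 MB
        burst = 5 if random.random() < 0.1 else 1      # ~10% of ticks spike
        ticks.append(size * burst)
    return ticks

def peak_to_mean(load):
    # How far the worst tick exceeds the average demand.
    return max(load) / (sum(load) / len(load))

print("synthetic  peak/mean:", peak_to_mean(synthetic_load(1000)))
print("real-world peak/mean:", peak_to_mean(real_world_load(1000)))
```

The synthetic profile has a peak-to-mean ratio of exactly 1, while the bursty profile is far higher; a system sized only for the synthetic figure would struggle to absorb the spikes.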
Understanding Your Mileage
Performance in the real world will never be as good as in the lab. Our motoring analogy shows why, but perhaps also points the way to better performance testing. The RDE motoring test offers a real-world view of emissions performance, with a direct correlation to the ideal lab results.
Object storage customers can use benchmarks as a guideline for differentiating between products, while measurement of deployed systems will show actual performance. Vendors need to work with customers to benchmark systems in place, using this data to build a picture of the expected performance of an ideal configuration. For example, customers might expect, on average, to receive 90% of the performance seen in the lab. With enough data from the field, vendors should be able to set, and confidently stand behind, customer expectations relative to the lab results.
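The kind of lab-to-field comparison described above could be sketched as follows; the lab figure and the fleet of field measurements are invented for illustration:

```python
# Derive a field-to-lab "derating factor" from deployed-system measurements,
# then use it to set customer expectations. All numbers are hypothetical.

lab_throughput_gbps = 100.0

# Throughput observed across a fleet of deployed clusters (invented data).
field_measurements_gbps = [92.0, 88.5, 91.0, 86.0, 93.5]

derating = sum(m / lab_throughput_gbps
               for m in field_measurements_gbps) / len(field_measurements_gbps)
expected_field_gbps = lab_throughput_gbps * derating

print(f"average field/lab ratio: {derating:.0%}")          # ~90%
print(f"expected real-world throughput: {expected_field_gbps:.1f} GB/s")
```

With these made-up measurements, the fleet averages roughly 90% of the lab figure, which is the sort of derating factor a vendor could quote, and guarantee, alongside its benchmark results.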
The Architect’s View™
Unfortunately, there aren’t many accepted object storage benchmarks available for vendors or customers to use. COSBench and OStoreBench seem to be generally accepted; however, there are no vendor ratings or standardised hardware configurations available. The object storage industry needs a standard that compares vendors on equivalent hardware, plus testing against vendor-recommended solutions. This type of testing is easier to do today than ever before, as most object storage solutions are software-defined. As high-performance object stores become more prevalent, performance will emerge as a differentiating feature.
In a follow-up post, we will dig further into existing benchmarks and discuss what metrics should be measured to accurately compare one vendor solution to another.
Copyright (c) 2007-2021 – Post #8579 – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.