In the days of spinning media, I/O performance was anything but consistent. Response times can vary enormously when mechanical head movement is involved. In fact, variation in I/O isn’t purely down to mechanics, as we will discuss in a moment. However, inconsistent I/O response time is a problem for applications and, as tolerances reduce, maintaining that consistency is going to get increasingly hard.
As we discuss consistent I/O performance, let’s look at hard drives first. The read/write head on an HDD has to continually move to whichever track is being written or read. The time this takes is known as seek time, and it can vary significantly depending on how random the data access to the drive is. HDDs also have rotational latency – the time spent waiting for the specific part of a track to come around to the read/write head. With constant angular velocity (a fixed rotational speed), we can work out the average latency. A 15K (15,000 RPM) drive completes a full rotation in 4ms, so the average wait for the data to come around to the head is half that, at 2ms. For any individual I/O, the wait could be anywhere between 0 and 4ms.
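The rotational latency arithmetic above works for any spindle speed. Here’s a minimal Python sketch that assumes constant angular velocity. The function name `rotational_latency_ms` and the 7,200 and 10,000 RPM comparison points are illustrative additions and don’t come from any vendor spec sheet.

```python
def rotational_latency_ms(rpm: int) -> tuple[float, float]:
    """Return (full rotation time, average rotational latency) in milliseconds.

    Assumes constant angular velocity: one rotation takes 60,000 / rpm ms,
    and on average the target sector is half a rotation away from the head.
    """
    full_rotation_ms = 60_000 / rpm
    return full_rotation_ms, full_rotation_ms / 2


# Compare a few common spindle speeds (seek time is not included).
for rpm in (7_200, 10_000, 15_000):
    full, avg = rotational_latency_ms(rpm)
    print(f"{rpm:>6} RPM: rotation {full:.2f} ms, average latency {avg:.2f} ms")
```

At 15,000 RPM this gives the 4ms rotation and 2ms average quoted above. Slower drives are proportionally worse, and seek time comes on top of that.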
On average, hard drives are relatively consistent, but the response time of each individual I/O could be very different indeed. Hard drives also have lots of error recovery features that attempt to fix transient I/O errors. These are the kinds of problems that cause tail latency – something the hyper-scale cloud service providers desperately want to avoid. We talk about this problem with Mark Carlson on this week’s Storage Unpacked podcast.
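A small sketch shows why averages hide the problem by comparing the mean with percentiles. The latency figures are hypothetical: 98 I/Os at 5ms, plus two at 250ms standing in for I/Os caught up in error recovery. `percentile` is a simple nearest-rank helper written only for illustration.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times (ms): 98 "normal" I/Os and 2 that hit
# a slow path such as internal error recovery.
latencies_ms = [5.0] * 98 + [250.0] * 2

print(f"mean  {mean(latencies_ms):.1f} ms")
print(f"p50   {percentile(latencies_ms, 50):.1f} ms")
print(f"p99   {percentile(latencies_ms, 99):.1f} ms")
```

The mean is 9.9ms, which looks tolerable, but the 99th percentile is 250ms. That gap is the tail latency.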
There’s also a third issue for hard drives – recovery. With a single queue for I/O, rebuilding a RAID group after a drive failure can have a massive performance impact on a busy hard drive. The rebuild task effectively competes with normal I/O for access and either takes forever or degrades performance. As drives get bigger, the rebuild issue is exacerbated.
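A rough estimate shows why bigger drives make rebuilds worse. This is a simplified model with several assumptions: a hypothetical 200 MB/s of sequential throughput, decimal units, and a fixed share of drive bandwidth given to the rebuild. It ignores the extra seeking caused by interleaving rebuild and host I/O, so real rebuilds on busy drives will take longer. `rebuild_hours` is a hypothetical helper name.

```python
def rebuild_hours(capacity_tb: float, drive_mb_per_s: float, rebuild_share: float) -> float:
    """Estimated hours to rewrite a whole replacement drive.

    rebuild_share is the fraction of the drive's sequential bandwidth given
    to the rebuild; the remainder is left for host I/O. Uses decimal units
    (1 TB = 1,000,000 MB). Ignores seek overhead from mixing rebuild and
    host I/O, which only makes matters worse.
    """
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be in (0, 1]")
    seconds = capacity_tb * 1_000_000 / (drive_mb_per_s * rebuild_share)
    return seconds / 3600


# Hypothetical 200 MB/s drive: full-speed rebuild vs. rebuild throttled to 25%.
for capacity in (4, 12):
    for share in (1.0, 0.25):
        print(f"{capacity} TB, {share:.0%} of 200 MB/s: "
              f"{rebuild_hours(capacity, 200, share):.1f} h")
```

Tripling capacity triples the rebuild time. Throttling the rebuild to protect host I/O stretches it further and lengthens the window in which the RAID group is exposed.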
Naturally, we try to mitigate these issues with caching and other I/O management techniques as part of shared storage.
It’s worth mentioning hybrid storage arrays at this point. Where flash accelerates performance, I/O can be fast, but any fallback to disk risks response times dropping from microseconds to milliseconds and ruining consistency. Naturally, hybrid vendors work hard (and with good reason) to ensure that back-end disk is rarely accessed.
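A weighted average gives a simple model of the hybrid trade-off. The figures are hypothetical: 200µs for a flash hit, 8ms for a disk read and a 99% hit ratio. The helper `hybrid_profile` is also hypothetical. The point is that a good mean still leaves a predictable number of disk-speed outliers.

```python
def hybrid_profile(hit_ratio: float, flash_us: float, disk_us: float,
                   total_ios: int) -> tuple[float, int]:
    """Mean latency (µs) and the number of disk-speed I/Os for a hybrid array.

    Simplified model: every I/O is either a flash hit or a disk read,
    each with a fixed latency.
    """
    if not 0 <= hit_ratio <= 1:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1 - hit_ratio
    mean_us = hit_ratio * flash_us + miss_ratio * disk_us
    slow_ios = round(miss_ratio * total_ios)
    return mean_us, slow_ios


# Hypothetical figures: 200µs flash hit, 8ms disk read, 99% hit ratio.
mean_us, slow = hybrid_profile(0.99, 200, 8_000, 1_000_000)
print(f"mean {mean_us:.0f}µs, {slow:,} of 1,000,000 I/Os at disk speed")
```

The mean of about 278µs looks respectable. Even so, 10,000 of every million I/Os still take around 8ms.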
Although flash is faster than disk, the media still has consistency issues. Flash is read and written in pages but can only be erased in blocks, which are made up of multiple pages. Because of garbage collection and wear levelling, SSDs can periodically deliver much worse performance than their documented capability. We also talk about this problem on this week’s podcast. Hyper-scalers want drives where the garbage collection processes can be managed at the host level. This would allow an SSD to be put into a management mode only when the drive isn’t needed.
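A toy model shows why garbage collection hurts consistency. The sketch uses a greedy policy that reclaims the block with the fewest valid pages. This is one common textbook approach, not a description of any particular SSD’s firmware. The `Block` class and `greedy_gc` function are hypothetical. Every live page in the victim block has to be copied elsewhere before the block can be erased, and those internal copies compete with host I/O.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # One entry per written page: True = live data, False = stale (overwritten).
    # An empty list represents an erased block, ready to be programmed again.
    pages: list[bool] = field(default_factory=list)

    @property
    def valid_pages(self) -> int:
        return sum(self.pages)


def greedy_gc(blocks: list[Block], free_block: Block) -> int:
    """Reclaim one block using a greedy victim-selection policy.

    Picks the block with the fewest live pages, copies those pages into
    free_block, then erases the victim as a whole. Returns the number of
    page copies, i.e. internal writes performed on top of host I/O.
    """
    victim = min(blocks, key=lambda b: b.valid_pages)
    live = victim.valid_pages
    free_block.pages.extend([True] * live)  # relocate live data first
    victim.pages = []                       # then erase the entire block
    return live


blocks = [Block([True, False, False, False]),
          Block([True, True, True, False]),
          Block([True, True, False, False])]
spare = Block()
copied = greedy_gc(blocks, spare)
print(f"pages copied before erase: {copied}")
```

In this example the drive copies one page and erases one block before the space can be reused. With fuller blocks, the copy cost grows, and so does the stall seen by host I/O.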
Again, in shared storage systems, inconsistent flash performance is mitigated by a range of processes, including reconstructing reads from the other drives in a RAID group rather than waiting for a slow drive.
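Reading around a slow drive works because parity lets any one member of a stripe be recomputed from the others. Here’s a minimal sketch that assumes single XOR parity, as in RAID-5. The stripe contents and the `xor_blocks` helper are illustrative. Real arrays work on much larger chunks and use their own logic to decide when reconstructing is quicker than waiting.

```python
from functools import reduce


def xor_blocks(*chunks: bytes) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("all chunks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# Hypothetical 3+1 stripe: three data chunks plus one parity chunk.
d0, d1, d2 = b"ABCD", b"EFGH", b"IJKL"
parity = xor_blocks(d0, d1, d2)

# Suppose the drive holding d1 is busy with garbage collection.
# Instead of waiting for it, rebuild d1 from the other members of the stripe.
rebuilt_d1 = xor_blocks(d0, d2, parity)
assert rebuilt_d1 == d1
```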
If average performance is good, why care about the odd outlier in response time? Well, it really depends on the application. Hyper-scalers care because outliers affect page load times and their ability to display search results. Today’s website ads, for example, are selected dynamically as a page is being generated, so being able to consistently select and display the right advertisement is important. In the enterprise, the same logic applies, whether in financial trading, online banking or e-commerce. Consistency is key, and a few poor-quality I/O responses can be the difference between making that sale or losing it, or between getting a lower-priced trade or missing it.
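The outlier matters more when a page fans out into many back-end requests, because the page is only as fast as its slowest request. Here’s a minimal sketch that assumes each request independently has a hypothetical 1% chance of being slow. `p_any_slow` is an illustrative name.

```python
def p_any_slow(p_slow: float, fanout: int) -> float:
    """Probability that at least one of `fanout` independent requests is slow."""
    return 1 - (1 - p_slow) ** fanout


# Hypothetical: 1% of individual requests are slow.
for n in (1, 10, 100):
    print(f"{n:>3} requests: {p_any_slow(0.01, n):.1%} chance of waiting on a slow one")
```

With 100 requests per page, roughly 63% of pages wait on at least one slow response, even though 99% of individual requests are fast.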
In machine learning/AI applications, the learning process may run for days or weeks, so consistent, low latency is essential. That consistency needs to hold for the whole run, not just during the initial period while the storage cache is filling up.
Deterministic Storage Performance
Storage and compute vendors work hard to deliver the most consistent I/O performance for applications. Some examples include:
- Pure Storage – The company has developed a hardware architecture that works directly with NAND flash, cutting out the traditional SSD controller and allowing a more consistent performance experience.
- Violin Memory – The original VIMMs created by Violin allowed a more consistent I/O response time, especially in the face of rebuilds or device recovery.
- Vexata – The VX-100 platform uses FPGAs and Enterprise Storage Modules to separate control and data planes, adding only around 5-10µs to end-to-end I/O.
- Datrium – The DVX solution separates performance and capacity storage, creating a consistent I/O experience, especially in the event of node failures.
- NetApp HCI – A slightly different implementation of HCI, with a shared distributed storage layer and scale-out compute.
The Architect’s View
It’s interesting to see vendors focusing on their hardware solutions as a means of gaining predictable performance. This list is by no means exhaustive. However, without an understanding of the way in which hardware performs, maintaining predictable latency becomes a problem. As we move down towards the sub-100µs level for media response times, I have a feeling we will see more focus on shared storage, either with new solutions that, again, go back to their hardware roots, or with modified HCI. Traditional HCI was a great way to move applications to a better operational model. In all storage solutions that use all-flash and, eventually, storage-class media, though, the efficiency of the data path will be a critical success factor.
Copyright (c) 2009-2018 – Post #2444 – Chris M Evans, first published on https://blog.architecting.it, do not reproduce without permission.