Revisiting Seagate Kinetic Drives

Around 2.5 years ago, I blogged about the release of new object-based hard disk drives from Seagate called Kinetic. The idea is that the drives don’t use traditional storage protocols like SAS and SATA, but instead store objects written and retrieved over Ethernet. Effectively, each drive is a large key-value store that manages its own content. You PUT an object to a drive under a key (the object ID); at some point in the future you can GET that object back using the same key, or choose to DELETE it.
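To make the PUT/GET/DELETE model concrete, here’s a minimal sketch in Python of the semantics an object-based drive exposes. It is an in-memory model only, not Seagate’s Kinetic client library: the KeyValueDrive class, its method names and the capacity check are hypothetical, and the Ethernet transport, authentication and object versioning a real drive would handle are all left out.

```python
from typing import Dict, Optional


class KeyValueDrive:
    """Hypothetical in-memory model of an object-based drive.

    Not the Kinetic client API: transport, security and
    versioning are deliberately omitted.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        # The drive itself owns the mapping from key to stored bytes;
        # the host never sees block addresses.
        self._objects: Dict[bytes, bytes] = {}

    def used_bytes(self) -> int:
        return sum(len(v) for v in self._objects.values())

    def put(self, key: bytes, value: bytes) -> None:
        # Overwriting a key releases the space held by the old value.
        old_size = len(self._objects.get(key, b""))
        if self.used_bytes() - old_size + len(value) > self.capacity_bytes:
            raise OSError("drive full")
        self._objects[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._objects.get(key)

    def delete(self, key: bytes) -> bool:
        # Returns True if the key existed.
        return self._objects.pop(key, None) is not None


# Usage: store, fetch and remove an object by key.
drive = KeyValueDrive(capacity_bytes=1024)
drive.put(b"photo-0001", b"jpeg bytes")
print(drive.get(b"photo-0001"))  # b'jpeg bytes'
drive.delete(b"photo-0001")
print(drive.get(b"photo-0001"))  # None
```

The point of the model is what’s missing: the host deals only in keys and values, and everything about where the bytes land is the drive’s business.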

The idea of using Kinetic drives within object stores was raised at the recent A3 Communications Technology Live! event, held in London in March 2016. Some of the discussion focused on pushing more intelligence down to the drive, especially given the continuing advances in the performance and capabilities of controller processors such as ARM’s Cortex range. This could benefit object storage vendors, as low-level functions like data protection could be pushed down to the drive.

Anatomy of an I/O

Let’s dig into more detail and look at what happens when an application issues an I/O request to disk on a typical storage device. HDDs and SSDs use LBA (logical block addressing) to read and write content on the storage media. LBA is a fixed-block architecture that abstracts away the underlying hardware of the drive (e.g. platters, cylinders and tracks), providing a linear “address space” in which data is stored and retrieved using a single addressing scheme. Previously, data had to be written to disks using a scheme that mapped to the physical architecture of the device, namely CHS (cylinder-head-sector). CHS requires more knowledge of the device itself, and because drives can have different capacities and designs, more effort was needed to write data consistently to CHS devices.
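The difference between the two schemes is easiest to see in the standard conversion between them. With cylinders and heads numbered from 0 and sectors from 1, LBA = (C × HPC + H) × SPT + (S − 1), where HPC is heads per cylinder and SPT is sectors per track. The sketch below uses the classic translated geometry of 16 heads and 63 sectors per track purely as example defaults; a real drive’s geometry would differ, which is exactly the knowledge LBA lets the host stop caring about.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int = 16,
               sectors_per_track: int = 63) -> int:
    """Map a cylinder-head-sector address to a linear block address."""
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:  # CHS sectors start at 1
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba: int,
               heads_per_cylinder: int = 16,
               sectors_per_track: int = 63) -> tuple:
    """Inverse mapping: recover (cylinder, head, sector) from an LBA."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


print(chs_to_lba(0, 0, 1))  # 0: the first sector on the disk
print(chs_to_lba(1, 0, 1))  # 1008 = 16 heads * 63 sectors
print(lba_to_chs(1008))     # (1, 0, 1)
```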

Without some kind of mapping process, HDDs are as dumb as Kinetic drives, in that they only store fixed pieces of content with no understanding of what that content means. Intelligence is added through a file system, which formats the drive and places an index on it. The file system provides the translation between logical content (e.g. a file) and the blocks on disk from which it is built. Lose the file system index and you’ve lost the map to your data (although modern file systems store more than one copy of the index, and because file layouts are well known, tools can recover data from a drive with no index).
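As a rough illustration of that mapping, here’s a toy file system in Python. The block size, the BlockDevice and ToyFileSystem classes and the naive bump allocator are all hypothetical; real file systems keep their index in on-disk structures (inodes, extent trees, allocation tables) rather than in a Python dict, but the principle is the same: the device holds anonymous blocks, and only the index says which blocks make up which file.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 512  # assumed fixed block size in bytes


class BlockDevice:
    """A 'dumb' device: it stores numbered fixed-size blocks and nothing else."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read(self, lba: int) -> bytes:
        return self.blocks[lba]

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self.blocks[lba] = data


class ToyFileSystem:
    """Hypothetical file system whose index maps names to block lists."""

    def __init__(self, device: BlockDevice) -> None:
        self.device = device
        # name -> (file length in bytes, ordered list of LBAs)
        self.index: Dict[str, Tuple[int, List[int]]] = {}
        self.next_free = 0  # naive bump allocator, never reuses blocks

    def write_file(self, name: str, data: bytes) -> None:
        lbas = []
        for offset in range(0, len(data), BLOCK_SIZE):
            if self.next_free >= len(self.device.blocks):
                raise OSError("device full")
            chunk = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            self.device.write(self.next_free, chunk)
            lbas.append(self.next_free)
            self.next_free += 1
        self.index[name] = (len(data), lbas)

    def read_file(self, name: str) -> bytes:
        length, lbas = self.index[name]
        # Reassemble the blocks and trim the padding on the last one.
        return b"".join(self.device.read(lba) for lba in lbas)[:length]


fs = ToyFileSystem(BlockDevice(num_blocks=8))
fs.write_file("report.txt", b"x" * 600)
print(fs.index["report.txt"])  # (600, [0, 1])
```

Throw away fs.index and the blocks are still sitting on the device, but nothing records which of them belong to report.txt.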

LBA addressing can be manipulated to ensure data is written to the best-performing part of the disk (the outer tracks) and that head movements are minimised. Storage appliances do this as a matter of course, and tools are available on desktops to optimise data layout and to defragment (re-organise) files to reduce these overheads. Bear in mind, of course, that as we move to SSDs most of this effort becomes pointless, as the location of data doesn’t directly affect performance. With HDDs, though, data placement can be controlled so that data on the file system is read and written optimally.

Black Box

The initial implementations of Kinetic appeared to treat the drive as a black box. The drive itself would determine where data was placed, including any housekeeping or other work needed to optimise the space used. As variable-sized objects are continually written, read and deleted, there’s a risk of fragmentation developing (just as in a traditional file system). So with the initial uses of Kinetic, storing and retrieving objects rather than blocks of data does seem appealing, but the mechanical aspects of the drive could hurt performance, because the drive does that work itself and hides (or appears to hide) the detail from the user.
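A small simulation shows why fragmentation matters. The sketch below uses a first-fit allocator over a linear range of blocks, which is an assumption made for illustration; we don’t know how Kinetic firmware actually lays out objects. After a couple of objects are deleted there is enough free space in total for a new object, but no single contiguous run is large enough to hold it.

```python
from typing import Dict, List, Optional, Tuple


class FirstFitAllocator:
    """Toy first-fit allocator over a linear range of blocks (hypothetical)."""

    def __init__(self, total_blocks: int) -> None:
        self.total_blocks = total_blocks
        self.extents: Dict[str, Tuple[int, int]] = {}  # key -> (start, length)

    def _free_runs(self) -> List[Tuple[int, int]]:
        # Walk allocated extents in address order and collect the gaps.
        runs, cursor = [], 0
        for start, length in sorted(self.extents.values()):
            if start > cursor:
                runs.append((cursor, start - cursor))
            cursor = start + length
        if cursor < self.total_blocks:
            runs.append((cursor, self.total_blocks - cursor))
        return runs

    def allocate(self, key: str, length: int) -> Optional[int]:
        # Place the object in the first gap big enough to hold it.
        for start, run_length in self._free_runs():
            if run_length >= length:
                self.extents[key] = (start, length)
                return start
        return None  # no single contiguous run is big enough

    def free(self, key: str) -> None:
        del self.extents[key]

    def free_blocks(self) -> int:
        return sum(run_length for _, run_length in self._free_runs())


alloc = FirstFitAllocator(total_blocks=10)
for key in "abcde":
    alloc.allocate(key, 2)      # five 2-block objects fill blocks 0-9
alloc.free("b")
alloc.free("d")                 # leaves two separate 2-block holes
print(alloc.free_blocks())      # 4
print(alloc.allocate("f", 4))   # None: 4 blocks free, but not contiguous
```

A drive in this state either refuses the write or has to compact its contents first, and on an HDD that compaction means extra head movement the host can’t see or schedule around.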

The Wisdom of the Crowd

The mistake perhaps made when looking at Kinetic is to think of a single drive in isolation. A single drive provides no real benefit over a traditional HDD, and in fact abstracts away some of the information needed to make that drive work efficiently. However, consider a large number of drives operating together as a collective. Now we have more locations available to write data to. So when one drive is performing garbage collection, it could (for example) mark itself unavailable for writes while reads are redirected to a mirror of the required data. Drives could communicate with each other and implement mirroring at the drive level, removing the need to build protection into the application or storage appliance layer. Drives could also “self heal”, or at least force data rebuilds if SMART data indicates a drive is about to fail. All of this can be done with the intelligence built into each drive, offloading the work from the storage appliance or application.
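Here’s a minimal sketch of the garbage-collection scenario, assuming a simple scheme in which every object is written to two drives and a drive signals it is busy with garbage collection through a flag. The Drive and MirroredGroup names, the in_gc flag and the central placement table are hypothetical; in the collective model described above the drives would coordinate this among themselves rather than through a single Python object.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Drive:
    name: str
    in_gc: bool = False  # hypothetical flag: busy with garbage collection
    objects: Dict[str, bytes] = field(default_factory=dict)


class MirroredGroup:
    """Hypothetical group of drives keeping `copies` mirrors of each object."""

    def __init__(self, drives: List[Drive], copies: int = 2) -> None:
        self.drives = drives
        self.copies = copies
        self.placement: Dict[str, List[Drive]] = {}

    def put(self, key: str, value: bytes) -> None:
        # Drives in garbage collection take themselves out of the write pool.
        targets = [d for d in self.drives if not d.in_gc][: self.copies]
        if len(targets) < self.copies:
            raise RuntimeError("not enough writable drives")
        for drive in targets:
            drive.objects[key] = value
        self.placement[key] = targets

    def get(self, key: str) -> Optional[bytes]:
        mirrors = self.placement.get(key)
        if not mirrors:
            return None
        # Prefer a mirror that is not collecting garbage (False sorts first);
        # fall back to a busy one rather than fail the read.
        chosen = min(mirrors, key=lambda d: d.in_gc)
        return chosen.objects[key]


a, b, c = Drive("a"), Drive("b"), Drive("c")
group = MirroredGroup([a, b, c])
group.put("obj-1", b"data")                          # mirrored to a and b
a.in_gc = True                                       # a starts garbage collection
print(group.get("obj-1"))                            # b'data', served from b
group.put("obj-2", b"more")
print([d.name for d in group.placement["obj-2"]])    # ['b', 'c']
```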

These ideas represent the potential of Kinetic when operating as a large group of disks. It’s the vision I think Seagate wanted to achieve, and some of these ideas are listed on their Kinetic Vision page.

Getting back to Technology Live, the discussion in the room focused on how the controller in Kinetic drives could be harnessed to add even more features, allowing an object storage platform to delegate functionality to the drives and operate at a much more lightweight level, providing only high-level functions like permissions and credential validation. Kinetic drives could, for instance, run their own virus scanning, content discovery, de-duplication and compression functions, none of which needs direct input from the storage appliance. The drives could also allow arbitrary code to run on them, provided that code works within an API that gives it access to content on the drive.
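As an example of a function a drive could take on without the appliance’s help, here’s a sketch of drive-side de-duplication combined with compression. It assumes whole-object de-duplication keyed on a SHA-256 digest and zlib compression from Python’s standard library; the DedupCompressStore class is hypothetical, and a real implementation would more likely de-duplicate at chunk level and persist its tables on the media.

```python
import hashlib
import zlib
from typing import Dict


class DedupCompressStore:
    """Hypothetical drive-side store that dedups identical values and compresses them."""

    def __init__(self) -> None:
        self.keys: Dict[str, str] = {}        # key -> content digest
        self.chunks: Dict[str, bytes] = {}    # digest -> compressed bytes
        self.refcount: Dict[str, int] = {}    # digest -> keys referencing it

    def put(self, key: str, value: bytes) -> None:
        if key in self.keys:
            self.delete(key)  # drop the reference held by the old value
        digest = hashlib.sha256(value).hexdigest()
        if digest not in self.chunks:
            # Only the first copy of any content is compressed and stored.
            self.chunks[digest] = zlib.compress(value)
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        self.keys[key] = digest

    def get(self, key: str) -> bytes:
        return zlib.decompress(self.chunks[self.keys[key]])

    def delete(self, key: str) -> None:
        digest = self.keys.pop(key)
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            # Last reference gone: reclaim the stored chunk.
            del self.refcount[digest]
            del self.chunks[digest]


store = DedupCompressStore()
store.put("a", b"same content" * 100)
store.put("b", b"same content" * 100)
print(len(store.chunks))  # 1: both keys share one stored copy
```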

The appliance then simply becomes the intermediary for certain out-of-band functions; if the application is storing and retrieving objects, the storage appliance needs only to track which drive stores each object. Even this may not be necessary if the drive ID is embedded in the object ID: the application could decode the ID and talk directly to the correct drive.
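A minimal sketch of that idea, assuming a 16-byte object ID whose first two bytes hold the drive number (big-endian) and whose remaining 14 bytes are random. The layout, the function names and the DRIVE_ADDRESSES table are hypothetical; any scheme works as long as the application and the drives agree on it.

```python
import secrets
import struct
from typing import Dict

DRIVE_ID_BYTES = 2   # assumed: up to 65,536 drives
RANDOM_BYTES = 14    # assumed: remainder of a 16-byte ID

# Hypothetical routing table from drive number to network address.
DRIVE_ADDRESSES: Dict[int, str] = {7: "10.0.0.17", 8: "10.0.0.18"}


def make_object_id(drive_id: int) -> bytes:
    """Build an object ID that carries the owning drive in its prefix."""
    if not 0 <= drive_id < 2 ** (8 * DRIVE_ID_BYTES):
        raise ValueError("drive_id out of range")
    return struct.pack(">H", drive_id) + secrets.token_bytes(RANDOM_BYTES)


def drive_id_from(object_id: bytes) -> int:
    """Decode the drive number without asking the storage appliance."""
    (drive_id,) = struct.unpack(">H", object_id[:DRIVE_ID_BYTES])
    return drive_id


oid = make_object_id(7)
print(DRIVE_ADDRESSES[drive_id_from(oid)])  # 10.0.0.17
```

The trade-off is that an object can’t move to another drive without changing its ID, so rebuilds or rebalancing would need some form of forwarding or re-keying.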

The Architect’s View

There’s definitely some future for Kinetic drives if functionality can continue to be devolved to the drive itself. However, much of what I read on Seagate’s website still focuses on hardware-centric savings in power and capital expenditure. It’s also disappointing to see that the original 4TB drives are still the only model available, despite Seagate’s latest drives offering double that capacity. Perhaps there is a compatibility issue between Kinetic technology and SMR (shingled magnetic recording).

Where is Kinetic headed? The Kinetic Open Storage Project currently lists many storage vendors as members, including Toshiba and Western Digital. There is clearly work being done, but there’s little activity and few updates on the project wiki. Perhaps this project is a slow burn and we’ll see some new information emerge in 2016.

Are you involved in the project? Do you have a view? We’d love to hear your thoughts; you can comment or contact us directly.
