A clear focus of the recent Tech Field Day #8 event was the use of flash storage (or SSDs) for storage arrays or within hybrid solutions. Pure Storage offers an all-flash storage array, which they say can be delivered at the same cost as traditional solutions, or less. It’s a big claim, bearing in mind that the cost per GB multiplier of flash over traditional HDDs is still pretty high. However, delivering storage isn’t all about cost per GB, and server workloads are changing, so perhaps we’re coming close to the point where all-flash arrays are viable. This is a discussion that’s been had before with the likes of Violin Memory, so let’s dig a little deeper and see what Pure Storage has to offer.
Pure Storage was founded in 2009 by John “Coz” Colgrove and John Hayes, whose backgrounds are in Veritas and Yahoo respectively. More details on their backgrounds can be found on the bios page of Pure’s website. Other members of the team have worked in companies such as NetApp, Decru and Sun, and include Michael Cornwall, who was the lead flash designer at Apple for the iPod and iPhone. So far, the company has raised around $55m from investors, the most significant of which has to be Samsung, who provide all of the flash drives used in Pure Storage’s products. We’ll touch on the relative merits or disadvantages of this later.
So what are Pure Storage offering? Well, it’s pretty simple: an all-flash storage array at a $/GB price that’s cheaper than traditional storage. Today that consists of two models, the FA-310 and the FA-320. The FA-310 is a single controller with one storage shelf, providing up to 140,000 4K random write IOPS. The FA-320 doubles the storage capacity and increases write IOPS to 180,000.
Focusing on the FA-310, the controller is based on two 6-core Intel Xeon processors with 48GB of memory. Back-end connectivity is 6Gb/s SAS and 40Gb/s InfiniBand, while front-end connections are Fibre Channel only at this time (4x 8Gb/s SFPs). Storage is provided by 22x 256GB MLC flash drives, giving a raw capacity of around 5.5TB. It’s not surprising that Fibre Channel is the only protocol available on the first models. FCoE doesn’t have the adoption rate and iSCSI wouldn’t be suitable for the type of traffic this array can support. However, the controller is detailed as having one spare expansion port, so we can speculate whether that is planned to be for Ethernet in the future.
The connectivity between the controller and disk is less than that offered at the front end. This may seem odd, but it reflects one of the key features of the Pure Storage arrays. Data entering the system is compressed and de-duplicated before being stored on disk, improving the overall efficiency of the array and reducing the volume of write I/O to physical media. The ability to perform data reduction before storing on media is the main way in which an acceptable price point can be met. This is something many other vendors are also doing, as most customers are clearly fixated on the $/GB formula as the only way to measure acquisition cost. Pure quote a reduction ratio of anywhere from 5x to 20x and, as anyone familiar with data reduction technologies will know, your mileage will vary depending on the type of data consumed.
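To see how those figures interact, here is a rough sketch of the arithmetic. The drive count and size come from the FA-310 specification above; the raw $/GB figure is a made-up placeholder rather than a Pure Storage price, and RAID and metadata overhead are ignored.

```python
# Rough arithmetic only; RAID and metadata overhead are ignored.
DRIVES = 22       # FA-310 drive count quoted in the article
DRIVE_GB = 256    # per-drive capacity in GB quoted in the article

# Assumption: a placeholder raw flash price, NOT a real Pure Storage figure.
HYPOTHETICAL_RAW_COST_PER_GB = 10.0


def raw_capacity_gb(drives: int = DRIVES, drive_gb: int = DRIVE_GB) -> int:
    """Raw capacity before any data reduction."""
    return drives * drive_gb


def effective_capacity_gb(raw_gb: float, reduction_ratio: float) -> float:
    """Logical capacity that fits once dedupe and compression are applied."""
    return raw_gb * reduction_ratio


def effective_cost_per_gb(raw_cost_per_gb: float, reduction_ratio: float) -> float:
    """Cost per logical GB stored at a given reduction ratio."""
    return raw_cost_per_gb / reduction_ratio


if __name__ == "__main__":
    raw = raw_capacity_gb()
    print(f"raw capacity: {raw} GB")
    # The 5x to 20x range is the one Pure quote; 10x is simply a midpoint.
    for ratio in (5, 10, 20):
        capacity = effective_capacity_gb(raw, ratio)
        cost = effective_cost_per_gb(HYPOTHETICAL_RAW_COST_PER_GB, ratio)
        print(f"{ratio:>2}x: {capacity:.0f} GB effective, {cost:.2f} $/GB effective")
```

The point of the sketch is that the effective $/GB is only as good as the reduction ratio actually achieved, which is why the "your mileage will vary" caveat matters so much.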
Ultimately though, there has to be something that makes Pure Storage stand out from the competition. During our Field Day visit, we were lucky enough to have a presentation from Coz, without slides, using just the whiteboard. He detailed what is probably the most important piece of Pure’s technology, and that’s the way they manage the SSDs themselves.
Solid state drives are fickle devices. Every write wears them out and much effort has been put into techniques (like wear levelling and reducing write amplification) to extend their lifetime. MLC devices now have much better reliability than they did a few years ago, allowing them to displace SLC in enterprise technology. Understanding how to manage SSDs is Pure Storage’s secret sauce. They work closely with the SSD manufacturers to understand the best ways to read from and write to the devices in order to gain both maximum performance and maximum lifetime. At the presentation they even claimed never to have had an SSD failure, something that was met with surprise by the audience present!
Part of the SSD management involves the use of RAID-3D, a technology which manages the distribution of RAID stripes across the disks. RAID stripes are varied dynamically based on workload and the drive responses. This allows failing drives to be avoided, increasing their lifetime. It also means I/O response times can be made more predictable, avoiding the random response-time spikes seen with individual SSDs when features like garbage collection kick in.
It makes sense to understand the best way to manage SSDs and having a relationship with the vendor of those devices certainly helps. My only concern is whether single supplier relationships are ever good, from a cost, reliability and supply perspective. Only time will tell.
So where would you deploy this kind of high performance array? I don’t think simply replacing your traditional storage with a Pure array is the right approach. One of the benefits of shared storage is that I/O demand consists of peaks and troughs, periods of high and low demand from many servers. This means it isn’t necessary to deliver 100% full I/O performance to all servers all of the time, but only to meet the aggregate peak demand, which is considerably cheaper to achieve than provisioning for every server’s maximum demand at once.
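A small sketch makes the difference between the two sizing targets concrete. The server names and IOPS samples below are invented purely for illustration; each list holds one server’s demand over the same four time intervals.

```python
# Illustrative data only: these servers and IOPS figures are invented.
demand = {
    "db":     [8000, 1000, 1000, 2000],
    "mail":   [500, 6000, 500, 500],
    "backup": [200, 200, 7000, 200],
}


def sum_of_maxima(series_by_server: dict[str, list[int]]) -> int:
    """IOPS needed if every server hit its own maximum at the same moment."""
    return sum(max(series) for series in series_by_server.values())


def aggregate_peak(series_by_server: dict[str, list[int]]) -> int:
    """IOPS needed to cover the busiest interval across all servers combined."""
    return max(sum(samples) for samples in zip(*series_by_server.values()))


if __name__ == "__main__":
    print("sum of per-server maxima:", sum_of_maxima(demand))   # 21000
    print("aggregate peak:          ", aggregate_peak(demand))  # 8700
```

Because the servers peak at different times, the shared array only has to be sized for the busiest combined interval, not for the sum of every individual maximum. That is the economy traditional shared storage relies on.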
This isn’t the Pure approach. Their arrays are capable of delivering 2000 IOPS for every TB of storage, even at 10:1 compression. It means that the server environment driving this storage needs to have high I/O requirements across every TB of data. Otherwise, the array is never running at peak efficiency. It could be argued that if the $/GB cost is at parity with traditional arrays, this shouldn’t matter. I think it does matter, because there’s a perception that flash is an expensive technology and, irrespective of the effective $/GB cost after data reduction, many customers will still focus on the raw storage and the cost of the device.
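As a sanity check on that IOPS density, the figures quoted earlier for the FA-310 can be plugged into a one-line calculation. This is only a back-of-the-envelope sketch: it uses the 4K random write figure, treats raw capacity as usable (no RAID overhead), and it is an assumption that Pure derive their 2000 IOPS/TB number in a similar way.

```python
# Back-of-the-envelope only; inputs are the FA-310 figures quoted in the article.
FA310_WRITE_IOPS = 140_000        # 4K random write IOPS
FA310_RAW_TB = 22 * 256 / 1000    # 22 x 256GB drives, in decimal TB


def iops_per_effective_tb(iops: float, raw_tb: float, reduction_ratio: float) -> float:
    """IOPS available per TB of logical (post-reduction) capacity."""
    return iops / (raw_tb * reduction_ratio)


if __name__ == "__main__":
    # Ratios span the 5x to 20x reduction range Pure quote.
    for ratio in (5, 10, 20):
        density = iops_per_effective_tb(FA310_WRITE_IOPS, FA310_RAW_TB, ratio)
        print(f"{ratio:>2}:1 reduction -> {density:.0f} IOPS per effective TB")
```

At 10:1 this comes out comfortably above 2000 IOPS per TB, which is consistent with the claim, and it shows how quickly the density falls as the reduction ratio rises.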
Pure Storage have done a great job in delivering a technology that brings solid state performance at an acceptable $/GB price. There are some key features (data reduction, SSD management) that make this technology really work. We have seen presentations of high I/O workloads that can easily be managed by the Pure Storage arrays, while continuing to deliver sub-millisecond responses. None of the big storage vendors have technology that delivers I/O bandwidth in the way companies such as Pure Storage can. All-flash versions of traditional arrays don’t have the added intelligence to manage SSD failures and performance spikes. I can therefore see that, very quickly, one of the three-letter storage companies will be looking to acquire Pure or one of their many competitors. For now, they need to focus on finding the right niche for their product, while educating customers in metrics other than $/GB.
Pure were one of the companies at TFD#8 that were well organised in providing various pieces of media. I’ve included some of them here, including a link to the entire presentation from the day. There’s also a business-card sized set of instructions that Pure claim as their user manual. It’s a fun way of demonstrating how simple their technology can be.
- Pure Storage Presentation from TFD#8
- Pure Storage Flash Array Datasheet
- Pure Storage Flash Array Manual – Front
- Pure Storage Flash Array Manual – Back
Disclaimer: I attended TFD#8 as an invited blogger. My accommodation, some transportation and most meals were paid for. I was not compensated for my time, nor required to blog on any of the presentations. None of my blog entries or other postings receive any pre-approval or viewings from vendors.