It’s a contentious title for a blog post, but I felt it worth writing a follow-up to both my Vulcan Cast with Marc Farley and Chris Mellor’s article in The Register. A rumour circulating last week implied that EMC was considering shutting down the XtremIO product line. The rumour seemed to originate from a number of separate parties (although we can’t exclude a clever viral marketing campaign from a competitor). How likely is this story and is there any basis for EMC taking this direction?
EMC acquired XtremIO for around $430 million in May 2012, before any product had shipped to customers. With around $25 million in funding, this was a pretty sweet deal for the XtremIO investors. After some directed availability, XtremIO 1.0 was released to General Availability (GA) in November 2013. Version 2.4 went GA in May 2014, with version 3.0 announced soon after and reaching GA in September 2014. This was the infamous release that was both a disruptive and destructive upgrade. Version 4.0 was announced at EMC World in May 2015 (bringing in revised hardware) with GA on 30 July 2015. Since then there have been only minor bug-fix releases, with no big news announcements on the platform. So – no new software/hardware releases in around 12 months, with the pace of releases slowing down.
The XtremIO platform is based on X-bricks; each combines dual controllers and a disk shelf with UPS support to form a highly available unit. Systems then scale out through multiple X-bricks, with a current maximum of eight. All X-brick nodes participate in reading and writing data, with new I/O distributed algorithmically across the nodes to ensure an even spread of data. This has the benefit that all components of the cluster are involved in serving data. However, it also has some negatives. First, loss of an X-brick is a system-down situation. Now this is highly unlikely, but anything that takes a single brick down will cause a problem because there’s no X-brick to X-brick redundancy. Remember, as you add more nodes to a cluster without replication between them, the chance of an outage rises roughly in proportion to the number of nodes; a 4-brick system is around twice as likely to suffer an outage as a 2-brick system, as any one brick could fail and there’s no redundancy. Similarly, an 8-brick system is around twice as likely to fail as a 4-brick system for the same reason.
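To put rough numbers on that availability argument, here is a minimal sketch. It assumes bricks fail independently and that the cluster is down whenever any one brick is down. The per-brick availability figure is purely illustrative, not a published EMC number, and the function names are my own.

```python
def cluster_availability(brick_availability: float, bricks: int) -> float:
    """Availability of a cluster that goes down if any single brick is down.

    Assumes independent brick failures and no brick-to-brick redundancy.
    """
    return brick_availability ** bricks


def cluster_unavailability(brick_availability: float, bricks: int) -> float:
    """Fraction of time the cluster is expected to be down."""
    return 1.0 - cluster_availability(brick_availability, bricks)


# Illustrative per-brick availability only; not a vendor figure.
ASSUMED_BRICK_AVAILABILITY = 0.9999

for bricks in (1, 2, 4, 8):
    down = cluster_unavailability(ASSUMED_BRICK_AVAILABILITY, bricks)
    print(f"{bricks} brick(s): expected downtime fraction {down:.6f}")
```

Under these assumptions the availability itself barely moves, but the expected downtime roughly doubles each time the brick count doubles.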
XtremIO writes data as a 25-drive (23+2) full stripe write across all drives. This is the XDP proprietary RAID system. XDP provides very low write amplification, but does mean X-bricks are running with a fixed configuration (except the 5TB starter bricks). This is good and bad for expansion; if X-bricks could be expanded with more capacity (which they can’t), in the current model, each would have to be expanded by an entire disk shelf. To keep the cluster performance even, each X-brick would also need to be upgraded. The need for a uniform configuration and inability to expand a single X-brick makes it a problem for EMC. New drive capacities (expressed as X-brick capacities) have to exist for many years, unless of course customers are offered an entire system replacement rather than upgrade. As yet, EMC hasn’t introduced TLC drives for XtremIO. With the current design, an entire cluster would have to be built from TLC drives, unless of course XtremIO 5.0 brings in the ability to mix and match.
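As a back-of-envelope illustration of the fixed 23+2 layout, the sketch below works out the share of each stripe that holds data, and how many drives a hypothetical shelf-per-brick expansion would need if every X-brick had to be upgraded together. The expansion scenario is hypothetical (X-bricks can’t currently be expanded) and the function names are my own.

```python
def data_fraction(data_drives: int, parity_drives: int) -> float:
    """Share of a full stripe that holds data rather than parity."""
    return data_drives / (data_drives + parity_drives)


def drives_for_uniform_expansion(bricks: int, drives_per_shelf: int = 25) -> int:
    """Drives needed if every brick must gain one whole shelf at once.

    Hypothetical scenario: assumes expansion were possible and had to be
    applied evenly across the cluster.
    """
    return bricks * drives_per_shelf


print(f"23+2 stripe data fraction: {data_fraction(23, 2):.2%}")
for bricks in (2, 4, 8):
    print(f"{bricks} bricks: {drives_for_uniform_expansion(bricks)} drives per expansion step")
```

The point is the granularity: in this model, the smallest possible upgrade to an 8-brick cluster is 200 drives.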
Just to touch again on that lack of TLC support, it does seem that the industry is again ahead of EMC, including their own products! EMC Unity supports TLC as does NetApp, Nimble, Dell SC, HPE 3PAR, Kaminario and (I believe) SolidFire. So why not keep cost competitive and use TLC in XtremIO? Are there architectural restrictions? Samsung are predicting a quick move to TLC technology, so vendors not supporting this media will be left behind in price wars.
EMC Product Portfolio
Not including the software-based solutions (like ScaleIO), EMC now has DSSD (high end performance), XtremIO, All-flash VMAX, All-flash Unity, All-flash VNX2 (although the hybrid models seem to be pushed more here) and after the acquisition by Dell there will be all-flash Dell SC. It’s an embarrassment of riches, so which platforms will remain and which will go? I can’t imagine the new Dell Technologies division will have room for six all-flash systems. DSSD sits in a specific market segment; Dell SC, Unity and VNX2 all seem to overlap, so presumably one will survive and customers will be directed to that over time. XtremIO and VMAX overlap, with the new VMAX systems (announced in February 2016) matching XtremIO for performance and exceeding XtremIO in scalability and native features. So why was VMAX-AF (my terminology) introduced? Presumably EMC has many, many customers that simply didn’t want to move from their investment in VMAX. The platform is mature, has rock-solid features like SRDF and companies invest time and effort in training staff and scripting to the platform, including operational procedures. XtremIO still has no native replication and in reality, RecoverPoint is a workaround solution. By the way, check out this post from Calvin Zito at competitor HPE, where he points out that the VMAX-AF models 450 and 850 are coincidentally named the same as the high-end 3PAR platforms 20450 and 20850…
Chris Mellor points to potential issues with XtremIO being the reason for the rumoured shutdown of the platform. Scalability and reliability are said to be the problem areas. When XtremIO 3.0 was released, we know that block size was increased from 4KB to (presumably) 8KB to cater for larger SSDs. I say “presumably” as all of the public literature was changed to say “a few kilobytes” rather than quote an exact number. It could be that the engineering change with 3.0 has increased the block size sufficiently to cater for larger drives. With the release of XtremIO 4.0, each X-brick controller was given a DRAM upgrade too. This matters because one key feature of XtremIO is the ability to keep all metadata in memory. Naturally this means there is a scalability limit on both the number of X-bricks and the capacity supported by an X-brick. The way around this is to implement some kind of “metadata swapping” process, moving some metadata to flash or secondary DRAM to get around the problem. The trade-off with this will always be in compromising performance, as reading metadata on flash will be far slower than accessing it in DRAM.
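The link between block size, capacity and DRAM can be sketched with some simple arithmetic. This is a deliberately simplified model: it assumes one fixed-size metadata entry per block and ignores deduplication and compression. The 32-byte entry size and 10TB capacity are illustrative guesses, not figures from EMC.

```python
def metadata_bytes(capacity_bytes: int, block_bytes: int, entry_bytes: int) -> int:
    """DRAM needed to hold one metadata entry for every block of capacity."""
    blocks = capacity_bytes // block_bytes
    return blocks * entry_bytes


# Illustrative assumptions only; neither value comes from EMC documentation.
ASSUMED_ENTRY_BYTES = 32
ASSUMED_CAPACITY_BYTES = 10 * 10**12  # 10TB of addressable capacity

for block_bytes in (4096, 8192):
    needed = metadata_bytes(ASSUMED_CAPACITY_BYTES, block_bytes, ASSUMED_ENTRY_BYTES)
    print(f"{block_bytes // 1024}KB blocks: {needed / 2**30:.1f} GiB of metadata")
```

Whatever the real entry size is, the shape of the result holds in this model: doubling the block size halves the metadata for a given capacity, while doubling the capacity at a fixed block size doubles it.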
The Architect’s View
EMC are potentially in a bit of a bind with their all-flash platforms. Unity and Dell SC will meet the requirements of midrange customers; DSSD will meet high-end requirements. If (and I stress if) there are scalability and reliability issues with XtremIO, then moving back to VMAX may be both a defensive position and one to placate VMAX customers not wanting to move. The real problem here for EMC though is a financial one. EMC claims a billion-dollar run-rate with XtremIO (see articles here and here), making any announcement on the future (or not) of XtremIO a share-price changing event, one that EMC will not make lightly and certainly not with the impending Dell acquisition. Better instead to let things drift along for another few months until the Dell transaction completes, then use a portfolio consolidation/rationalisation story to slowly phase XtremIO out.
Every month that a new XtremIO upgrade isn’t announced will add more grist to the rumour mill – of course EMC may announce a major upgrade in the next 30 days and we’ll all have to eat some humble pie. Which way would you bet?
- As Dell and EMC move forward, is XtremIO being left in the dust? (The Vulcan Cast, retrieved 25 July 2016).
- XtremIO heading for the bin? Total BS, thunders CTO Itzik Reich (The Register, retrieved 25 July 2016).
- The Move to 3D-NAND for All-Flash Storage Vendors
- EMC VMAX All-Flash – what’s up with that? (Around The Storage Block, HPE Storage, retrieved 25 July 2016).
Copyright (c) 2009-2019 – Chris M Evans, first published on https://www.architecting.it/blog, do not reproduce without permission. Post #4930.