Zesty Optimises AWS EC2 EBS Storage

Last week at KubeCon Amsterdam 2023, I had a conversation with the folks at a start-up called Zesty. The company has a solution called Zesty Disk, which optimises the use of Amazon Web Services EC2 block storage.

Background

AWS EC2 block storage (EBS) is used for boot and data disks on virtual instances. Getting the configuration of EBS volumes right can be tricky. There are many performance types to choose from, each with a different cost profile. Unlike on-premises or even virtual server storage, charging for EBS volumes is based on the provisioned (not consumed) capacity. Volumes can easily be extended, but cannot be reduced in capacity without reconfiguring the virtual instance.

Thin Provisioning

In a world with thin provisioned storage (such as that found on-premises or deployed with VMware), getting sizing right for virtual volumes isn’t too difficult. Thin provisioning enables storage to be logically allocated to the maximum size expected for a virtual instance (or physical server) during its lifetime. As usage increases, the physical utilisation of the backing storage increases. As long as the virtual file system supports either TRIM or UNMAP commands, reclaiming released storage is easy, although a degree of file system management is needed. Thin provisioning is an intelligent way of optimising physical resources, especially with file systems that operate best when running at much less than 100% utilisation.

Full Fat

In storage systems that have no thin provisioning capability, capacity must be managed more closely. Thankfully, both Linux and Windows operating systems are capable of extending a file system if space is available on the logical volume underpinning it. It’s also possible to shrink a volume if the free space is aggregated at the end of the physical volume.

Both shrinking and extending volumes carry a degree of risk, so the work is usually done at quiet times or during periods of maintenance. Common sense suggests taking a backup beforehand – just in case things go wrong.

Volume Management

The ability to abstract physical LUNs at a server or host level has been around for over two decades. Logical volume managers (such as Veritas Volume Manager or the native LVM features built into Linux and Windows) provide the capability to pool physical storage and either subdivide or aggregate physical LUNs into logical volumes onto which a file system can be applied.

The features of logical volume managers vary greatly. Veritas VxVM was successful as a commercial solution for many years on systems that already had a volume manager because the features offered were superior to the native solutions.

Zesty Disk

What does all this have to do with volumes in the public cloud? In AWS EC2, all EBS volumes are charged at the provisioned capacity, not the used capacity. The concept of allocating a thin provisioned volume with a much larger capacity than needed doesn’t exist. As a result, AWS end users need to think more about the way storage is allocated. Every unused gigabyte of io2 storage, for example, is wasting $0.125/month. This may not seem a lot, but the overhead could quickly run into thousands of wasted dollars every year.
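To put a rough number on that overhead, here is a minimal sketch of the arithmetic. The $0.125 per GB-month io2 price is the figure quoted above; the fleet size, volume size and utilisation are hypothetical values chosen purely for illustration.

```python
# Price per provisioned GB-month for io2, as quoted in the text (USD).
IO2_PRICE_PER_GB_MONTH = 0.125


def annual_waste_usd(provisioned_gb, used_gb,
                     price_per_gb_month=IO2_PRICE_PER_GB_MONTH):
    """Yearly cost of capacity that is provisioned but not used."""
    unused_gb = max(provisioned_gb - used_gb, 0)
    return unused_gb * price_per_gb_month * 12


# Hypothetical fleet: 50 instances, each with a 1,000 GB volume that is 40% full.
fleet_waste = 50 * annual_waste_usd(provisioned_gb=1000, used_gb=400)
print(f"${fleet_waste:,.0f} per year")  # prints "$45,000 per year"
```

The calculation only covers the capacity charge. Any provisioned IOPS charges would come on top and are left out here.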

EBS volumes can be expanded but not reduced. Shrinking the capacity of storage used on an EC2 instance requires the use of snapshots and a potential outage to the virtual server. It should also be noted that EBS capacity and performance changes are not instant and can take up to 24 hours to be actioned.

Zesty Disk helps customers solve the wastage problem by implementing the features of a logical volume manager on each EC2 instance. This process is achieved using the btrfs file system, which incorporates the capability for volume expansion and shrinking, as well as dynamic block device addition and removal.

Rather than add a single, monolithic volume, Zesty creates multiple, smaller volumes and then aggregates them at the host using btrfs.

Automation

Of course, EC2 users could also take this approach and manually implement the logical volume manager features on each deployed EC2 instance. However, there’s a significant effort involved in managing just one instance and rebalancing for capacity and performance requirements. Manual management just doesn’t scale.

Zesty provides the features to constantly monitor EBS volumes, automatically shrinking and expanding where necessary, based on a predefined threshold and long-term trend data on capacity utilisation.
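The idea of a threshold combined with trend data can be sketched as follows. This is not Zesty's algorithm, which wasn't described in that level of detail; the thresholds, the seven-day horizon and the function name `plan_resize` are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # All three values are hypothetical, not taken from Zesty.
    expand_above: float = 0.80  # grow when projected utilisation exceeds 80%
    shrink_below: float = 0.50  # shrink when projected utilisation is under 50%
    target: float = 0.65        # utilisation to aim for after a resize


def plan_resize(used_gb, provisioned_gb, growth_gb_per_day,
                horizon_days=7, t=Thresholds()):
    """Return the change in provisioned GB.

    Positive means expand, negative means shrink, 0 means leave alone.
    """
    # Project usage forward using the observed growth trend
    # (negative trends are ignored, to stay on the cautious side).
    projected_used = used_gb + max(growth_gb_per_day, 0) * horizon_days
    utilisation = projected_used / provisioned_gb
    if t.shrink_below <= utilisation <= t.expand_above:
        return 0
    desired_gb = projected_used / t.target
    return round(desired_gb - provisioned_gb)


print(plan_resize(used_gb=850, provisioned_gb=1000, growth_gb_per_day=0))
print(plan_resize(used_gb=200, provisioned_gb=1000, growth_gb_per_day=0))
```

A real implementation would also need to respect the per-volume size limits and the EBS modification delays, both of which this sketch ignores.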

Smarts

The intelligent part of the Zesty solution is the way in which data is collected and analysed from EC2 instances. However, we see some caveats in this system.

First, there is a minimum and maximum capacity size for any EBS volume (although that isn’t likely to be too much of a restriction). Second, the number of EBS volumes that can be attached to an EC2 instance varies significantly, from 3 (d3.8xlarge) to 31 (bare metal). Most instance types support 27 or 28 volumes. As the utilisation of an EBS volume increases, the skill of the automation lies in predicting how best to restructure a volume over time without incurring additional backend (and effectively wasted) I/O.

Then there’s the issue of performance. Each EBS volume has an independent performance profile, so consolidating multiple, small volumes into a larger one would reduce overall bandwidth. Zesty Disk needs to consider this factor.
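A quick sketch shows why the volume count matters. It assumes gp3 volumes running at their baseline of 3,000 IOPS and 125 MiB/s each, regardless of size. The optional instance-level cap is a hypothetical parameter, since the actual limit depends on the instance type.

```python
# gp3 baseline performance per volume, independent of volume size
# (assumption: no additional IOPS or throughput has been provisioned).
GP3_BASELINE_IOPS = 3000
GP3_BASELINE_MIBPS = 125


def pool_baseline(volume_count, instance_iops_cap=None):
    """Aggregate baseline (IOPS, MiB/s) for a pool of gp3 volumes.

    instance_iops_cap is a hypothetical per-instance EBS limit; when given,
    the aggregate IOPS cannot exceed it.
    """
    iops = volume_count * GP3_BASELINE_IOPS
    if instance_iops_cap is not None:
        iops = min(iops, instance_iops_cap)
    return iops, volume_count * GP3_BASELINE_MIBPS


print(pool_baseline(4))  # four small volumes: (12000, 500)
print(pool_baseline(1))  # one consolidated volume: (3000, 125)
```

Consolidating four volumes into one keeps the capacity but, at baseline settings, leaves a quarter of the aggregate performance unless extra IOPS and throughput are provisioned on the larger volume.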

Next, there’s the issue of resiliency. Each EBS volume is a separate failure domain, so a RAID-0 “stripe” of many EBS volumes has a slightly higher risk of failure than a single volume, unless additional RAID protection is added through btrfs.
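The increase in risk can be estimated with some simple probability. The sketch below assumes volumes fail independently and uses an illustrative annual failure rate of 0.2% per volume; both are assumptions for the sake of the example rather than figures from Zesty.

```python
def stripe_annual_failure_probability(volume_count, per_volume_afr):
    """Probability that at least one volume in an unprotected stripe fails in a year.

    Assumes failures are independent; losing any one volume loses the stripe.
    """
    return 1 - (1 - per_volume_afr) ** volume_count


ASSUMED_AFR = 0.002  # illustrative 0.2% annual failure rate per volume

for n in (1, 4, 8, 16):
    p = stripe_annual_failure_probability(n, ASSUMED_AFR)
    print(f"{n:>2} volumes: {p:.2%} chance of losing the stripe in a year")
```

For small failure rates the risk grows roughly in proportion to the number of volumes, so an eight-volume stripe is about eight times as exposed as a single volume.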

Finally, there needs to be consideration for data protection. Many AWS users simply implement snapshots as their data protection mechanism. Snapshots taken at the storage layer for multiple logical EBS volumes are unlikely to be consistent because each volume could be handling I/O from the host in parallel.

The Architect’s View®

Zesty Disk initially reminded me of the primary and secondary extent allocation system used on mainframe systems. However, the solution is so much more than a simple reallocation process. The value here is the capability to predict utilisation over time and restructure with the minimum additional overhead.

It’s possible that AWS could decide to move to a thin provisioned model for EBS storage. However, there are lots of performance and management implications to this (unless thin provisioning is already being used behind the scenes). The greatest issue, though, is probably the cost implication for AWS, which makes money on provisioned but unused capacity. That revenue could be significant if thin provisioning is already being used.

If you’re using EBS volumes, Zesty Disk may be worth a look; it might just save you money.