Over the last few years, the IT supply chain has been a source of constant concern, resulting in component shortages across the industry. In this article, we look at the supply chain, how product development works, and why a hybrid approach can mitigate the impact of supply-chain challenges.
What is the Supply Chain?
As a simple definition, the supply chain is the network of individuals, organisations, resources, technology, and processes involved in the creation and sale of products to customers. Almost everything we consume, from food to household goods and electronics, has some form of supply chain behind it. In the technology space, there are supply chains for hardware systems and for individual components such as processors, DRAM, storage, and peripherals. The same applies to software, where complex applications depend on operating systems and tools. Figure 1 shows the steps in the supply chain process. We’ll come back and discuss this in more detail in a moment.
Storage vendors build solutions based on components, including those we’ve mentioned already, while also including servers with power supplies, network adaptors, and some specialist components. The design, build, and delivery of storage systems is itself a supply chain, with all the challenges that come with it.
Before diving deeper into the supply chain discussion, let’s look at how vendors develop platforms such as storage appliances. Figure 2 shows a typical product development cycle. The vendor researches new component capabilities, which inform the decision to build a new platform. With large, established vendors, it’s likely that an existing supplier relationship provides details of new products and early access to hardware, for example, the next generation of processors.
The vendor designs and builds prototypes before creating the final product. At that point (or before), the vendor needs to put in place sourcing arrangements for the components based on expected new product demand. Solutions are then built, stocked, and shipped to customers, either directly or through resellers. At some point, the product is sunset and decommissioned.
The decommissioning process is likely to be more complex than simply sunsetting the product and halting sales. Vendors need to offer support for an agreed period (usually years) after the EOL (end of life) stage, which, of course, means continued access to components through the supply chain. This requirement may, in itself, shorten the life of a storage platform if parts become increasingly hard to source.
Modern storage systems are built from commodity components that typically have a specific lifecycle. For example, storage vendors could currently be in the middle of a transition from Intel Skylake to Ice Lake processors, gaining the additional benefits of PCIe 4.0. In tandem, this offers the ability to upgrade storage to faster interface speeds. Vendors have also incrementally moved from SLC through MLC, TLC and now QLC SSDs.
The development of new systems will be intrinsically linked to the refresh cycles of core components, mainly processors and bus architectures, as all the remaining parts tend to follow the same architectural changes.
Supply Chain Challenges
Let’s go back and look at how the supply chain works for on-premises storage platforms. In this instance, the raw materials will be components, as already discussed. Vendors typically source components from multiple suppliers, both to manage costs and to guarantee consistency of supply. The 2011 Thailand floods demonstrated the need for multiple supply lines when HDD supplies dried up. This kind of event highlights a problem that also occurred during the pandemic: supply chains converging on a single set of suppliers. Mitigating this issue can be difficult or near impossible without buying commitments or pre-existing agreements.
Most storage vendors today are more assemblers than manufacturers. Some vendors have outsourced the hardware process to third-party suppliers. Whether in-house or elsewhere, the assembly process produces a finished product that is then delivered to customers directly or through VARs and distributors.
Old for New
One aspect to consider in the design cycle is the impact of introducing new hardware. Nobody likes to buy last year’s product, unless at a discount. This forces vendors to discount solutions that will quickly become legacy as the new platform is introduced. Ways to mitigate this problem include separating software from hardware upgrades and designing componentised architectures that make in-place replacement of drives and controllers easier. Vendors can also offer more attractive financial models to refresh hardware over time. This process is part of the move to SaaS for on-premises hardware.
Although we’ve highlighted some supply chain challenges, storage vendors are generally very good at designing, building, and bringing new products to market. Most will have strong relationships with their suppliers that enable new hardware to be introduced in a timely fashion. However, any hiccup upstream (such as a delay in creating the next processor design) will impact new products. This issue is generally more visible to on-premises customers, who care more about specific hardware speeds and feeds. The legacy upgrade cycle of three- to four-year replacements also drives this behaviour, because financial models are based on leasing, or on purchases and buy-backs, where replacing the entire hardware stack is engineered to be more cost-effective than continued in-place upgrades.
Supply Chain in the Public Cloud
Let’s compare the on-premises process to that in the public cloud. Storage in the public cloud is delivered natively by the hyper-scaler or through third-party relationships with vendors such as NetApp. Some vendors offer their products as virtual instances (also known as virtual storage appliances). In all these cases, storage is delivered using the same hardware used for applications (although the native solutions may be more complex in their implementation).
Virtual instances in the public cloud are generally grouped in two ways: first by use case (general-purpose, storage optimised, memory optimised) and second by generation. For example, AWS EC2 M6i instances are based on 3rd-generation Intel Xeon Scalable processors, whereas M5 instances use Intel Xeon Platinum (either 1st or 2nd generation).
Each generation increments the processing, memory, storage, and networking capabilities of the family. Customers get to choose between the latest solutions or lower-cost older generations (for which the hyper-scaler has already amortised the investment).
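This generational trade-off can be sketched as a simple selection problem. The snippet below is illustrative only: the instance names mirror real AWS families, but the specifications and hourly prices are hypothetical examples, not published AWS pricing.

```python
# Illustrative sketch: picking an instance type by generation and price.
# Specs and hourly rates below are hypothetical, not real AWS pricing.
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    generation: int
    vcpus: int
    memory_gib: int
    hourly_usd: float

CATALOGUE = [
    InstanceType("m5.xlarge", 5, 4, 16, 0.20),   # older generation, amortised
    InstanceType("m6i.xlarge", 6, 4, 16, 0.22),  # latest generation
]

def cheapest_match(catalogue, vcpus, memory_gib, min_generation=0):
    """Return the lowest-cost instance meeting the resource requirements."""
    candidates = [
        i for i in catalogue
        if i.vcpus >= vcpus
        and i.memory_gib >= memory_gib
        and i.generation >= min_generation
    ]
    return min(candidates, key=lambda i: i.hourly_usd, default=None)
```

With no generation constraint, the older (amortised) instance wins on price; pinning `min_generation` models a customer who wants the latest hardware and accepts the premium.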
Economies of Scale
Hyper-scalers can offer ongoing instance improvements for two reasons.
- They have economies of scale that support new hardware because (generally) their businesses continue to grow, and hardware is spread across many customers.
- Many services are abstracted behind virtual instances where the customer has no visibility (or interest) in the capabilities of the instance being used. Any instance type could be used on services where the cost/efficiency model continues to work. (Note that this aspect is important further into our discussion, as we highlight which products will work most effectively in the public cloud).
Eventually, of course, hardware does get decommissioned and removed, and customers must upgrade to a current instance. The critical ability the hyper-scalers have developed is to abstract the specifics of the hardware and the migration process to the extent that customers simply pick from a catalogue of choices. Pricing can then be used to bias customers towards newer instances when hardware needs to be replaced, while new instances may initially carry a premium as new hardware is gradually deployed into the hyper-scaler data centre.
Instances for Storage
What does this have to do with the delivery of storage services? For native services, the hyper-scaler doesn’t have to specify any underlying hardware details. The customer buys based on service levels, not on hardware capability. Essentially, the storage software and services are disaggregated from the hardware. Both can be upgraded independently and clearly are.
As an example, look at AWS EBS (Elastic Block Store). In 2021, AWS introduced io2 and io2 Block Express, both offering improved performance and throughput compared to io1, and the first “new” EBS volume types in eight years. In reality, io1 was improved over time, with increased IOPS and throughput. That service offering wasn’t stagnant for eight years; instead, AWS changed it behind the scenes.
At introduction, io2 was only available on newer instances and represents a re-architecture, whereas io1 was enhanced through internal improvements. We know this is the case because AWS has publicly commented that io2 uses a new storage fabric with Nitro to offload storage functionality.
Economies of scale and efficient service design mean hyper-scalers can mitigate many of the challenges in the supply chain. However, these capabilities aren’t the only techniques being used in the public cloud. As we highlighted in 2019, AWS has moved from price reductions in storage to adding new features that offer lower-cost tiers, with reduced functionality or reduced resiliency. Much of this capability is automated behind rules set by the customer. Tiering, and the pricing behind it, gives the cloud hyper-scalers yet another lever for steering customer behaviour, one that can be aligned to product costs and supply-chain management.
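Rule-driven tiering of this kind can be illustrated with a sketch modelled loosely on an S3 lifecycle configuration. The rule shape below follows the general structure of S3 lifecycle rules, but the rule ID, prefix, thresholds, and the evaluation function are invented for illustration, not an AWS API.

```python
# Illustrative sketch of customer-defined tiering rules, loosely modelled on
# an S3 lifecycle configuration. All names and thresholds are examples only.
LIFECYCLE_RULE = {
    "ID": "archive-cold-data",      # hypothetical rule name
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},  # applies to objects under this prefix
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access tier
        {"Days": 90, "StorageClass": "GLACIER"},      # archive tier
    ],
}

def storage_class_for_age(rule, age_days):
    """Return the storage class an object of the given age would occupy."""
    current = "STANDARD"
    # Apply transitions in order; the last threshold reached wins.
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current
```

The customer sets the rules once; the platform then moves data to cheaper tiers automatically, which is the pricing lever described above.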
Exploiting the Hybrid Model
The global nature of IT means that issues in the supply chain are always a challenge that requires awareness and management. A hybrid storage model provides end-users and businesses with the ability to take advantage of the mitigations implemented by the hyper-scalers while retaining the choice of on-premises or public cloud deployments. This means being able to:
- Mitigate on-premises lead times by moving new or existing workloads to the public cloud (and not delaying existing projects).
- Optimise costs by picking the right location for application data.
- Smooth out the refresh of on-premises equipment without being forced into premature replacement by unpredictable demand.
- Quickly take advantage of public cloud price reductions and automation features.
- Quickly take advantage of new products and services in the public cloud by moving data in or out on-demand and with relative ease.
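The cost-optimisation point above comes down to a back-of-envelope comparison between amortised on-premises capacity and a public-cloud per-GB rate. The sketch below is illustrative; all figures (purchase price, refresh cycle, opex, cloud rate) are hypothetical examples.

```python
# Back-of-envelope data placement sketch. All figures are hypothetical.

def onprem_monthly_cost(purchase_price, lifetime_months, monthly_opex):
    """Amortise a capital purchase over its service life, plus running costs."""
    return purchase_price / lifetime_months + monthly_opex

def cheaper_location(onprem_cost, cloud_gb_month_rate, capacity_gb):
    """Compare monthly costs and return the cheaper placement."""
    cloud_cost = cloud_gb_month_rate * capacity_gb
    return "on-premises" if onprem_cost <= cloud_cost else "public cloud"

# Example: a 100 TB array at $120,000 over a 48-month refresh cycle plus
# $500/month to run, versus a cloud tier at $0.023 per GB-month.
onprem = onprem_monthly_cost(120_000, 48, 500)    # $3,000/month
choice = cheaper_location(onprem, 0.023, 100_000)
```

In practice the comparison also needs to account for egress charges, data gravity, and performance requirements, but even this crude model shows how the answer shifts with refresh cycles and cloud price changes.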
The Architect’s View®
Although we highlighted block storage in this article, the modern cloud offers multi-protocol support for block, file, and object workloads. Typically, block storage is closely coupled to virtual instances, whereas file and object are more universal. The adoption of S3 as a de facto object storage standard, along with web-based protocols, makes it easy to move data in and out of the public cloud. File-based storage has similar flexibility. However, hyper-scalers have recognised that “vanilla” implementations of file protocols like NFS aren’t enough and have worked with vendors to deliver platform-specific solutions. We can see this, for example, in the current range of FSx offerings from AWS. These offerings have the benefit of full integration into the ecosystem, making them “first-class citizens” in APIs and security models.
With this in mind, IT organisations can now build hybrid architectures spanning private and public clouds, implementing application and data portability. These designs aren’t about trying to create applications that must span clouds but instead, address cost, availability and supply chain challenges in a way that delivers more certainty to the business.
Copyright (c) 2007-2022 – Post #b48a – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.