Is there a future for Software Composable Infrastructure?

This week, Microsoft finally confirmed the acquisition of Fungible, which had been rumoured for some time. The deal leaves Liqid as the remaining independent SCI vendor in the market. Does this mean it’s all over for Software Composable Infrastructure?

Background

The concept of software-composable infrastructure has been around for many years. We’ve been discussing the technology since HPE announced Synergy in late 2015 (see this post). It’s interesting to look back and see that one original premise for the technology was to eliminate “zombie” servers in the data centre, reclaiming power and making resources more efficient. This issue resonates today in a world where sustainability and efficiency will be themes for 2023.

We tried to put a framework around the SCI concept in a follow-up post and even joked about the mainframe being the first composable infrastructure solution. In one respect, though, we were being earnest in this article, showing that over time, current infrastructure has become less composable and more rigid in design.

For some additional SCI background, I recommend reading this post we wrote in 2019, which breaks down the concept of composable infrastructure and discusses how it could evolve as new technologies come to market (which we’ll discuss more in a moment).

Modern Composability

Fast forward to 2023, and we can see how the idea of composability has evolved. High-speed Ethernet networks and the evolution of PCIe provide the backbone for connecting the traditional components of a modern server. This blog post we wrote in 2019 shows how Liqid uses this backbone concept to deliver “virtual” servers that couldn’t be built as traditional physical devices. For example, GPUs can be added to (or removed from) servers dynamically and provide greater connectivity than could be achieved in a standard server.

The technology also leads to efficiencies. For example, scaling GPUs in a traditional server may require using multi-socket systems, even if the second processor isn’t needed or is underutilised. In some cases, an accelerator might be needed for only a few hours per day. Leaving a GPU permanently deployed is an expensive overhead, especially if it could be repurposed elsewhere when not in use.
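To make the utilisation argument concrete, here is a minimal sketch comparing one GPU per server against a shared pool sized for peak concurrent demand. The workload names and hourly windows are invented purely for illustration, and the model ignores the time taken to recompose a device.

```python
from typing import Dict, Tuple

# Hypothetical workloads: name -> (start_hour, end_hour) when a GPU is needed.
# These figures are illustrative assumptions, not measurements.
WINDOWS: Dict[str, Tuple[int, int]] = {
    "render-nightly": (0, 4),
    "model-training": (4, 10),
    "analytics": (9, 12),
}


def static_gpu_count(windows: Dict[str, Tuple[int, int]]) -> int:
    """One permanently installed GPU per workload (the traditional model)."""
    return len(windows)


def pooled_gpu_count(windows: Dict[str, Tuple[int, int]]) -> int:
    """GPUs needed if devices are recomposed hourly: the peak concurrent demand."""
    peak = 0
    for hour in range(24):
        in_use = sum(1 for start, end in windows.values() if start <= hour < end)
        peak = max(peak, in_use)
    return peak


def average_utilisation(windows: Dict[str, Tuple[int, int]], gpu_count: int) -> float:
    """Fraction of available GPU-hours per day that are actually used."""
    gpu_hours_used = sum(end - start for start, end in windows.values())
    return gpu_hours_used / (gpu_count * 24)
```

With these made-up windows, the static model needs three GPUs running at about 18% average utilisation, while a pool of two covers the same demand at about 27%. The gap widens as workloads overlap less.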

So, why isn’t there more composability in the enterprise?

Don’t Touch!

In the public cloud, we’re used to the concept of building up and tearing down infrastructure on demand. Hyper-scalers built their platforms on the premise that the main benefit of cloud computing is this dynamic and flexible nature. Unfortunately, these concepts don’t always transfer to the enterprise data centre. There are several reasons for this.

“If it ain’t broke, don’t fix it” is a classic mantra for enterprise data centres. Once a server and application are working, IT departments wrap change control around any modification. Change introduces risk, and enterprise IT generally doesn’t manage that risk well.
Demand. In the public cloud, thousands (or hundreds of thousands) of customers work independently on projects that comprise a mix of fixed and dynamic workloads. Most change happens during development; production also scales, but perhaps only when an application sees increased demand. A typical enterprise with only a few dozen business units developing software may not need this degree of flexibility.
Skills. Many IT departments still spend much of their time “keeping the lights on” and supporting existing applications and processes. Dynamic data centres require new skills (and technologies) to make the flexibility of composability work well. Introducing new tech (and process) isn’t easy and generally needs to deliver significant improvements to be justified.
Process challenges. Many IT organisations still don’t implement chargeback, so dynamic infrastructure is a world away from what these companies are able to adopt and consume. Composable infrastructure requires rethinking billing, change control, monitoring, budgeting, workflow, and the other aspects of infrastructure management that make the concept practical and usable. That’s not to say SCI couldn’t be used tactically (for example, where one team recomposes its own infrastructure), but widespread adoption across the entire enterprise is a much bigger step than infrastructure deployment alone.

Enterprises need to be faster to change, and that is certainly one factor here. There’s also another: timing.

Timing

The difference between expectation and reality can sometimes lead to disappointment. Composability sounds great in theory, but the process is constrained by server architecture. Today, SCI systems can compose storage, networking, GPUs, and other devices on the PCIe bus (including CXL-based memory). They can’t compose directly attached DRAM because the inherent design (and performance requirements) of system memory make it impossible to move DIMM slots too far from the processor.

This means each composable server must be constructed from a baseline server design. The characteristics of this server (for example, memory footprint or PCIe lanes) dictate the degree of composability. A data centre would need to be seeded with a mix of server configurations, onto which composable devices are added. This constraint immediately restricts what SCI can achieve, or increases costs through unused hardware.
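The baseline constraint can be sketched as a simple placement check. The class names, fields, and lane counts below are hypothetical and don’t reflect any vendor’s API; the point is only that fixed DRAM and a fixed PCIe lane budget decide which baseline server a composed configuration can land on, and how much memory is left stranded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    name: str
    lanes: int  # PCIe lanes consumed (illustrative figure)


@dataclass
class BaselineServer:
    name: str
    dram_gib: int    # fixed on-board DRAM; not composable today
    pcie_lanes: int  # fixed lane budget of the baseline design
    attached: List[Device] = field(default_factory=list)

    def lanes_free(self) -> int:
        return self.pcie_lanes - sum(d.lanes for d in self.attached)

    def can_host(self, dram_gib: int, devices: List[Device]) -> bool:
        """True if the request fits the fixed DRAM and the remaining lanes."""
        lanes_needed = sum(d.lanes for d in devices)
        return dram_gib <= self.dram_gib and lanes_needed <= self.lanes_free()


def pick_baseline(servers: List[BaselineServer], dram_gib: int,
                  devices: List[Device]) -> Optional[BaselineServer]:
    """Choose the smallest baseline that fits, or None if nothing does."""
    candidates = [s for s in servers if s.can_host(dram_gib, devices)]
    return min(candidates, key=lambda s: (s.dram_gib, s.pcie_lanes), default=None)


def stranded_dram_gib(server: BaselineServer, dram_gib: int) -> int:
    """DRAM installed in the chosen baseline but unused by the request."""
    return server.dram_gib - dram_gib
```

For example, with a 256 GiB baseline and a 1,024 GiB baseline, a request for 512 GiB and two x16 GPUs can only land on the larger server, leaving 512 GiB of DRAM stranded. A request for more than 1,024 GiB cannot be composed at all.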

However, with the introduction of Intel Sapphire Rapids and AMD EPYC Zen 4 architectures, PCI Express 5.0 enables CXL and memory extension. Further CXL enhancements will provide the capability to move memory outside the server chassis using PCIe/CXL fabrics. At this point, some system memory becomes composable. Future server designs could implement a minimum on-board DRAM capacity and expand memory dynamically using CXL.
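A rough back-of-envelope sketch shows why a small on-board DRAM footprint plus CXL expansion could reduce over-provisioning. The demands, server tiers, on-board capacity, and allocation granule are all assumed values chosen for illustration, and the model ignores the latency difference between local DRAM and CXL-attached memory.

```python
import math
from typing import List


def fixed_tier_provisioned(demands_gib: List[int], tiers_gib: List[int]) -> int:
    """Total DRAM bought when each workload gets the smallest fixed server tier that fits.

    Raises ValueError if a demand exceeds every tier.
    """
    total = 0
    for demand in demands_gib:
        total += min(t for t in tiers_gib if t >= demand)
    return total


def pooled_provisioned(demands_gib: List[int], onboard_gib: int,
                       granule_gib: int) -> int:
    """Total DRAM consumed with minimal on-board DRAM plus CXL memory in fixed granules."""
    total = 0
    for demand in demands_gib:
        extra = max(0, demand - onboard_gib)
        total += onboard_gib + math.ceil(extra / granule_gib) * granule_gib
    return total


# Hypothetical figures for illustration only.
DEMANDS_GIB = [96, 300, 520, 700]
TIERS_GIB = [256, 512, 1024]
```

With these assumed numbers, fixed tiers provision 2,816 GiB to serve 1,616 GiB of demand, while 128 GiB on board plus 64 GiB CXL granules provisions 1,728 GiB.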

So, perhaps SCI is a little early for the market. CXL could be the inflection point needed to make SCI a practical reality for more on-premises data centres.

Side Note: In this article, we haven’t considered the role of DPUs in enhancing composability. This is a separate area for discussion.

Missed Opportunity

Despite the reluctance to adopt SCI in the enterprise, existing server vendors have an opportunity to use the technology for “as a service” infrastructure. Dell (with APEX) and HPE (with GreenLake) both offer servers on demand. It seems a small step forward to use composability on-premises to deliver more flexible infrastructure to the customer without significant over-provisioning. We’re aware that Liqid already has a partnership with Dell, but this doesn’t seem to be at the forefront of Dell’s data centre strategy.

Dell, HPE, Supermicro, Cisco, Lenovo, IBM and perhaps NVIDIA all need the on-premises data centre to remain a practical solution. Rather than wait, we believe one or more of these companies could gain a competitive position by acquiring the SCI vendors now and developing CXL-enabled systems for the future. That opportunity has been partially missed, as Fungible is now under Microsoft’s control.

The Architect’s View®

The hyper-scalers are prepared to invest aggressively for the future. For Microsoft, Fungible may not directly result in a product that will be sold to customers but instead will be assimilated into Azure to make the public cloud more dynamic and flexible for customers. AWS acquired E8 Storage (which we think became io2 Block Express), while Google acquired Elastifile (which we believe powers Filestore High Scale and Enterprise). In fact, AWS moved early with the acquisition of Annapurna Labs, seeing the benefit of custom silicon. We’ve already seen where Apple is headed in that respect.

On-premises vendors need to find a way to counter the public cloud growth challenge. This means radical steps, which we don’t yet see happening. However, businesses that still want to use on-premises data centres need to accept that cloud-level flexibility will not be achieved using current technology. While the status quo persists, the public cloud will continue to gain market share unabated.