Fixing the x86 Problem

Chris Evans | Cloud, Composable Infrastructure, Enterprise, NVMe, Tech Field Day

Despite the development of many alternatives, the dominant architecture in today's data centre derives from the Personal Computer of the 1980s. The x86 processor and its surrounding ecosystem grew from the IBM 5150 PC, released in 1981 and built around the Intel 8088 CPU. As performance and efficiency become critical to the development of new infrastructure, some vendors are looking to augment x86 systems and offload core functionality, including security, networking and storage.

Post-Mainframe

The IBM mainframe was arguably the most popular and successful computing architecture from the 1960s through to the 1980s. IBM hardware set the standard, which was copied by many third-party organisations such as EMC and StorageTek, who made plug-compatible components. However, the mainframe was expensive to buy and operate. IBM stumbled in the 1980s and failed to keep to its generational product timeline. System/370 was released in 1970 and replaced by System/390 in 1990 – there was no System/380.

(If you want to know why System/380 didn't appear, Computer Wars: The Post-IBM World provides some interesting background.)

Departmental/Midrange

In the meantime, companies like Sun Microsystems were founded and developed both new architectures (SPARC in Sun's case) and mainframe-like systems to run their own custom operating systems. Solaris was released in 1992, HP-UX in 1982, and IBM competed with itself with the release of AIX in 1986. We shouldn't forget Windows NT from 1993, Microsoft's attempt at a data centre O/S.

These are only the operating systems I remember using, but there were many more.  The hardware in these solutions was designed for enterprise-class workloads and had many features (like dynamic hardware configuration) that are only just coming into today’s hardware platforms.

The Rise of x86

Why did the x86 architecture become so popular?  I think that the proprietary vendor architectures of the 1990s and 2000s suffered the same fate as the mainframe.  They were expensive and difficult to deploy, with demands like reinforced flooring and 3-phase power supplies.  The x86 architecture (especially with the rise of Windows NT) was able to deliver servers at a much lower cost and with easier maintenance.  Before server virtualisation by VMware became the norm, we were used to one server, one application.

Over time, x86 has been extended and improved.  Moore’s Law has dictated a move to multi-core processors to increase scaling.  Intel introduced multi-socket architectures, and VMware made them practical to use.  PCI Express has solved many of the performance challenges on the I/O bus (although not technically a bus). 

Hyper-virtualisation

All of these developments have essentially standardised the data centre on the x86 architecture, with the mainframe and other platforms relegated to niche use-cases.  In fairness, this hasn’t been a bad approach.  Intel and AMD have commoditised server hardware; VMware has made the hardware efficient, and Linux has arguably become the dominant data centre operating system – albeit with an echo of the proprietary O/S days of the 1990s.

Public Cloud

As we move into the Public Cloud era, hyper-scale service providers like AWS and Azure have again turned back towards proprietary hardware. Amazon Web Services developed Nitro, a set of offload components for networking, storage and security implemented as PCI Express cards. Microsoft has offloaded networking with Azure SmartNICs and AccelNet. Most recently, we've seen Pensando release technology that offloads multiple tasks, including networking, storage and security. The company recently presented at Cloud Field Day 7. Embedded here is the presentation giving an introduction to the Pensando Distributed Services Platform.

Offload

Why offload technology in this way? I can see several positive reasons for this transformation. First, let's look back at the mainframe again and see how storage and networking were implemented there. NCP, the Network Control Program, was software loaded onto a communications controller (such as the 3745) that acted as a boundary between connected devices and the mainframe. It formed part of the Systems Network Architecture (SNA), with host connectivity provided through VTAM (the Virtual Telecommunications Access Method). The NCP software and hardware connected to the mainframe processor through the channel subsystem. Storage used the channel subsystem too, with channels connecting to storage controllers and, in turn, disk units connecting to those controllers. Through Channel Command Words (CCWs), the processor directed I/O to be handled by the controller, which only interrupted the CPU when the I/O response was ready to be serviced.
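
To make the division of labour concrete, here is a toy Python sketch of the pattern just described: the "CPU" hands an I/O request to a dedicated controller and carries on with useful work, while the controller signals completion asynchronously in place of a hardware interrupt. It is purely illustrative and uses none of the real CCW or channel-program syntax.

```python
# Toy model of channel-style I/O offload (not real CCW syntax): the "CPU"
# queues an I/O request for a dedicated "controller" and keeps computing;
# the controller calls back when the data is ready, standing in for the
# interrupt raised by the channel subsystem.
import queue
import threading
import time

io_queue = queue.Queue()

def storage_controller():
    """Dedicated controller: services queued I/O so the main CPU doesn't have to."""
    while True:
        request = io_queue.get()
        if request is None:                      # shutdown signal
            break
        time.sleep(0.1)                          # pretend to perform the disk I/O
        request["on_complete"](f"data for block {request['block']}")

def interrupt_handler(payload):
    """Stands in for the interrupt the CPU services once the I/O has completed."""
    print("I/O complete:", payload)

controller = threading.Thread(target=storage_controller)
controller.start()

# The "CPU" issues the channel program and immediately returns to useful work.
io_queue.put({"block": 42, "on_complete": interrupt_handler})
print("CPU free to run application code while the controller handles the I/O")

io_queue.put(None)                               # tell the controller to finish up
controller.join()
```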

Scalability

Although configuration of the NCP and the IOCP/IOCDS on mainframes was quite cumbersome (it was primarily compiled into a static configuration until much later in the O/S evolution), it did provide the ability to offload I/O to dedicated hardware and free the processors to do useful work. This architecture was particularly valuable when system memory and CPU were so expensive. Of course, it is also a great scalability model: core processing, network I/O and storage I/O can all scale independently.

[Image: Mellanox BlueField SmartNIC]

We can see some similarities between the mainframe offload process and modern solutions.  The modern PCI Express bus is the equivalent of the mainframe channel subsystem.  Networking solutions already discussed (AWS Nitro, Azure AccelNet and Pensando Distributed Services Platform) are the equivalent of the NCP.  Nitro and DSP also provide storage offload via NVMe and NVMe-oF, acting as the equivalent of storage controllers.

Virtual

The difference between the architectures of 20-30 years ago and those of today is firstly one of scale and performance. By offloading functionality onto dedicated SmartNICs, we can achieve much higher throughput and lower latency. As the Pensando team points out, this offload also allows much lower jitter – effectively more deterministic performance. Reaching the levels of scalability needed to exploit speeds of up to 100Gb Ethernet simply isn't practical with onboard processing.

(Side Note: We discussed SmartNICs in a Storage Unpacked podcast, embedded here).

But there's another essential benefit here too, and that's the capability to effectively virtualise compute at a hardware level. Virtualised hardware may seem like a contradiction, but think for a moment about how we reached our current architectures.

Storage Area Networking shows how this happened for storage: physical devices connected through storage controllers using protocols such as SCSI were virtualised across network protocols. Fibre Channel Protocol (FCP) is probably better described as SCSI over Fibre Channel, as FC HBAs and their drivers emulate a device that looks like a local SCSI drive. NVMe-oF takes the same concept and makes remotely connected NVMe drives look like local PCIe devices.
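
As a minimal sketch of that last point, and assuming a Linux host with the nvme-cli package and an NVMe/TCP target on the network (the address and NQN below are hypothetical placeholders), the steps below show how a remotely exported namespace ends up presented as an ordinary local NVMe device.

```python
# Minimal sketch: attach a remote NVMe-oF (NVMe/TCP) namespace so it appears
# as a local /dev/nvmeXnY device. Assumes Linux, root privileges and nvme-cli;
# the target address and NQN are hypothetical placeholders.
import subprocess

TARGET_ADDR = "192.168.10.50"                      # hypothetical target IP
TARGET_NQN = "nqn.2020-01.com.example:block-pool"  # hypothetical subsystem NQN

# Ask the target's discovery controller what subsystems it exports.
subprocess.run(["nvme", "discover", "-t", "tcp", "-a", TARGET_ADDR, "-s", "4420"],
               check=True)

# Connect; the kernel then presents the remote namespace as a local NVMe device.
subprocess.run(["nvme", "connect", "-t", "tcp", "-n", TARGET_NQN,
                "-a", TARGET_ADDR, "-s", "4420"],
               check=True)

# The remote namespace now shows up alongside any local drives.
subprocess.run(["nvme", "list"], check=True)
```

From the application's point of view the fabric is invisible; it simply sees another local block device, which is exactly the property the offload cards exploit.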

Centralisation

AWS Nitro and Pensando DSP both emulate NVMe devices, making it possible to move logical devices and networking from one x86 server to another. If a physical server fails or is replaced, it's not necessary to move the applications elsewhere using a higher-level abstraction such as VMware vCenter; the logically connected storage and networking can simply be re-pointed at another chassis. In the Nitro and DSP examples, all of this programmability is centralised and managed with integrated security.
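
As a purely hypothetical Python sketch of that idea (it does not reflect any real Nitro or DSP API), the control plane below owns the mapping of logical volumes and network identities to chassis, so a failover is just a re-mapping rather than a reinstall or a hypervisor-level migration.

```python
# Hypothetical control-plane sketch: logical NVMe namespaces and network
# identities live in the controller's inventory, not in the server, so they
# can be re-pointed at another chassis when a host fails or is replaced.
from dataclasses import dataclass, field

@dataclass
class LogicalResources:
    volumes: list          # NVMe-oF namespaces presented as local drives
    mac_addresses: list    # network identities programmed into the adapter

@dataclass
class ControlPlane:
    attachments: dict = field(default_factory=dict)   # chassis name -> resources

    def attach(self, chassis, resources):
        self.attachments[chassis] = resources

    def failover(self, failed, replacement):
        """Re-point storage and networking at a new chassis; the application
        sees the same local-looking devices once it restarts there."""
        resources = self.attachments.pop(failed)
        self.attachments[replacement] = resources
        return resources

cp = ControlPlane()
cp.attach("chassis-01", LogicalResources(["vol-a", "vol-b"], ["02:00:00:aa:bb:01"]))
moved = cp.failover("chassis-01", "chassis-07")
print(f"Re-attached {moved.volumes} and {moved.mac_addresses} to chassis-07")
```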

Why Bother?

How is what we've described an improvement over where we are today with server virtualisation and containers? Here are a few reasons why this transition goes a step further than existing designs allow:

  • Reduction in CPU resources – tasks such as encryption and compression can be offloaded and run at wire speed.
  • Improved security model – networking and storage traffic can now all be managed centrally and delivered as global policies rather than individual device settings (illustrated in the sketch after this list).
  • Abstraction – adaptors emulate local devices (acting as both the data and management plane), removing the need for custom host drivers. This has significant benefits for running bare-metal applications and reduces management/testing overhead.
  • Observability – centralised management offers the ability to deliver greater insights into what storage and the network are doing.
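
To illustrate the second point above, the sketch below shows the "define once, enforce everywhere" idea: a single global policy is rendered down to every offload adapter in the fleet rather than being hand-configured per host. The schema, rules and adapter names are invented for illustration and are not any vendor's actual policy API.

```python
# Invented illustration of centralised policy: one global rule set is rendered
# into per-adapter configuration for every offload card in the fleet, instead
# of editing firewall and storage settings host by host.
GLOBAL_POLICY = {
    "encrypt_storage_traffic": True,
    "allowed_flows": [
        {"from": "web-tier", "to": "app-tier", "port": 8443},
        {"from": "app-tier", "to": "db-tier", "port": 5432},
    ],
}

ADAPTERS = ["adapter-host-01", "adapter-host-02", "adapter-host-03"]  # hypothetical

def render_for_adapter(adapter, policy):
    """Turn the central policy into the rule set a single adapter enforces."""
    return {
        "adapter": adapter,
        "rules": policy["allowed_flows"],
        "encrypt": policy["encrypt_storage_traffic"],
    }

for adapter in ADAPTERS:
    config = render_for_adapter(adapter, GLOBAL_POLICY)
    print(f"pushing {len(config['rules'])} flow rules to {config['adapter']} "
          f"(storage encryption {'on' if config['encrypt'] else 'off'})")
```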

Much of this functionality claws back the complexity that has built up in server virtualisation, while delivering the performance improvements of custom hardware.

The Architect’s View

Where does this leave the enterprise data centre? Going forward, solutions like Pensando DSP will allow enterprises to build scalable infrastructure that continues the disaggregation of compute, networking and storage (AWS Nitro is proprietary and not sold separately). This kind of solution won't be used by the average IT department, but by large enterprises that need efficiency and management at scale – effectively emulating the operations of a hyperscaler.


Copyright (c) 2007-2020 Brookend Limited. No reproduction without permission in part or whole. Post #be3c.