Intel Under Pressure as NVIDIA Announces Grace CPU

At GTC 2021, NVIDIA announced Grace, a new AI CPU architecture due in 2023. The design combines Arm CPU cores, NVLink and LPDDR5x memory, and could put the squeeze on Intel x86 in the data centre.

Background

Current server designs place the x86 processor inline between system memory and GPUs. Figure 1 shows an image used by NVIDIA in the GTC keynote to explain this architecture. The CPU fills system memory from external I/O and feeds data from memory to the GPUs deployed within a server. In a system with four A100 GPUs (as illustrated), the bottleneck is the PCIe connection between the processor and each GPU. In this illustration, NVIDIA assumes PCIe 3.0 x16 (16GB/s) support from the processor, despite the A100 offering PCIe 4.0 (perhaps a dig at Intel Xeons). Inter-GPU connectivity through NVLink (NVIDIA’s proprietary GPU interface) provides up to 600GB/s of throughput, while access to high-bandwidth memory (HBM) runs at 2039GB/s.
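To put those figures side by side, here is a minimal back-of-the-envelope sketch in Python. It uses only the bandwidth numbers quoted above. The 40GB working set is a hypothetical value chosen for illustration, and the calculation assumes ideal transfers with no protocol overhead, so real-world timings would be worse.

```python
# Bandwidth figures quoted in the text, in GB/s.
LINKS_GB_PER_S = {
    "PCIe 3.0 x16 (CPU to GPU)": 16.0,
    "NVLink (GPU to GPU)": 600.0,
    "HBM (GPU memory)": 2039.0,
}

# Hypothetical working set size in GB, chosen only for illustration.
WORKING_SET_GB = 40.0


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to move size_gb at the given rate, ignoring all overhead."""
    return size_gb / bandwidth_gb_per_s


def ratio_to_pcie(link_name: str) -> float:
    """How many times faster a link is than the PCIe 3.0 x16 figure."""
    return LINKS_GB_PER_S[link_name] / LINKS_GB_PER_S["PCIe 3.0 x16 (CPU to GPU)"]


for name, bandwidth in LINKS_GB_PER_S.items():
    seconds = transfer_seconds(WORKING_SET_GB, bandwidth)
    print(f"{name:28s} {bandwidth:7.0f} GB/s  "
          f"{seconds:8.4f} s  ({ratio_to_pcie(name):.1f}x PCIe)")
```

On these numbers, the PCIe hop is more than an order of magnitude slower than anything on the GPU side of the diagram, which is the gap NVIDIA is highlighting.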

Grace

Grace is a new processor architecture, announced by NVIDIA at GTC 2021. The design removes the PCIe bottleneck and instead uses fourth-generation NVLink (900GB/s) in a mesh configuration to connect each processor and GPU to both HBM and LPDDR5x system memory which, from the image presented (figure 2), looks to be in a “system-in-a-package” design. This arrangement is similar to the architecture of the Apple M1 processor (although that currently uses LPDDR4x). Conceptually, the change in data flow is shown in figure 3.

The whole memory subsystem is cache-coherent, which simplifies programming. The use of Arm Neoverse cores and LPDDR5x memory creates a power-efficient system that NVIDIA claims will offer 10x better performance than today’s DGX-based systems running with x86 processors.

Figure 3 – Grace and GPUs. Courtesy of NVIDIA.

The Grace CPU isn’t expected to be available until early 2023.

Squeeze

For decades we’ve been used to Intel dominating the data centre with general-purpose x86 processors. The x86 architecture has seen significant improvements, adding, for example, virtualisation (VT-x) and vector (AVX) instructions that have driven server virtualisation and software-defined storage capabilities.

At scale, the overhead of every system component comes under scrutiny. The public cloud service providers, for example, have already introduced power-efficient Arm instances for workloads that can take advantage of parallel processing (such as containerised applications).

At the opposite end of the market, NVIDIA aims to resolve the challenges of scaling AI by eliminating the x86 bottleneck using Grace. A general-purpose CPU can’t compete in these high data bandwidth architectures and the models that exploit them. Data throughput from system memory to GPU has become the architectural critical path.

x86

Does the announcement of Grace represent the end for x86? Clearly not. NVIDIA has developed a solution for their technology sweet spot, namely AI. However, the adoption of Arm in the data centre, led by the public cloud service providers, clearly demonstrates that the dominance of x86 is over.

Intel Xeon third-generation processors are an incremental improvement over the second generation, including the introduction of PCIe 4.0. However, features like DL Boost won’t fix the problems of PCIe. AMD EPYC Milan processors still have the edge over Intel. The Intel ecosystem (Optane, networking, Xeon) isn’t enough to stop competitors chipping away at specific use-cases that will, in the future, turn out to be the majority of computing in the data centre.

The Architect’s View™

The availability of Grace is some way off (almost two years), giving Intel time to embed their ecosystem story with third-generation Xeon Scalable CPUs. However, data centre architectures are changing as SmartNICs start the process of offloading core tasks from the general-purpose CPU. Could we be seeing the start of a battle between the centralised Intel architectural model (powerful CPU with persistent memory) and the disaggregated model (lightweight CPU, offloaded functionality)?

In the 1990s and 2000s, we became accustomed to a range of processor architectures that eventually standardised on x86. In the 2020s, we could be witnessing the start of the diversification of processors that once again puts specialisation back on the agenda. Intel’s strategy for the next decade is going to be watched more keenly than ever.