Rubin GPUs upend inference math

Amid the buzz at GTC, Vera Rubin GPUs are being touted as delivering 3–4x the compute density of Blackwell and inference that is up to 10x cheaper, which is shifting the conversation about CPU bottlenecks in agentic AI workloads. Demand for NVIDIA's Blackwell remains extreme (a reported 3.6M backlog), and GB300 racks are said to generate up to 50x the revenue of H100 systems. AWS will also deploy a million Blackwell/Rubin GPUs starting in 2026, a sign of large cloud capacity moves. (x.com) (x.com) (x.com)

NVIDIA says it shipped the first Vera Rubin samples in late February, and on March 16, at GTC 2026, it announced the Rubin platform, with seven new chips moving into full production. (investor.nvidia.com) Vendor slides describe the Rubin R100 compute chip as a roughly 336‑billion‑transistor design with HBM4 memory and a quoted peak NVFP4 throughput of about 50 petaflops per GPU. (letsdatascience.com) NVIDIA's flagship Rubin rack, NVL72, is specified to house 72 Rubin GPUs and 36 Vera CPUs and to deliver roughly 3.6 exaflops of NVFP4 inference at rack scale, alongside more than 20 TB of HBM4 and 54 TB of pooled LPDDR5X. (videocardz.com) The Vera CPU paired with Rubin is an 88‑core Arm design with large pooled memory capacity and native support both for the platform's new numeric formats and for the interconnects used to aggregate GPU memory and bandwidth. (letsdatascience.com)

NVIDIA has positioned the GB300/Blackwell family as a "revenue uplift" product for AI‑service operators, citing third‑party InferenceX results that it says show GB300 NVL72 delivering orders‑of‑magnitude improvements in throughput and cost over prior generations. (blogs.nvidia.com)

Major cloud partners have begun public rollouts and procurement: CoreWeave announced the industry's first GB300 NVL72 deployment, Microsoft and Oracle have said they are deploying Blackwell‑class racks, and AWS confirmed plans to add more than one million NVIDIA GPUs across regions beginning in 2026. (coreweave.com)
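
The quoted NVL72 figures are internally consistent: 72 GPUs at roughly 50 NVFP4 petaflops each works out to about 3.6 exaflops per rack. The short Python sketch below reproduces that check and derives average per-device memory shares from the rack totals. It assumes only the reported vendor numbers cited here, treats "more than 20 TB" of HBM4 as a lower bound, and uses hypothetical names (GPUS_PER_RACK, rack_nvfp4_exaflops, per_device_gb) that do not come from any NVIDIA tool or API.

```python
# Sanity-check of the reported NVL72 rack figures (vendor/reported numbers,
# not measurements). All names here are hypothetical, for illustration only.

GPUS_PER_RACK = 72          # Rubin GPUs per NVL72 rack (reported)
CPUS_PER_RACK = 36          # Vera CPUs per NVL72 rack (reported)
NVFP4_PF_PER_GPU = 50.0     # quoted peak NVFP4 petaflops per Rubin GPU
HBM4_TB_PER_RACK = 20.0     # "more than 20 TB" of HBM4, treated as a lower bound
LPDDR5X_TB_PER_RACK = 54.0  # pooled LPDDR5X per rack (reported)


def rack_nvfp4_exaflops(gpus: int, pf_per_gpu: float) -> float:
    """Peak rack throughput in exaflops: GPUs x per-GPU petaflops / 1000."""
    return gpus * pf_per_gpu / 1000.0


def per_device_gb(total_tb: float, devices: int) -> float:
    """Average share of a rack-level capacity per device, in GB (1 TB = 1000 GB)."""
    return total_tb * 1000.0 / devices


if __name__ == "__main__":
    print(f"Rack NVFP4 peak: {rack_nvfp4_exaflops(GPUS_PER_RACK, NVFP4_PF_PER_GPU):.1f} EF")
    print(f"GPU:CPU ratio: {GPUS_PER_RACK // CPUS_PER_RACK}:1")
    # Lower bound, because the rack total is quoted as "more than 20 TB".
    print(f"HBM4 per GPU (average): at least {per_device_gb(HBM4_TB_PER_RACK, GPUS_PER_RACK):.1f} GB")
    print(f"LPDDR5X per Vera CPU (average): {per_device_gb(LPDDR5X_TB_PER_RACK, CPUS_PER_RACK):.0f} GB")
```

Run as a script, it reports a 3.6 EF rack peak, a 2:1 GPU-to-CPU ratio, at least about 277.8 GB of HBM4 per GPU, and 1500 GB (1.5 TB) of LPDDR5X per Vera CPU. The memory values are simple averages of the rack totals, not per-device specifications.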
