Nvidia doubles down on inference with Vera Rubin

Published March 17, 2026 by The Daily Scout

Nvidia used GTC to push a platform-level inference strategy: CEO Jensen Huang framed Blackwell and the new Vera Rubin platform as both a massive order pipeline and an inference-first play. The announcement signals that hyperscalers will keep driving large inference procurements and platform consolidation into 2026.

Why it matters

Jensen Huang projected at least $1 trillion in Blackwell and Vera Rubin orders through 2027, a figure he gave in the GTC keynote and that follows an earlier $500 billion demand estimate through 2026. (techcrunch.com) Nvidia described Vera Rubin as a seven‑chip, rack‑scale platform, now in full production, that combines the Vera CPU, Rubin GPU, NVLink‑6 switch, ConnectX‑9 SuperNIC, BlueField‑4 DPU, Spectrum‑6 Ethernet, and the newly integrated Groq 3 LPU. (nvidianews.nvidia.com)

The performance targets are explicit: Nvidia and press coverage cite up to 10× more inference throughput per watt and roughly 5× the training compute of Blackwell, plus claims of about one‑tenth the cost per token on some workloads. (venturebeat.com) The flagship NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs linked by NVLink‑6, and Nvidia says that for mixture‑of‑experts training Rubin can match Blackwell using one‑quarter the GPUs. (venturebeat.com)

Nvidia is pitching the platform to hyperscalers and AI labs and named an extensive customer list: OpenAI, Anthropic, Meta, Mistral and “every major cloud provider,” including AWS, Google Cloud, Microsoft Azure and Oracle. It also published endorsements from OpenAI’s Sam Altman and Anthropic’s Dario Amodei at GTC. (venturebeat.com) Nvidia framed Vera Rubin as the start of a POD‑scale buildout supported by more than 80 MGX ecosystem and manufacturing partners, and multiple outlets report availability and production ramps targeted for the middle to second half of 2026. (nvidianews.nvidia.com)

Key numbers

Jensen Huang projected at least $1 trillion in Blackwell and Vera Rubin orders through 2027, following an earlier $500 billion demand estimate through 2026. (techcrunch.com)
Nvidia described Vera Rubin as a seven‑chip, rack‑scale platform, now in full production, that combines the Vera CPU, Rubin GPU, NVLink‑6 switch, ConnectX‑9 SuperNIC, BlueField‑4 DPU, Spectrum‑6 Ethernet, and the newly integrated Groq 3 LPU. (nvidianews.nvidia.com)
The flagship NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs linked by NVLink‑6, and Nvidia says that for mixture‑of‑experts training Rubin can match Blackwell using one‑quarter the GPUs. (venturebeat.com)

What happens next

The announcement signals hyperscalers will keep driving large inference procurements and platform consolidation into 2026.
