NVIDIA pushes new inference chips at GTC

Published March 23, 2026 by The Daily Scout

At GTC 2026, NVIDIA revealed inference‑focused AI chipsets that incorporate Groq’s LPU technology as it races to blunt custom ASIC rivals. The company is also broadening support for automakers and robotics customers. (digitimes.com) (techbuzz.ai)

Why it matters

The Groq 3 LPU die ships with roughly 500 MB of on‑chip SRAM and is billed at about 150 TB/s of internal memory bandwidth per chip. (tomshardware.com) NVIDIA’s LPX rack configuration packs 256 LPUs for a stated 128 GB of aggregate on‑chip SRAM and a claimed 640 TB/s of scale‑up bandwidth per rack. (nvidia.com)

Samsung Foundry is the manufacturing partner for Groq 3 and produces the LPU on a 4 nm process. NVIDIA confirmed a mass‑production ramp and penciled in first shipments for Q3 2026. (koreajoongangdaily.joins.com)

NVIDIA frames the LPUs as decode‑phase co‑processors inside the Vera Rubin NVL72 rack, pairing Rubin GPUs for training/reasoning with LPUs that handle low‑latency, large‑context inference workloads. (storagereview.com) Rubin GPUs tied to the Vera Rubin platform use HBM4 stacks delivering about 22 TB/s of bandwidth per GPU, and Samsung showcased its new HBM4/HBM4E memory at GTC to support that demand. (tomshardware.com) (news.samsung.com) NVIDIA’s own technical materials claim the LPX+Rubin combination can yield up to 35× higher inference throughput per megawatt and about 10× lower token cost versus prior Blackwell systems. (developer.nvidia.com)

GTC announcements expanded DRIVE Hyperion Level‑4 partnerships to BYD, Geely, Isuzu and Nissan and outlined a full‑stack robotaxi rollout with Uber across 28 cities by 2028, alongside new robotics integrations with ABB, FANUC and other industrial players. (investor.nvidia.com) (nvidianews.nvidia.com)

Separately, OpenAI has told investors it is tempering infrastructure commitments and has scaled back an ambitious NVIDIA agreement as it prepares for an IPO, a move market analysts cite as moderating near‑term hyperscaler GPU demand. (cnbc.com)

Key numbers

Groq 3 LPU: roughly 500 MB of on‑chip SRAM and about 150 TB/s of internal memory bandwidth per chip. (tomshardware.com)

LPX rack: 256 LPUs, a stated 128 GB of aggregate on‑chip SRAM and a claimed 640 TB/s of scale‑up bandwidth per rack. (nvidia.com)

Manufacturing: Samsung Foundry on a 4 nm process, with a mass‑production ramp confirmed and first shipments penciled in for Q3 2026. (koreajoongangdaily.joins.com)
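The rack-level SRAM figure follows from the per-chip figure by simple multiplication. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB) and using only the vendor-stated numbers quoted above; the function names are illustrative and do not come from NVIDIA or Groq:

```python
# Vendor-stated, approximate figures quoted in this article.
SRAM_PER_LPU_MB = 500        # on-chip SRAM per Groq 3 LPU
LPUS_PER_RACK = 256          # LPUs in one LPX rack
SRAM_BW_PER_LPU_TBPS = 150   # internal memory bandwidth per LPU


def rack_sram_gb(lpus: int = LPUS_PER_RACK,
                 mb_per_lpu: int = SRAM_PER_LPU_MB) -> float:
    """Aggregate on-chip SRAM per rack in decimal gigabytes (assumed 1 GB = 1000 MB)."""
    return lpus * mb_per_lpu / 1000


def summed_internal_bw_tbps(lpus: int = LPUS_PER_RACK,
                            tbps_per_lpu: int = SRAM_BW_PER_LPU_TBPS) -> int:
    """Plain sum of per-chip internal SRAM bandwidth across a rack.

    This is a derived product, not a figure any vendor has quoted.
    """
    return lpus * tbps_per_lpu


if __name__ == "__main__":
    print(f"Aggregate SRAM per rack: {rack_sram_gb():.0f} GB")
    print(f"Summed per-chip internal bandwidth: {summed_internal_bw_tbps()} TB/s")
```

The first result matches the stated 128 GB per rack. The second is only the sum of each chip's internal bandwidth; it measures something different from the claimed 640 TB/s scale‑up bandwidth, which describes traffic between chips, so the two numbers should not be compared directly.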
