Nvidia folds Groq LPU into inference push

At GTC 2026, Nvidia leaned heavily into inference, folding Groq’s Language Processing Unit technology into its stack to close the gap with custom ASIC rivals. The move is a clear signal that low‑latency on‑device inference is the next battleground. Apple’s Neural Engine and Core ML stacks will need regular benchmarking against these inference improvements. (digitimes.com)

Nvidia closed a roughly $20 billion deal for Groq’s assets and team; sources say it was finalized on December 24, 2025. (tomshardware.com) Nvidia unveiled the Groq 3 Language Processing Unit (LPU) during its GTC 2026 keynote on March 16, 2026, billing the chip as the first product built from the Groq agreement. (spectrum.ieee.org)

Nvidia’s developer blog and partner coverage describe the Groq 3 LPX rack as a 256‑LPU, rack‑scale inference accelerator with about 128 GB of aggregate on‑chip SRAM and roughly 640 TB/s of scale‑up bandwidth. (developer.nvidia.com) Nvidia and independent coverage report that each Groq 3 LPU die carries hundreds of megabytes of on‑chip SRAM (published estimates are around 500 MB per chip), with internal bandwidth cited near 150 TB/s, trading raw capacity for ultra‑low‑latency memory access. (thelec.net)

Nvidia positioned the LPX racks to pair with Vera Rubin NVL72 GPU racks via its Spectrum‑X interconnect, with LPUs acting as decode‑phase accelerators that can offload token‑generation workloads on a per‑token basis. (developer.nvidia.com) Samsung Foundry was named as the Groq 3 manufacturer, using a 4‑nanometer process, and Nvidia indicated that volume production and customer shipments are targeted for the second half of 2026 (multiple briefs cite Q3). (koreajoongangdaily.joins.com)
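The rack-level and per-chip figures quoted above can be cross-checked with simple arithmetic. The sketch below is only a back-of-envelope check on the reported numbers. The function names are illustrative, the SRAM total uses decimal units (1 GB = 1000 MB), and dividing the rack's scale-up bandwidth evenly across LPUs is an assumption, not a published per-chip specification.

```python
# Approximate figures as reported in the coverage cited above.
LPUS_PER_RACK = 256
SRAM_PER_LPU_MB = 500        # published estimate per chip
RACK_SCALE_UP_TBPS = 640     # reported rack-level scale-up bandwidth


def aggregate_sram_gb(lpus: int, sram_per_lpu_mb: float) -> float:
    """Total on-chip SRAM across the rack, in decimal gigabytes."""
    return lpus * sram_per_lpu_mb / 1000


def per_lpu_share_tbps(rack_tbps: float, lpus: int) -> float:
    """Even split of rack scale-up bandwidth per LPU (an assumption)."""
    return rack_tbps / lpus


print(aggregate_sram_gb(LPUS_PER_RACK, SRAM_PER_LPU_MB))      # 128.0
print(per_lpu_share_tbps(RACK_SCALE_UP_TBPS, LPUS_PER_RACK))  # 2.5
```

At roughly 500 MB per chip, 256 LPUs give the 128 GB aggregate that Nvidia quotes, so the two SRAM figures are consistent. The roughly 150 TB/s per-die figure is described as internal bandwidth, so it should not be compared directly with the per-LPU share of the rack's scale-up bandwidth.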
