NVIDIA doubles down on inference
Coverage of GTC shows NVIDIA pushing deeper into inference and heterogeneous strategies to counter custom‑silicon rivals: product announcements and messaging at the event were clearly aimed at protecting its inference market share. The move frames inference as a battleground for software and ecosystem lock‑in.
Jensen Huang told GTC attendees he sees an “inference inflection” and forecast roughly $1 trillion of AI infrastructure demand through 2027, driven by continuous, agentic inference workloads. (datacenterknowledge.com)
NVIDIA expanded the Vera Rubin platform into a multi‑rack POD that now includes seven chip types and explicitly adds a Groq‑based LPU rack alongside Rubin GPU racks, Vera CPU racks and BlueField‑4 storage racks. (crn.com) The new Groq 3 “LPU,” the inference accelerator NVIDIA licensed from Groq, will be produced by Samsung Foundry, with shipments slated for the second half of 2026 (NVIDIA indicated Q3 timing on stage). (en.sedaily.com) NVIDIA’s December deal to bring Groq technology and talent into its stack has been reported at about $20 billion; coverage describes the transaction as a large licensing and asset deal rather than a plain acquisition. (bloomberg.com)
On the software side, NVIDIA unveiled NemoClaw, an enterprise wrapper for OpenClaw that pairs Nemotron models with a new OpenShell runtime for policy and sandboxing, and said NemoClaw installs with a single command. (nvidianews.nvidia.com) At the same time, NVIDIA launched an Agent Toolkit and named 17 early adopters, including Adobe, Salesforce and SAP, to accelerate agent deployments and model routing inside its stack. (venturebeat.com)
NVIDIA claimed that the Groq 3 LPX rack can deliver up to ~35× inference throughput per megawatt versus prior generations, and described a 256‑LPU rack topology plus DSX reference designs for “AI factory” deployments. (constellationr.com) The company said the Groq 3 LPX and other Vera Rubin‑based racks will become available to customers in the second half of 2026, and cloud partners including AWS and Google Cloud are already positioning NVIDIA‑based inference offerings tied to those systems. (crn.com)