NVIDIA’s inference pivot
Jensen Huang framed an “inference inflection” at GTC: AI is shifting from one‑off training to continuous, real‑world inference that creates recurring revenue streams. He cited $1 trillion of revenue visibility across NVIDIA’s Blackwell and Rubin architectures for 2025–2027, and analysts keep pointing to CUDA as the persistent switching‑cost moat.
NVIDIA showed the Vera Rubin superchip in production form. Reviewers and teardown reports list Rubin GPUs with up to 288 GB of HBM4, a multi‑die design totaling ~336 billion transistors, and an 88‑core Vera CPU paired for large‑context inference workloads (tomshardware.com). NVIDIA’s public roadmap now sequences Rubin and Rubin Ultra through 2026–2027, with a next‑generation architecture codenamed Feynman targeted for 2028, compressing its generational cadence and signalling multi‑year platform stickiness to hyperscalers (oplexa.com).
The company also pushed an inference software layer at GTC: Dynamo 1.0 as an orchestration and serving OS for inference, and NemoClaw/Nemotron plus OpenShell as an enterprise agent stack for installing and securing always‑on agent deployments (fierce-network.com).
NVIDIA expanded its autonomous‑vehicle ties onstage, announcing a joint plan with Uber to roll NVIDIA‑powered Level‑4 robotaxis into 28 cities by 2028, with initial LA and SF launches slated for the first half of 2027 (investor.uber.com).
Market data show NVDA closed the trading session on March 16, 2026 at $183.22, a gain of about 1.65% on the day, not a post‑keynote sell‑off (seekingalpha.com). Wall Street reaction was mixed. Wedbush’s Dan Ives dubbed the keynote a “confidence boost” for long‑term demand, while some broker notes praised the extended roadmap yet left price targets largely unchanged. Meanwhile, rivals and hyperscalers are pushing efforts (TorchTPU/PyTorch work and TPU/ASIC rollouts) intended to lower the developer switching costs that underpin NVIDIA’s ecosystem advantage (thestreet.com).