Post‑GTC: inference, agents, robots
Post‑GTC discussion this week shifted attention to inference economics, agentic AI and robotics: the conference framing leaned into “AI factories” and cost‑per‑token metrics rather than raw model scale alone. Hardware moves tied to that message include HPE adding Nvidia Vera Rubin NVL72 systems and Cray configurations to its AI Factory portfolio, reflecting vendor bundling at the high end (x.com).
Nvidia’s message after GTC shifted from training ever-bigger models to running them cheaply at scale, with “AI factories,” agents and robots at the center. (nvidia.com) In AI, training is the expensive schooling phase; inference is the live work of answering prompts, generating code or making decisions one request at a time. Nvidia said cost per token, the price of producing chunks of model output, is now the metric that best captures real-world economics. (blogs.nvidia.com)

Nvidia’s GTC 2026 keynote on March 16 in San Jose framed that shift across “accelerated computing and AI factories,” “agentic systems,” and “physical AI.” Its event recap and keynote page both put inference and robotics alongside chips and data centers, not as side topics. (nvidia.com, nvidia.com)

The company’s pitch is that an AI data center should be judged less like a lab and more like a factory: how many useful tokens it produces, how much power it burns, and how steadily it runs. Nvidia’s recent posts tied token cost to hardware, software, utilization and performance per watt, not just peak chip specs. (developer.nvidia.com, developer.nvidia.com)

That framing lines up with where demand has moved over the last year. Once a model is trained, serving millions of user queries, tool calls and longer reasoning chains can dominate spending, especially when companies add test-time computation and multi-step agents. (investor.nvidia.com, blogs.nvidia.com)

Agentic AI is Nvidia’s label for systems that do more than answer once: they plan, call tools, retrieve data and take actions across several steps. GTC’s agentic AI track described them as “digital workforces” built to reason, plan and act autonomously on enterprise data. (nvidia.com)

Physical AI is the same idea pushed into machines that sense and move in the real world. Nvidia’s GTC materials and newsroom updates grouped robotics, vision AI agents and autonomous vehicles under that banner, alongside an “Open Physical AI Data Factory Blueprint.” (nvidianews.nvidia.com, nvidia.com)

The hardware announcements matched the message. Nvidia said its Vera Rubin platform was built for “every phase of AI,” from pretraining and post-training to “test-time scaling” and “agentic inference,” and packaged that into Vera Rubin NVL72 GPU racks and related networking and storage systems. (nvidianews.nvidia.com)

Hewlett Packard Enterprise moved quickly to wrap those parts into its own high-end offerings. On March 16, HPE said its AI Factory portfolio now includes Nvidia Vera Rubin and Blackwell platforms, and its rack-scale systems page said “NVIDIA Vera Rubin NVL72 by HPE” is aimed at frontier models with more than 1 trillion parameters. (hpe.com, hpe.com) HPE also said it is adding Vera CPU compute blades and new Cray Supercomputing GX5000 configurations with Nvidia networking, software and services. That kind of bundling turns Nvidia’s chip roadmap into packaged systems for cloud providers, sovereign AI projects, research labs and large enterprises. (hpe.com, hpe.com)

Nvidia has supported the economics argument with selective customer examples. In February, it said Baseten, DeepInfra, Fireworks AI and Together AI were cutting cost per token by up to 10x on Blackwell with optimized open-source model stacks. That is a vendor claim, and rivals may dispute the comparisons, but it points to where competition is moving. (blogs.nvidia.com)

The post-GTC takeaway is less about one chip than about the sales model around it: factories instead of boxes, token output instead of raw peak speed, and robots as another endpoint for the same infrastructure. Nvidia and its server partners are now selling that full stack as the next phase of the AI buildout. (nvidia.com, hpe.com)
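
The factory framing treats cost per token as arithmetic: what a deployment costs to run per hour, divided by the useful tokens it actually produces in that hour. The sketch below illustrates that relationship under a simple linear cost model; it is not Nvidia’s methodology. The `ServingSetup` fields, the `cost_per_million_tokens` and `tokens_per_joule` helpers, and every number are hypothetical, chosen only to show how throughput, utilization and power interact.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServingSetup:
    """Hypothetical inputs for one inference deployment; all values are illustrative."""
    hourly_cost_usd: float          # amortized hardware and facility cost per hour (assumed)
    power_kw: float                 # average power draw in kilowatts (assumed)
    electricity_usd_per_kwh: float  # energy price (assumed)
    peak_tokens_per_sec: float      # output throughput when fully busy (assumed)
    utilization: float              # fraction of time spent serving useful requests, in (0, 1]


def cost_per_million_tokens(s: ServingSetup) -> float:
    """Dollars per one million output tokens under a simple linear cost model."""
    if not 0 < s.utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Total spend per hour: fixed running cost plus electricity.
    hourly_total = s.hourly_cost_usd + s.power_kw * s.electricity_usd_per_kwh
    # Useful tokens actually produced in that hour.
    tokens_per_hour = s.peak_tokens_per_sec * s.utilization * 3600
    return hourly_total / tokens_per_hour * 1_000_000


def tokens_per_joule(s: ServingSetup) -> float:
    """Useful tokens per joule of energy, a performance-per-watt style metric."""
    # 1 kW = 1,000 joules per second.
    return s.peak_tokens_per_sec * s.utilization / (s.power_kw * 1000)


baseline = ServingSetup(
    hourly_cost_usd=100.0,
    power_kw=120.0,
    electricity_usd_per_kwh=0.10,
    peak_tokens_per_sec=50_000,
    utilization=0.5,
)
# Same hourly bill, but a faster software stack and steadier load (hypothetical).
tuned = replace(baseline, peak_tokens_per_sec=100_000, utilization=0.9)

for name, setup in [("baseline", baseline), ("tuned", tuned)]:
    print(f"{name}: ${cost_per_million_tokens(setup):.2f} per 1M tokens, "
          f"{tokens_per_joule(setup):.3f} tokens/J")
ratio = cost_per_million_tokens(baseline) / cost_per_million_tokens(tuned)
print(f"improvement: {ratio:.1f}x")
```

With these made-up inputs, doubling peak throughput and raising utilization from 50% to 90% makes the tuned setup 3.6x cheaper per token while the hourly bill stays the same. That illustrates why the factory framing weighs software and utilization alongside peak chip specs.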