Nvidia + Google 'AI factories'
- Nvidia and Google Cloud announced a collaboration to build “AI factories” that tie GPUs to agentic and physical AI workloads.
- Google’s A5X infrastructure is claimed to scale to 960,000 Rubin GPUs across multiple sites, with the aim of boosting inference throughput and efficiency.
- The partnership signals stronger hybrid and sovereign deployment options for latency-sensitive physical AI in warehouses and robotics.
Nvidia and Google Cloud said on April 22 they are expanding their partnership to build “AI factories” aimed at running agentic software and robots on Google’s cloud stack. (blogs.nvidia.com)

In plain terms, an AI factory is a data center tuned to produce AI output the way a power plant produces electricity: chips do the work, networking moves the data, and software keeps the whole system fed. Google and Nvidia said their new A5X bare-metal instances will use Nvidia Vera Rubin systems and scale to 80,000 Rubin graphics processing units at one site and 960,000 across multiple sites. (blogs.nvidia.com)

The companies said A5X is designed for inference, the stage when a trained model answers prompts or controls a machine in real time rather than learning from scratch. Nvidia said the setup can deliver up to 10 times lower inference cost per token and up to 10 times higher token throughput per megawatt than the prior generation. (blogs.nvidia.com)

Google has been pitching that kind of infrastructure as the base layer for what it calls the “Agentic Enterprise,” with more customers moving from chatbots to software that can plan, call tools, and complete tasks. In a Google Cloud Next post published April 22, the company said more than 16 billion tokens per minute now run through its first-party models via direct customer application programming interface use. (cloud.google.com)

The Nvidia tie-up also pushes beyond text bots into “physical AI,” shorthand for models that guide machines in the real world such as warehouse robots, factory systems, and digital twins. Nvidia said the updated stack is meant to move those systems “out of the lab and into production,” including robots and factory-floor simulations. (blogs.nvidia.com)

A second piece of the announcement is where those models can run.
Nvidia said Google is previewing Gemini on Google Distributed Cloud with Nvidia Blackwell and Blackwell Ultra graphics processing units, giving customers a way to deploy Google’s models in their own environments instead of only in Google-operated regions. (blogs.nvidia.com)

That matters for companies that need lower latency, tighter data controls, or local processing near equipment. Google’s Confidential Virtual Machines are built to keep code and data encrypted in memory while they are being processed, and Nvidia said confidential virtual machines with Blackwell graphics processing units are part of the expanded offering. (docs.cloud.google.com, blogs.nvidia.com)

The software layer is part of the deal too. Nvidia said customers will be able to build agents on Google’s Gemini Enterprise Agent Platform using Nvidia Nemotron open models and the Nvidia NeMo framework, tying Google’s cloud services to Nvidia’s model and tooling ecosystem. (blogs.nvidia.com)

Google and Nvidia have worked together for more than a decade, and Google used its March 16 GTC post to say support for Nvidia Vera Rubin NVL72 was already on the roadmap. The April 22 announcement turns that roadmap into a much larger pitch: use Google’s cloud, Nvidia’s newest chips, and hybrid deployment options to run bigger AI systems closer to where decisions have to happen. (cloud.google.com, blogs.nvidia.com)
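
For readers who want to put the quoted scale figures in perspective, the arithmetic can be sketched in a few lines of Python. This is a back-of-envelope illustration only, not anything published by Nvidia or Google: the function names are made up for this example, the baseline cost passed to `best_case_cost` is a hypothetical placeholder, and the "up to 10 times" claim is treated as a best-case upper bound rather than an expected result.

```python
# Figures as quoted in the announcement.
GPUS_PER_SITE = 80_000              # A5X ceiling at a single site
GPUS_MULTI_SITE = 960_000           # A5X ceiling across multiple sites
TOKENS_PER_MINUTE = 16_000_000_000  # "more than 16 billion", so a lower bound
BEST_CASE_FACTOR = 10               # "up to 10 times" lower cost per token


def full_site_equivalents(total_gpus: int = GPUS_MULTI_SITE,
                          per_site: int = GPUS_PER_SITE) -> int:
    """How many maxed-out single sites the multi-site ceiling equals.

    This is only a ratio; the announcement does not state a site count.
    """
    return total_gpus // per_site


def tokens_per_day(tokens_per_minute: int = TOKENS_PER_MINUTE) -> int:
    """Scale a per-minute token rate to a 24-hour day."""
    return tokens_per_minute * 60 * 24


def best_case_cost(baseline_cost: float,
                   factor: int = BEST_CASE_FACTOR) -> float:
    """Cost per token if the full 'up to 10 times' reduction were achieved.

    baseline_cost is a hypothetical input; no prices were announced.
    """
    return baseline_cost / factor


print(f"Full-site equivalents: {full_site_equivalents()}")
print(f"Tokens per day (lower bound): {tokens_per_day():,}")
print(f"Best-case cost from an assumed 2.00 baseline: {best_case_cost(2.0):.2f}")
```

On those numbers, the multi-site ceiling works out to the equivalent of 12 fully built single sites, and the quoted token rate implies at least about 23 trillion tokens a day.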