Google–NVIDIA AI factories
- Google and NVIDIA announced deeper collaboration on “AI factories,” including Google Cloud A5X instances built on NVIDIA Vera Rubin systems.
- The collaboration highlights confidential computing on NVIDIA Blackwell GPUs and says A5X can scale to nearly one million Rubin GPUs across multiple sites.
- The emphasis on hyperscale hardware increases the need for smarter orchestration, cost controls, and routing policies in enterprise platforms (blogs.nvidia.com).
Google and NVIDIA said this week they are expanding their cloud partnership around “AI factories,” with new Google Cloud systems built for larger AI training and inference clusters. (blogs.nvidia.com)
At Google Cloud Next in Las Vegas, the companies announced A5X bare-metal instances built on NVIDIA’s Vera Rubin NVL72 rack-scale systems, plus previews of Gemini on Google Distributed Cloud running on NVIDIA Blackwell and Blackwell Ultra graphics processors. (blogs.nvidia.com) NVIDIA said A5X is designed to scale to 80,000 Rubin graphics processors in a single-site cluster and up to 960,000 across multiple sites. The companies also said the system uses NVIDIA ConnectX-9 SuperNICs and Google’s Virgo networking. (blogs.nvidia.com)
An “AI factory” is the industry’s term for data center infrastructure tuned to produce AI output the way a factory produces goods: chips, networking, software, and power arranged to keep models training or answering prompts without idle time. Google has been pitching that stack as its AI Hypercomputer service since at least March, when it said agentic AI workloads need lower latency, higher throughput, and lower-cost inference. (cloud.google.com)
The timing lines up with a broader shift among Google Cloud customers from running a few chatbots to running thousands of AI agents. On April 22, Google Chief Executive Sundar Pichai said Google’s first-party models were processing more than 16 billion tokens per minute through customer API use, up from 10 billion the prior quarter. (blog.google) Google tied that demand to new management software as well as new hardware: at Cloud Next, Pichai said the company was introducing the Gemini Enterprise Agent Platform to help customers build, scale, govern, and optimize large fleets of agents. (blog.google)
The hardware pitch is also about cost and energy. NVIDIA said A5X can deliver up to 10 times lower inference cost per token and 10 times higher token throughput per megawatt than the prior generation, though the company did not detail the benchmark setup in the announcement. (blogs.nvidia.com)
Google is not betting only on NVIDIA chips. In the same Cloud Next rollout, Google said it was also introducing eighth-generation Tensor Processing Units, its in-house AI accelerators, while continuing to expand NVIDIA-based options across its cloud lineup. (blog.google; cloud.google.com)
That leaves enterprises with a bigger systems problem than buying graphics processors: deciding which models run on which chips, when to use premium capacity, and how to keep inference bills from rising with every new agent. Google’s announcements this week paired the bigger clusters with new software for routing, governance, and security. (blog.google)
For now, the clearest signal from Las Vegas is that Google and NVIDIA are selling AI infrastructure less as individual servers and more as a full production line. The scale they are advertising, from Blackwell systems now to Rubin clusters later, makes orchestration as much of a bottleneck as chips. (blogs.nvidia.com; cloud.google.com)
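The orchestration problem described above (which work runs on which chips, when to pay for premium capacity, how to cap inference bills) can be sketched in a few lines. The Python below is a minimal, hypothetical routing policy, not Google's Gemini Enterprise Agent Platform or any real Google Cloud API. Each request goes to the cheapest capacity pool that meets its latency target, and a per-agent budget caps spend. The pool names (`premium-gpu`, `standard-accelerator`, `batch`), prices, and latencies are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Pool:
    """A capacity pool. All values used below are hypothetical placeholders."""
    name: str
    usd_per_million_tokens: float
    p50_latency_ms: float


@dataclass(frozen=True)
class Request:
    agent_id: str
    est_tokens: int
    max_latency_ms: float


# Hypothetical pools: names, prices, and latencies are illustrative assumptions,
# not Google Cloud SKUs, published prices, or measured latencies.
POOLS: List[Pool] = [
    Pool("premium-gpu", 4.00, 80.0),
    Pool("standard-accelerator", 1.50, 250.0),
    Pool("batch", 0.50, 2000.0),
]


def cost_usd(pool: Pool, tokens: int) -> float:
    """Estimated cost of serving `tokens` tokens on `pool`."""
    return pool.usd_per_million_tokens * tokens / 1_000_000


def route(req: Request, spent: Dict[str, float], budget_usd: float,
          pools: List[Pool] = POOLS) -> Optional[str]:
    """Send the request to the cheapest pool that meets its latency target.

    Returns None if no pool is fast enough or if the request would push the
    agent past its budget. On success, the agent's running spend is updated.
    """
    eligible = [p for p in pools if p.p50_latency_ms <= req.max_latency_ms]
    if not eligible:
        return None
    pool = min(eligible, key=lambda p: p.usd_per_million_tokens)
    projected = spent.get(req.agent_id, 0.0) + cost_usd(pool, req.est_tokens)
    if projected > budget_usd:
        return None
    spent[req.agent_id] = projected
    return pool.name


if __name__ == "__main__":
    demo_spent: Dict[str, float] = {}
    # Interactive agent with a tight latency target: only the premium pool qualifies.
    print(route(Request("support-bot", 2_000, 100.0), demo_spent, 0.05))        # premium-gpu
    # Latency-tolerant job: the cheapest pool wins.
    print(route(Request("nightly-report", 40_000, 5_000.0), demo_spent, 0.05))  # batch
    # Would push the support bot past its $0.05 budget, so it is rejected.
    print(route(Request("support-bot", 20_000, 100.0), demo_spent, 0.05))       # None
```

Choosing the cheapest pool that still meets the latency target reserves premium capacity for interactive traffic. Rejecting over-budget requests, rather than silently upgrading them, is one simple cost control; a production platform might queue, downgrade, or alert instead.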