Cloud GPU Scale

- Google Cloud and Nvidia are expanding enterprise access to Blackwell GPUs and Rubin-powered instances for large-scale AI workloads.
- Announcements highlight Vera Rubin A5X instances and claims of scaling toward nearly one million Rubin GPUs for AI factories.
- The move signals a continued infrastructure arms race as cloud vendors court customers building agentic and physical AI systems. (blogs.nvidia.com) (letsdatascience.com)

Google Cloud and Nvidia said this week that customers will be able to use new A5X bare-metal systems built on Nvidia’s Vera Rubin chips. (blogs.nvidia.com) The announcement came at Google Cloud Next in Las Vegas on April 22, 2026, alongside a broader push to expand Google Cloud’s AI Hypercomputer lineup.

Nvidia said A5X clusters can scale to 80,000 Rubin graphics processors at one site and 960,000 across multiple sites. (blogs.nvidia.com) A graphics processor, or GPU, is the chip most companies rent to train and run large artificial intelligence models because it can handle many calculations at once. Google and Nvidia said the new A5X design combines Rubin NVL72 rack systems with Nvidia ConnectX-9 networking and Google’s Virgo network fabric. (blogs.nvidia.com)

Google and Nvidia are pairing those systems with software and cloud services aimed at companies building “agentic” tools that can carry out multistep tasks and “physical AI” systems used in robots and factory simulations. The companies also said Gemini will be previewed on Google Distributed Cloud with Nvidia Blackwell and Blackwell Ultra chips, and that confidential virtual machines with Blackwell GPUs are coming. (blogs.nvidia.com)

Google has been widening its AI infrastructure menu for months rather than betting on one chip family. At Nvidia’s GTC conference on March 16, Google said it would add support for Nvidia Vera Rubin NVL72 while also promoting Blackwell-based G4 systems and its own AI Hypercomputer stack. (cloud.google.com)

At Next, Google also argued that enterprise demand has moved from experimentation to production. Google Cloud Chief Executive Thomas Kurian said on April 22 that nearly 75% of Google Cloud customers use its AI products, that 330 customers processed more than 1 trillion tokens over the past 12 months, and that 35 reached 10 trillion tokens. (cloud.google.com)

That scale helps explain why cloud providers are now talking about whole racks and whole sites instead of single servers. A Google Cloud Next session described A4X Max systems based on Nvidia GB300 NVL72 and said the future A5X line would use Rubin architecture, with engineering work focused on reliability, maintenance, and debugging at massive scale. (googlecloudevents.com)

Nvidia framed the new systems as infrastructure for “AI factories,” its term for data centers tuned to produce tokens and model outputs at industrial scale. Google framed the same buildout as part of a vertically integrated cloud stack for companies that want to train, fine-tune, and serve models without assembling the hardware themselves. (blogs.nvidia.com)
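
For a rough sense of what the quoted cluster sizes imply, the arithmetic can be sketched in a few lines of Python. The two GPU totals are the figures Nvidia cited. The 72-GPUs-per-rack value is an assumption read from the "NVL72" product name, not a number from the announcement, and the helper name `ceil_div` is purely illustrative.

```python
# Back-of-the-envelope sizing from the figures quoted in the announcement.
GPUS_PER_SITE = 80_000      # single-site maximum cited by Nvidia
GPUS_MULTI_SITE = 960_000   # multi-site maximum cited by Nvidia
GPUS_PER_RACK = 72          # ASSUMPTION: "NVL72" taken to mean 72 GPUs per rack


def ceil_div(total: int, per_unit: int) -> int:
    """Smallest whole number of units needed to hold `total` items."""
    return -(-total // per_unit)


# How many full-size sites would the multi-site figure span?
sites = ceil_div(GPUS_MULTI_SITE, GPUS_PER_SITE)

# How many racks would one full-size site need under the 72-per-rack assumption?
racks_per_site = ceil_div(GPUS_PER_SITE, GPUS_PER_RACK)

# Racks across the whole multi-site footprint, under the same assumption.
racks_total = ceil_div(GPUS_MULTI_SITE, GPUS_PER_RACK)

print(f"sites at full size: {sites}")
print(f"racks per site:     {racks_per_site}")
print(f"racks overall:      {racks_total}")
```

Under those assumptions, the multi-site figure works out to 12 sites of 80,000 GPUs each, and a single site to roughly 1,100 racks. Actual deployments may split capacity differently.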
