Cloud Infrastructure Push
- Nvidia and Google Cloud announced an expanded collaboration to industrialize agentic AI and cut inference costs at scale.
- Their roadmap includes Blackwell systems and future Vera Rubin-powered A5X instances, which Nvidia says are designed to scale to nearly one million GPUs.
- Yet a Cast AI report found average GPU utilization of 5% in non-optimized Kubernetes clusters, a sign that much existing capacity sits idle even as fleets grow.
Nvidia and Google Cloud said on April 22 they are widening their cloud partnership to build larger systems for agentic artificial intelligence and lower the cost of serving models. (nvidia.com)
The companies said customers will get Google Cloud support for Nvidia Blackwell systems, including confidential computing options, and future A5X instances based on Nvidia’s Vera Rubin platform. Nvidia said those A5X instances are designed to scale to nearly 1 million Rubin graphics processing units, or GPUs. (nvidia.com)
Google Cloud said at Nvidia GTC 2026 that it is also adding support for the Vera Rubin NVL72 platform and integrating Nvidia Dynamo with Google Kubernetes Engine Inference Gateway for model serving. Google said its G4 virtual machines with Nvidia RTX PRO 6000 Blackwell Server Edition chips are already generally available. (cloud.google.com; cloud.google.com)
Agentic artificial intelligence refers to software that can plan steps and use tools with less human prompting than a basic chatbot. Inference is the stage at which a trained model answers a user request, and it is the part cloud providers are trying to make cheaper and faster at industrial scale. (cloud.google.com; nvidia.com)
The push for bigger fleets comes as many companies still leave expensive accelerators idle. Cast AI said in a report released April 21 that average GPU utilization in non-optimized Kubernetes clusters was 5%, based on data from tens of thousands of clusters across Amazon Web Services, Google Cloud and Microsoft Azure. (cast.ai; cast.ai)
Cast AI said the same dataset showed average central processing unit (CPU) utilization at 8% and memory utilization at 20% in those clusters, with measurements taken before customers turned on Cast AI automation features. The company said its report covers January 1 through December 31, 2025, with GPU data updated through April 2026. (cast.ai; cast.ai)
Nvidia and Google are pitching software as part of the answer, not just more hardware.
Nvidia said the expanded partnership includes Gemini on Google Distributed Cloud, Nvidia Nemotron models, Nvidia NeMo tools and confidential Blackwell systems for companies that need to keep data on premises or under tighter controls. (nvidia.com; cloud.google.com)
Google and Nvidia have been building toward this for more than a year. In April 2025, the companies said they were working to bring Gemini models on premises with Nvidia Blackwell infrastructure and to improve observability for agentic workloads with Nvidia Dynamo. (nvidia.com)
The near-term contest is not only about who can assemble the biggest GPU cluster but also about who can keep those chips busy enough to justify the bill. The companies’ latest pitch pairs larger Blackwell and Rubin systems with software meant to raise utilization on the infrastructure customers already rent. (nvidia.com; cast.ai)
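Cast AI's utilization figures can be read as rough cost multipliers. The sketch below is a simplified back-of-envelope model, not Cast AI's methodology: it assumes customers pay for all provisioned capacity and treats average utilization u as the fraction of that capacity doing useful work, so each unit actually used costs about 1/u times its list price. The function name `cost_multiplier` is hypothetical, and no real prices are assumed.

```python
# Hypothetical back-of-envelope model (not Cast AI's methodology): assumes
# customers pay for all provisioned capacity and treats average utilization
# as the fraction of that capacity doing useful work.

def cost_multiplier(utilization: float) -> float:
    """Effective price of each unit of capacity actually used, relative to list price.

    Capacity C rented at price p but only u*C of it used costs
    p*C / (u*C) = p / u per used unit, i.e. a multiplier of 1/u.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / utilization


# Average utilization figures Cast AI reported for non-optimized clusters.
reported = {"GPU": 0.05, "CPU": 0.08, "memory": 0.20}

for resource, u in reported.items():
    print(f"{resource}: {cost_multiplier(u):.1f}x")
# GPU: 20.0x
# CPU: 12.5x
# memory: 5.0x
```

Under these assumptions, 5% GPU utilization means each GPU-hour of real work effectively costs about 20 times the rental rate, which is why raising utilization can matter as much as adding hardware.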