Google bets on AI infrastructure

- At Cloud Next, Google unveiled new AI chips and infrastructure plans aimed at supporting agentic workloads.
- Google introduced eighth-generation TPUs, and Thinking Machines Lab signed a multibillion-dollar Google Cloud deal that includes systems powered by Nvidia GB300 GPUs.
- The strategy pairs Google's bespoke silicon with Nvidia partnerships to attract enterprise AI workloads and reduce inference costs.

Google used its Cloud Next event on April 22 to pitch itself as a place to run big artificial intelligence systems, with new in-house chips and more Nvidia capacity. (blog.google) Google said its first-party models now process more than 16 billion tokens per minute through direct customer application programming interface use, up from 10 billion last quarter. Chief Executive Sundar Pichai also said just over half of Google’s machine-learning compute investment in 2026 is expected to go to the cloud business. (blog.google)

At the center of the launch was Google’s eighth-generation Tensor Processing Unit, or TPU, its custom artificial intelligence chip. Google said the new line splits into TPU 8t for training frontier models and TPU 8i for high-volume inference, the step where a trained model answers live requests. (blog.google)

Google said those chips were designed for what it calls “agentic” workloads, meaning software that chains together tasks, tools and model calls with less human prompting. The company framed that as a new demand pattern for cloud customers building systems that must reason, act and respond continuously. (cloud.google.com)

The pitch is partly about cost. Google said the infrastructure needs for training, post-training and real-time serving have diverged, and that it built separate TPU systems because large-scale inference and reinforcement learning now need different hardware trade-offs than pre-training. (cloud.google.com)

Google is not betting only on its own silicon. TechCrunch reported on April 22 that Mira Murati’s Thinking Machines Lab signed a new multibillion-dollar agreement with Google Cloud that includes systems powered by Nvidia’s latest GB300 graphics processing units. (techcrunch.com)

That deal extends a broader pattern. TechCrunch reported in March that Thinking Machines Lab had already signed a multi-year Nvidia compute agreement involving at least a gigawatt of capacity, underscoring how large model builders are locking up power and chips years in advance. (techcrunch.com)

Google has been building that Nvidia relationship in public for months. At Nvidia’s GTC conference in March, Google Cloud said it would add support for Nvidia Vera Rubin NVL72 systems and expand software integrations for inference and training on its platform. (cloud.google.com)

The result is a two-track strategy: Google can offer customers its own TPUs when custom silicon fits the job, and rent them Nvidia systems when they want the market’s dominant graphics chips. At Cloud Next, Google presented that mix as part of its “AI Hypercomputer,” its term for bundling chips, networking and software into one cloud stack. (cloud.google.com)

For enterprise buyers, the message was less about one chip launch than about supply. Google is telling customers that if they want to train models, fine-tune them and serve them at scale without getting locked into one hardware path, it wants to be the cloud that can offer both. (blog.google)

Shared from Scout - Be the smartest in the room.