Infrastructure becomes a moat
- Big players are doubling down on specialised AI infrastructure deals and new chip architectures.
- Thinking Machines Lab signed a multibillion-dollar agreement with Google Cloud using Nvidia GB300-powered infrastructure, while Google unveiled an eighth-gen TPU splitting training and inference.
- Those moves aim to slash inference costs and scale millions of agents, shifting value toward cloud and chip ecosystems. (techcrunch.com) (benzinga.com)
Google and Nvidia are turning AI compute into a long-term supply contract, not a spot purchase. On April 22, Google Cloud signed a multibillion-dollar infrastructure deal with Thinking Machines Lab and unveiled a new split-chip TPU strategy the same day. (techcrunch.com) (blog.google)

Thinking Machines Lab, founded by former OpenAI chief technology officer Mira Murati, said the agreement expands its Google Cloud footprint for research, platform development, and frontier-model training. Google said the startup will use A4X Max virtual machines with Nvidia GB300 NVL72 systems and was among the first Google Cloud customers for that hardware. (techcrunch.com) (googlecloudpresscorner.com)

A4X Max gives each virtual machine four Blackwell Ultra graphics processing units, and Google said early testing at Thinking Machines doubled training and serving speed versus the prior generation. The companies announced the deal at Cloud Next in Las Vegas on April 22. (siliconangle.com) (googlecloudpresscorner.com)

Training is the expensive phase where a model learns from huge datasets; inference is the cheaper-but-constant phase where the model answers prompts after deployment. Google said those workloads have diverged enough that its eighth-generation Tensor Processing Units now come as two systems: TPU 8t for training and TPU 8i for inference and reinforcement learning. (cloud.google.com)

Google said TPU 8t can scale to 9,600 chips and 2 petabytes of shared high-bandwidth memory in one superpod, while TPU 8i is built for large-scale, near-zero-latency serving. In its Cloud Next keynote materials, Google tied both chips to “agentic” workloads, where software systems make repeated model calls instead of answering one prompt at a time. (cloud.google.com 1) (cloud.google.com 2)

Google also used the event to show how much of its business now depends on keeping customers inside its own stack.
Sundar Pichai said on April 22 that Google’s first-party models are processing more than 16 billion tokens per minute through direct customer API use, and that just over half of Google’s 2026 machine-learning compute investment is expected to go to Cloud. (blog.google)

That puts the Thinking Machines deal in a wider race over who captures the economics after models are trained. If training remains concentrated among a few labs and inference spreads across millions of agents and apps, the cloud provider with reserved capacity, networking, software, and custom chips can lock in more of the spending. (cloud.google.com) (blog.google)

Google is not abandoning Nvidia in that shift. The company is selling Nvidia GB300 systems to outside customers through Google Cloud while also pushing in-house TPUs for workloads where it controls more of the design, pricing, and supply chain. (googlecloudpresscorner.com) (blog.google)

The immediate result is that AI infrastructure now looks less like rented servers and more like a bundled platform deal. The companies with the chips, the cloud contracts, and the power budget are positioning themselves to decide who gets compute first, and at what price. (techcrunch.com) (blog.google)