Google doubles down on infra

- Google signed a multibillion-dollar deal with Thinking Machines Lab to run AI infrastructure on Nvidia's GB300 chips.
- It also unveiled an eighth-generation TPU architecture that splits chip roles between training and inference to cut costs.
- Both moves advance Google's enterprise software and compute efficiency to support large-scale agent deployments (techcrunch.com) (benzinga.com).

Google is buying more of the AI stack at once: Nvidia chips for customers now, and new in-house chips to lower the cost of running models later. (techcrunch.com)

On April 22, Google Cloud signed a new agreement with Thinking Machines Lab, the startup founded by former OpenAI chief technology officer Mira Murati. TechCrunch reported the contract is valued in the single-digit billions and includes Google systems built on Nvidia’s GB300 chips. (techcrunch.com)

Google said Thinking Machines will use A4X Max virtual machines with Nvidia GB300 NVL72 hardware, plus Google Kubernetes Engine, Spanner, Cloud Storage and Cluster Director. In Google’s early testing, the startup saw training and serving speeds double versus prior-generation graphics processing units. (googlecloudpresscorner.com)

Artificial intelligence training is the stage where a model learns from huge datasets; inference is the stage where it answers prompts after deployment. Google’s new eighth-generation Tensor Processing Units split those jobs into two chips, TPU 8t for training and TPU 8i for inference. (techcrunch.com)

Google said TPU 8t can train models up to three times faster than the prior generation, while the new design delivers 80% better performance per dollar and can link more than 1 million TPUs in one cluster. The company introduced the chips at Cloud Next on April 22 in Las Vegas. (techcrunch.com)

Google’s own explanation is that AI workloads have split apart: pre-training, post-training and real-time serving now stress hardware in different ways. Its cloud team said the new systems were built for long context windows, reinforcement learning and “millions of agents” running multi-step tasks. (cloud.google.com)

That helps explain why Google is still buying from Nvidia while pushing its own silicon. TechCrunch reported Google is offering Nvidia-based systems alongside its TPUs, and said Google plans to make Nvidia’s Vera Rubin chips available in its cloud later in 2026. (techcrunch.com)

Thinking Machines is one of the first Google Cloud customers to get access to the GB300 systems, according to Google. The deal is not exclusive, TechCrunch reported, which leaves the startup free to use other cloud providers as it builds out its models and products. (googlecloudpresscorner.com) (techcrunch.com)

Murati founded Thinking Machines in February 2025 after leaving OpenAI, and the company later raised a $2 billion seed round at a $12 billion valuation, according to TechCrunch. Its first product, Tinker, automates the creation of custom frontier models and relies on reinforcement learning, a method that consumes large amounts of computing power. (techcrunch.com)

Google’s message this week was less about replacing Nvidia than about controlling where the money goes as AI use grows. It wants the expensive training contracts today and the lower-cost inference work tomorrow, on infrastructure it increasingly designs itself. (techcrunch.com) (cloud.google.com)
