Google unveils AI chips
- Google announced new chips designed for both AI training and inference as an explicit challenge to Nvidia.
- The inference hardware emphasizes large amounts of SRAM to optimize memory-heavy inference workloads.
- Hyperscalers' push into custom silicon shifts competition from pure training throughput toward inference efficiency, memory and power trade-offs. (cnbc.com)
Google split its next artificial intelligence chip into two products, one for training models and one for running them, as it presses a more direct challenge to Nvidia. (blog.google)

The announcement came April 22 at Google Cloud Next in Las Vegas, where Google said both eighth-generation Tensor Processing Units, called TPU 8t and TPU 8i, will arrive later in 2026. CNBC reported the move as Google’s first clear break from using one TPU generation for both jobs. (blog.google) (cnbc.com)

Training is the expensive step where a model learns from huge data sets; inference is the cheaper-looking but often larger task of answering millions of user requests after the model is built. Google said TPU 8t is tuned for training with one large shared memory pool, while TPU 8i is tuned for serving models at high concurrency. (blog.google)

Memory is the bottleneck in many inference jobs because large language models must keep huge amounts of model data close to the chip to answer quickly. CNBC reported Google loaded TPU 8i with large amounts of static random access memory, or SRAM, the faster on-chip memory also being emphasized in Nvidia’s upcoming designs. (cnbc.com)

Google said TPU 8i delivers 80% higher performance per dollar than Ironwood for large language model inference, while TPU 8t provides up to 2.8 times faster training than Ironwood. Those claims matter because cloud providers now compete not only on peak speed, but on how cheaply they can keep models responding all day. (blog.google) (theregister.com)

That is a shift from the earlier phase of the generative artificial intelligence boom, when the headline race centered on building ever larger models. Amazon Web Services already split that work across Inferentia for inference and Trainium for training, and Microsoft introduced a second-generation Maia chip in January.
(cnbc.com)

Google has been building its own AI chips for more than a decade, starting internal TPU use in 2015 and renting TPUs to cloud customers beginning in 2018. Last year it introduced Ironwood, its seventh-generation TPU and the first one Google described as designed specifically for inference. (cnbc.com) (blog.google)

Google is not dropping Nvidia from its cloud. TechCrunch reported Google also said it will offer Nvidia’s Vera Rubin systems later this year, which leaves customers choosing between Google’s in-house silicon and Nvidia hardware on the same cloud. (techcrunch.com)

The result is a more crowded contest over the part of artificial intelligence that users actually touch: the cost, speed, and power draw of every prompt, response, and agent action. Google’s bet is that a chip built for that workload will sell better than one trying to do everything at once. (cnbc.com)