Google's TPU 8t hits 121 FP4 exaFLOPS

- Google said on April 22 that its new TPU 8t chip is built for training giant AI models, while TPU 8i is tuned for inference.
- Google says a TPU 8t superpod links 9,600 chips, 2 petabytes of shared memory, and up to 121 exaflops of compute.
- The split marks Google’s first separate training and inference TPU generation, aimed at larger cloud AI workloads. (cloud.google.com)

A Tensor Processing Unit is Google’s in-house AI chip, built to do the matrix math that trains models and answers prompts at cloud scale. On April 22, Google said its eighth generation will split into two products: TPU 8t for training and TPU 8i for inference. (cloud.google.com) (blog.google)

Training is the phase where a model learns from huge datasets; inference is the phase where a trained model responds to users. Google said those jobs now stress hardware differently enough that it designed separate chips for each one. (cloud.google.com) (blog.google)

For the training side, Google said TPU 8t scales to a 9,600-chip superpod with 2 petabytes of shared high-bandwidth memory. The company said that system reaches up to 121 exaflops of compute and uses a 3D torus network to keep thousands of chips synchronized. (cloud.google.com) (blog.google)

Google said TPU 8t delivers three times the processing power of Ironwood and up to 2x more performance per watt, while its Cloud TPU page says TPU 8t offers up to 2.7x better performance per dollar than Ironwood for large-scale training. (blog.google) (cloud.google.com)

For the inference side, Google said TPU 8i is built for lower-latency serving and reinforcement learning, with higher on-chip memory and a design aimed at large Mixture-of-Experts models. Google’s Cloud TPU page says TPU 8i provides an 80% performance-per-dollar improvement over prior generations for low-latency inference on large MoE models. (cloud.google.com 1) (cloud.google.com 2)

Google tied the new chips to a broader “AI Hypercomputer” stack that includes custom networking, liquid cooling, storage, and Axion Arm-based host processors. The company said the Axion hosts reduce data-preparation bottlenecks so the TPUs spend more time on active computation. (cloud.google.com)

The timing is part of a wider cloud competition, not a clean break from Nvidia. TechCrunch reported Google is still planning to offer Nvidia’s Vera Rubin systems in its cloud later this year, even as it pitches TPUs as a cheaper and more efficient path for some AI workloads. (techcrunch.com)

Google said customers can request more information now and that TPU 8t and TPU 8i will be generally available later in 2026. The immediate message is that Google now wants to sell one chip for building frontier models and another for running them at scale. (blog.google)
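
For a rough sense of scale, the quoted TPU 8t superpod totals can be divided by the chip count. The short Python sketch below does that arithmetic. It assumes decimal units (1 petabyte = 10^15 bytes) and an even split of memory and compute across all 9,600 chips. Neither assumption comes from Google, so the per-chip results are derived estimates, not published specifications.

```python
# Back-of-the-envelope per-chip figures from the quoted superpod totals.
# Assumptions (ours, not Google's): decimal SI units and an even split
# of memory and peak compute across every chip in the superpod.

CHIPS_PER_SUPERPOD = 9_600         # quoted chip count
SHARED_MEMORY_BYTES = 2 * 10**15   # "2 petabytes", read as decimal PB
PEAK_FLOPS = 121 * 10**18          # "up to 121 exaflops" (FP4 per the headline)


def per_chip(total, chips=CHIPS_PER_SUPERPOD):
    """Divide a superpod-wide total evenly across its chips."""
    return total / chips


memory_gb_per_chip = per_chip(SHARED_MEMORY_BYTES) / 10**9
pflops_per_chip = per_chip(PEAK_FLOPS) / 10**15

print(f"~{memory_gb_per_chip:.0f} GB of high-bandwidth memory per chip")
print(f"~{pflops_per_chip:.1f} petaflops of peak compute per chip")
```

Under those assumptions the totals work out to roughly 208 GB of memory and about 12.6 petaflops of peak compute per chip.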

