Google splits TPU lines
- Google announced it split its eighth-generation TPUs into two chips, separating training and inference workloads.
- The new lines are named TPU 8t for training and TPU 8i for inference, unveiled at Cloud Next.
- This signals AI infrastructure moving toward separate central training and distributed inference architectures, influencing command-centre compute layouts (blog.google).
Google said at Cloud Next on April 22 that its eighth-generation Tensor Processing Units will come as two separate chips, one for training artificial intelligence models and one for running them. (blog.google) A Tensor Processing Unit is Google’s custom chip for machine learning, sold through Google Cloud alongside Nvidia graphics processors. The new pair is called TPU 8t for training and TPU 8i for inference, the step when a trained model answers prompts or takes actions. (docs.cloud.google.com) (blog.google)

Google said TPU 8t is built for “frontier-model training” and TPU 8i is built for large-scale inference and reinforcement learning. The company said the split reflects a change in how customers build systems for pre-training, post-training and real-time serving. (cloud.google.com)

Training is the expensive, centralized phase where a model learns from huge datasets; inference is the repeated work of serving answers after deployment. Google’s new design separates those jobs in hardware instead of asking one chip family to handle both. (docs.cloud.google.com) (cloud.google.com)

Google said TPU 8t scales to 9,600 chips and 2 petabytes of shared high-bandwidth memory in one superpod. Sundar Pichai said it delivers three times the processing power of Ironwood, Google’s seventh-generation TPU, and up to twice the performance per watt. (blog.google)

Ironwood, announced at Cloud Next 2025, was itself positioned as an inference-first chip rather than a general-purpose successor. Google said then that Ironwood offered five times the peak compute capacity and six times the high-bandwidth memory of the prior generation. (blog.google)

The eighth-generation family is also the first Google has said will be hosted on its own Axion Arm-based central processors. Google said that pairing is part of a co-designed stack that combines chips, networking, software and data-center systems inside its AI Hypercomputer platform. (cloud.google.com 1) (cloud.google.com 2)

Google tied the announcement to “agentic” artificial intelligence, its term for systems that reason across steps and call tools in real time. In the same keynote, Pichai said Google’s first-party models are now processing more than 16 billion tokens per minute through direct customer API use, up from 10 billion in the prior quarter. (blog.google)

That usage pattern favors one kind of machine for giant training runs in a few places and another for fast responses at large scale across cloud regions. Google said customers can request more information now ahead of general availability later in 2026. (blog.google)