Google Splits TPU 8
- Google has separated its eighth-generation TPU family into distinct chips for training and inference workloads.
- The company described the two variants as TPU 8t for training and TPU 8i for inference.
- Splitting training from inference changes procurement and cost math for production ML, pushing teams to pick hardware based on workload economics and latency needs.
Google has split its eighth-generation Tensor Processing Unit line into two chips, one for training models and one for running them. (cloud.google.com)

Google announced TPU 8t and TPU 8i at Cloud Next on April 22, 2026, and said customers can request more information now ahead of general availability later this year. TPU 8t is aimed at frontier-model training, while TPU 8i is built for large-scale inference and reinforcement learning. (blog.google)

Training is the phase where a model learns from huge datasets; inference is the phase where the finished model answers prompts in production. Google said those jobs now have different infrastructure bottlenecks, so the company designed separate systems instead of one general-purpose TPU family. (cloud.google.com)

That marks a change from April 2025, when Google introduced seventh-generation Ironwood as its first TPU designed specifically for inference. Ironwood later moved into broader training and inference roles on Google Cloud, but TPU 8 formalizes the split into two tracks at the product-family level. (blog.google, cloud.google.com)

Google said TPU 8t scales to 9,600 chips and 2 petabytes of shared high-bandwidth memory in a single superpod. Chief Executive Sundar Pichai said the chip delivers three times the processing power of Ironwood and up to twice the performance per watt. (blog.google)

Google said TPU 8i is tuned for fast responses, using a new “Boardfly” topology that directly connects 1,152 TPUs in one pod. The company also said 8i has three times more on-chip static random-access memory than prior versions and a Collectives Acceleration Engine to offload coordination work. (cloud.google.com)

Both chips are hosted on Google’s Arm-based Axion central processing units, which the company said are meant to reduce delays from data preparation and orchestration on the host side.
Google is packaging the systems as part of its AI Hypercomputer stack, which combines chips, networking, software, and data-center design. (cloud.google.com, cloud.google.com)

For cloud buyers, the split turns a hardware decision into a workload decision: throughput and giant memory pools for training, or low-latency serving for live applications. Google’s own product page describes 8t as the training path and 8i as the inference path, rather than one chip family for both jobs. (cloud.google.com)

Google’s pitch is that AI systems now spend money in two very different ways: long training runs and constant production serving. By separating TPU 8 into 8t and 8i, Google is selling those costs as two different infrastructure products instead of one. (cloud.google.com, blog.google)
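
For a sense of scale, the TPU 8t superpod figures quoted earlier (9,600 chips, 2 petabytes of shared high-bandwidth memory) imply an average memory share per chip. The Python sketch below works through that arithmetic. It assumes the 2-petabyte figure is a rounded total divided evenly across chips, which Google has not stated here, and the function name is illustrative only. The result is an estimate, not a published per-chip specification.

```python
# Back-of-the-envelope estimate: average HBM per chip implied by the
# superpod figures quoted in the article. Assumes (hypothetically) that
# the 2 PB total is split evenly across all 9,600 chips.

CHIPS_PER_SUPERPOD = 9_600      # figure quoted in the article
SHARED_HBM_PETABYTES = 2        # figure quoted in the article, likely rounded


def hbm_per_chip(total_pb: float, chips: int, decimal: bool = True) -> float:
    """Return average memory per chip in GB (decimal) or GiB (binary)."""
    base = 1000 if decimal else 1024
    total_bytes = total_pb * base**5      # petabytes -> bytes
    return total_bytes / chips / base**3  # bytes per chip -> gigabytes


if __name__ == "__main__":
    gb = hbm_per_chip(SHARED_HBM_PETABYTES, CHIPS_PER_SUPERPOD)
    gib = hbm_per_chip(SHARED_HBM_PETABYTES, CHIPS_PER_SUPERPOD, decimal=False)
    print(f"about {gb:.0f} GB per chip (decimal units)")
    print(f"about {gib:.0f} GiB per chip (binary units)")
```

Under those assumptions the estimate lands at roughly 208 GB per chip in decimal units, or about 218 GiB if the petabyte figure is read in binary units.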