Google details TPU 8t/8i split

- Google used its April 22 Cloud Next 2026 event to unveil eighth-generation Tensor Processing Units split into TPU 8t for training and TPU 8i for inference.
- Google said TPU 8t scales to 9,600-chip superpods, while TPU 8i is tuned for low-latency serving with higher performance per dollar.
- The split mirrors a wider data-center move toward separate training and inference silicon. (cloud.google.com)

Google used Cloud Next 2026 on April 22 to introduce two eighth-generation Tensor Processing Units instead of one: TPU 8t for training and TPU 8i for inference. (cloud.google.com) (blog.google)

A Tensor Processing Unit is Google’s in-house AI chip, and the split reflects two different jobs inside modern AI systems. Training is the long, expensive process of teaching a model; inference is the fast response step when that model answers a prompt. (cloud.google.com 1) (cloud.google.com 2)

Google said those jobs now pull hardware in different directions. Training needs huge compute throughput and scale-up bandwidth, while real-time inference needs more memory bandwidth and lower latency for long-context, multi-step agent workloads. (cloud.google.com 1) (cloud.google.com 2)

TPU 8t is the training side of that plan. Google said it keeps the company’s 3D torus network design and scales a single superpod to 9,600 chips for frontier-model pre-training and embedding-heavy workloads. (cloud.google.com)

Google also said TPU 8t is paired with Axion, its Arm-based host processor, to reduce data-preparation bottlenecks that can leave accelerators idle. The company described Axion integration as a first for its eighth-generation TPU systems. (cloud.google.com 1) (cloud.google.com 2)

Google’s public pitch is that specialization improves economics as well as speed. The company said the new generation is designed to deliver better performance and energy efficiency than prior TPU systems, with availability later in 2026. (blog.google)

That puts Google closer to a model already visible elsewhere in AI infrastructure. Amazon Web Services has long separated Trainium and Inferentia, and Nvidia has been tuning newer systems more aggressively around inference-heavy demand. (theregister.com)

Google tied the new chips to its broader “agentic” push at Next, where it also announced new networking, storage and enterprise AI products. In keynote remarks, Google Cloud Chief Executive Thomas Kurian said Google Cloud customers now process more than 16 billion tokens per minute through direct API use. (cloud.google.com)

The company said customers can request more information now ahead of general availability later this year. The message from the launch is straightforward: Google no longer wants one TPU generation to do every AI job. (blog.google)
