Google says TPU 8t delivers roughly 3× training performance; TPU 8i aims for ~3× inference gains
- Google used Cloud Next 2026 to unveil eighth-generation TPUs split by job: TPU 8t for training and TPU 8i for inference and reinforcement learning.
- Google says TPU 8t reaches 3× Ironwood processing power and up to 2.7× training performance per dollar, while TPU 8i targets 80% better inference economics.
- The bigger shift is strategic: Google is selling a more specialized, full-stack TPU system instead of one chip for everything.

Google’s TPU news is really about specialization. For years, the pitch was simple: build bigger accelerators, run more AI. But the workloads split apart. Training giant frontier models wants huge memory pools and tightly synchronized clusters. Inference wants low latency, cheaper tokens, and steady throughput at scale. At Cloud Next 2026, Google responded by breaking its eighth-generation TPU family into two products: TPU 8t for training and TPU 8i for inference. (cloud.google.com)

### Why split TPUs in two?

Because “AI compute” is no longer one problem. Pre-training a large model, post-training it with reinforcement learning, and serving it to users all stress hardware differently. Google’s own framing is blunt here: the infrastructure requirements for pre-training, post-training, and real-time serving have diverged, so one [...] TPU 8i is the serving-heavy side. (cloud.google.com)

### What is TPU 8t actually for?

TPU 8t is the big-cluster machine. Google says it is optimized for frontier-model training and embedding-heavy workloads, and that a single superpod can scale to 9,600 chips with 2 petabytes of shared high-bandwidth memory. That matters because the hardest training runs are often bottlenecked less by one chip’s head [...] data movement or failures. (cloud.google.com)

### What numbers is Google putting on it?

The headline claim is 3× processing power versus Ironwood for TPU 8t. Google is also saying up to 2× more performance per watt and up to 2.7× better performance per dollar for large-scale training. Those are big numbers, but notice how they’re framed: system economics, not just raw per-chip bragging rights [...] accelerator card in isolation. (cloud.google.com)

### So what is TPU 8i trying to fix?

Inference has become its own monster.
Once a model is trained, the expensive part is often serving tons of requests quickly and cheaply, especially for agentic systems that call tools, reason over multiple steps, and keep conversations going. Google says TPU 8i is designed for low-latency inference and reinforcement learning, with an 80% performance-per-dollar improvement [...] inference on large mixture-of-experts models. Basically, 8i is the “make serving affordable” chip. (blog.google)

### Why keep talking about the whole system?

Because Google’s advantage case is increasingly full-stack. The company pairs these TPUs with its Virgo network fabric, Axion Arm-based host CPUs, storage, scheduling software, and its AI Hypercomputer stack. That is the real pitch: not just “our chip is faster,” but “our data center [...] cycles that go to active model work instead of stalls, resets, and data waits. (cloud.google.com)

### Does this mean TPUs are replacing GPUs?

Not broadly. The announcement reads more like a sharper positioning move than a universal GPU knockout punch. Google is saying: if you are training or serving at Google-scale, or you want into Google’s tightly integrated cloud stack, these TPUs may be the better fit. But that is different from saying ev[...]tters, and Google is still talking about general availability later this year rather than instant mass rollout. (blog.google)

### Why does this matter now?

Because AI economics are shifting from “can you build the model?” to “can you afford to keep it running?” Training still matters, but inference bills are becoming the long tail. Google’s answer is to separate the two jobs and optimize each one harder. If that works, the company is not just selling silicon. It is selling a more opinionated way to build AI infrastructure. (cloud.google.com)

### Bottom line?

The real news is not just that TPU 8t is faster or TPU 8i is cheaper.
It’s that Google now thinks the AI stack has split enough to justify different chips for different phases — and it wants customers to buy into that whole architecture, not just the accelerator.
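
For readers who want the quoted ratios in concrete terms, here is a rough back-of-envelope sketch in Python. It uses only the figures Google is cited as claiming above and takes them at face value. Several things are assumptions rather than anything Google has stated: that “80% better” means a 1.8× performance-per-dollar multiple, that a petabyte is decimal (10^15 bytes), and that the superpod’s shared memory is spread evenly across chips. The helper name `relative_cost` is made up for illustration.

```python
# Back-of-envelope arithmetic on vendor-quoted figures; not measured data.

def relative_cost(perf_per_dollar_gain: float) -> float:
    """Cost of a fixed workload relative to the baseline,
    given a performance-per-dollar multiple."""
    return 1.0 / perf_per_dollar_gain

# "Up to 2.7x" training performance per dollar (best case as quoted).
train_cost = relative_cost(2.7)

# "80% better" inference performance per dollar, read as a 1.8x multiple
# (assumption: the article does not spell out the exact definition).
infer_cost = relative_cost(1.8)

# 2 PB of shared HBM across 9,600 chips, assuming decimal petabytes
# and an even split per chip (both assumptions).
hbm_per_chip_gb = 2e15 / 9600 / 1e9

print(f"training cost vs. baseline:  {train_cost:.0%}")
print(f"inference cost vs. baseline: {infer_cost:.0%}")
print(f"HBM per chip: {hbm_per_chip_gb:.0f} GB")
```

Read this way, the best-case claims imply the same training job at a bit over a third of the baseline cost, the same inference volume at somewhat more than half, and on the order of 200 GB of high-bandwidth memory per chip. Real bills depend on pricing and utilization, which the announcement does not pin down.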