Google unveils two AI processors

- Google used Cloud Next ’26 on April 22 to launch two eighth-generation AI chips: TPU 8t for training and TPU 8i for inference. (blog.google)
- The split is the point: TPU 8t scales to 9,600 chips per superpod, while TPU 8i is tuned for fast multi-step agent responses. (cloud.google.com)
- This matters because Google is no longer selling one general TPU story; it is separating training from serving like the market now does. (cloud.google.com)

Google just made a pretty clear bet about where AI infrastructure is going. Not toward one giant do-everything chip, but toward specialized [...] TPUs: TPU 8t for training big models, and TPU 8i for running them quickly once they’re deployed. (blog.google)

### [...] chips in two?

Because training and inference have drifted apart. Pretraining a frontier model is a huge throughput problem: move absurd amounts [...] about low latency, long context, and lots of sequential reasoning steps that need to feel fast to a user. Google’s pitch is that one design no longer fits both jobs well. (cloud.google.com)

### What is TPU 8t for?

TPU 8t is the training side of the pair. Google says it is [...] chips in a single superpod. The company is also leaning on SparseCore, a specialized accelerator for embedding lookups, because modern large models, especially recommendation and mixture-of-experts systems, spend a lot of time on exactly that kind of ugly memory work. (cloud.google.com)

### What is TPU 8i for?

TPU 8i is the inference chip. This is the [...] user’s behalf. The important claim here is not just “faster AI,” but faster interactive AI. Google is basically saying that if agents are going to feel useful, the hardware has to minimize the drag from long reasoning chains and repeated tool calls. That is the niche TPU 8i is supposed to fill. (blog.google)

### Why does “agentic” matter so much?

Because it changes [...] tools, revises, and keeps context alive is much more like a loop than a single pass. That means different pressure on memory, networking, and latency. Google ties these chips directly to that shift, and even frames them as infrastructure for “world models” and reasoning-heavy systems, not just plain old text generation. (cloud.google.com)

### Is this just about chips?

Not [...] which includes networking, software, orchestration, and Arm-based Axion CPU hosts. The Axion piece matters because Google says it removes a host-side bottleneck: data prep and orchestration can starve accelerators if the CPUs feeding them are too weak. Basically, the company wants customers buying a system, not a part. (cloud.google.com)

### So is Google taking aim at Nvidia?

Yes, but sideways. Google [...]-built silicon inside its own stack. That makes this less a direct CUDA clone than a vertical integration play: Google controls the chips, the network, the cloud, and many of the models that will run on top. (blog.google)

### What changed from the old TPU story?

The old message was that Google had strong custom AI silicon. The new message is sharper: the AI lifecycle [...] big AI buyers already know from painful experience, namely that training clusters and serving clusters do not want the same thing. (cloud.google.com)

### Bottom line?

Google’s announcement matters because it turns a broad TPU roadmap into a two-lane one. TPU 8t is for building frontier models. TPU 8i [...] be less about raw chip bragging rights and more about who builds the best full stack for each stage of AI. (blog.google)

Shared from Scout - Be the smartest in the room.