Google TPU v8 split

- Google split its TPU v8 family into two variants: a training-focused 8t and an inference- and agent-focused 8i.
- Public figures show 8t at about 124% better performance per watt than the prior generation and 8i at about 117% better, with 8i offering about 80% better performance per dollar.
- The bifurcation targets frontier-model training versus low-latency inference and highlights cost-versus-performance tradeoffs for AI infrastructure (x.com).

Google has split its newest artificial-intelligence chip family in two, with one Tensor Processing Unit for training models and another for serving them live. (cloud.google.com)

A Tensor Processing Unit, or TPU, is Google’s custom chip for the matrix math behind neural networks, the repeated multiply-and-add work that powers model training and response generation. Google says TPUs are specialized for those operations, unlike central processing units and graphics processing units, which are built for broader workloads. (docs.cloud.google.com)

Google unveiled the eighth-generation family on April 22 at Cloud Next 2026, naming the two versions TPU 8t and TPU 8i. The company said TPU 8t is built for frontier-model training, while TPU 8i is built for large-scale inference and reinforcement learning. (blog.google)

Google’s technical blog says the split reflects a change in AI workloads: pre-training, post-training and real-time serving now stress hardware in different ways. The company said the new systems are part of its AI Hypercomputer stack and use Arm-based Axion central processing units as hosts. (cloud.google.com)

For training, Google said TPU 8t scales to 9,600 chips in a single superpod and uses a 3D torus network, a layout meant to move data across very large clusters. The company said that design targets throughput, the total amount of work a training run can finish over time. (cloud.google.com)

For inference, Google said TPU 8i is tuned for low latency, the delay before a model starts producing tokens, and for the repeated back-and-forth of AI agents. Google also said TPU 8i expands on-chip static random-access memory and high-bandwidth memory so large key-value caches can stay on the chip package instead of spilling out to slower memory paths. (cloud.google.com)

Google said TPU 8t delivers about 124% better performance per watt than Ironwood, the prior generation, while TPU 8i delivers about 117% better performance per watt. For customers buying inference capacity, Google said TPU 8i targets about 80% better performance per dollar than Ironwood at low-latency settings. (cloud.google.com)

Those numbers show Google is no longer chasing one chip that does every AI job equally well. The company is separating the economics of building bigger models from the economics of running millions of user prompts and agent steps after those models are deployed. (cloud.google.com)

Google tied the launch directly to “agentic” software, its term for systems that plan, reason and execute multi-step tasks. Chief Executive Sundar Pichai said on April 22 that Google’s first-party models are now processing more than 16 billion tokens per minute through direct application programming interface use by customers, up from 10 billion in the prior quarter. (blog.google)

The split also sharpens Google’s pitch against Nvidia in cloud AI infrastructure: one chip for maximum training scale, another for cheaper, faster serving. After years of selling TPUs as a general family, Google is now selling a choice. (techcrunch.com)
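
For readers who want to see what the quoted percentages imply, the arithmetic can be sketched in a few lines of Python. This is an illustration, not Google's methodology: it assumes "N% better" means a relative gain over Ironwood (so the new figure is 1 + N/100 times the old one), and the function names are hypothetical.

```python
# Illustrative arithmetic only. The "N% better" reading is an assumption,
# and these helper names are hypothetical, not part of any Google tooling.


def multiplier(percent_better: float) -> float:
    """Turn 'N% better than baseline' into a multiple of the baseline."""
    return 1.0 + percent_better / 100.0


def relative_cost(percent_better: float) -> float:
    """Energy (or dollars) per unit of work, as a fraction of the baseline."""
    return 1.0 / multiplier(percent_better)


def percent_growth(old: float, new: float) -> float:
    """Percentage increase from an old value to a new one."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Figures quoted in the article, all relative to Ironwood.
    claims = {
        "TPU 8t perf/watt": 124.0,
        "TPU 8i perf/watt": 117.0,
        "TPU 8i perf/dollar": 80.0,
    }
    for name, pct in claims.items():
        print(
            f"{name}: {multiplier(pct):.2f}x Ironwood, "
            f"{relative_cost(pct):.1%} of Ironwood's cost per unit of work"
        )

    # Tokens per minute: 10 billion in the prior quarter, 16 billion now.
    print(f"Token throughput growth: {percent_growth(10e9, 16e9):.0f}%")
```

Under that reading, 124% better performance per watt works out to 2.24 times Ironwood's figure, or a little under half the energy for the same work, and the jump from 10 billion to 16 billion tokens per minute is a 60% increase.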
