Google's TPU 8 reveal
- Google published details of its eighth-generation TPUs, naming two chips: TPU 8t for training and TPU 8i for inference.
- The announcement explicitly framed the chips around the emerging “agentic era” of AI workloads.
- Google positions TPU 8 as an alternative infrastructure path for agentic systems, adding hardware choices beyond GPUs (blog.google).
Google used Cloud Next on April 22, 2026, to unveil its eighth-generation artificial-intelligence chips, splitting the line into TPU 8t for training models and TPU 8i for running them. (blog.google)

A Tensor Processing Unit, or TPU, is Google’s in-house chip for artificial-intelligence math, while graphics processing units, or GPUs, are the more common alternative sold across the cloud market. Google says TPU 8t is built for frontier-model training, and TPU 8i is built for low-latency inference and reinforcement learning. (blog.google) (cloud.google.com)

Google said the new chips were designed with Google DeepMind and folded into its AI Hypercomputer system, which combines chips, networking and software in one cloud stack. The company said customers can register interest now ahead of general availability later in 2026. (cloud.google.com) (blog.google)

The company tied the launch to what it calls the “agentic” phase of AI, in which software carries out multi-step work with less human prompting and needs faster responses across long chains of reasoning. Google said those workloads have left training, post-training and live serving facing different hardware bottlenecks, which is why TPU 8 is now split into two systems. (blog.google) (cloud.google.com)

Google is also using the launch to make a cloud-market argument: companies building AI agents do not have to rely only on GPUs. Its TPU pages now explicitly pitch Cloud TPUs as infrastructure for “all AI workloads, from training to inference,” alongside guidance on when customers may still prefer GPUs or central processing units. (cloud.google.com) (blog.google)

On the training side, Google said TPU 8t scales to a 9,600-chip superpod and is aimed at pre-training and embedding-heavy workloads. Google CEO Sundar Pichai said a single superpod reaches 2 petabytes of shared high-bandwidth memory and delivers up to twice the performance per watt of Ironwood, the previous TPU generation.
(cloud.google.com) (blog.google)

On the serving side, Google said TPU 8i is tuned for near-zero-latency inference, the stage of AI work in which a trained model answers requests in real time. Google also said the eighth-generation TPU family is, for the first time, hosted on its Arm-based Axion central processing units, a change meant to reduce delays from data preparation and orchestration. (blog.google) (cloud.google.com 1) (cloud.google.com 2)

The launch continues a rapid cadence in Google’s chip roadmap. Google introduced its seventh-generation Ironwood TPU at Cloud Next in April 2025 as its first TPU built specifically for inference, and by April 2026 the company had moved to a two-chip eighth generation organized around separate training and serving jobs. (blog.google 1) (blog.google 2)

For customers, the immediate takeaway is choice inside Google Cloud: one chip for building large models and another for running agents that need quick replies, both arriving later this year. For Google, the message is that the next AI infrastructure fight will be over which hardware stack powers agent systems at scale. (blog.google) (cloud.google.com)
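The TPU 8t superpod figures quoted earlier lend themselves to a quick sanity check. The Python sketch below is back-of-envelope arithmetic, not a Google specification. It assumes that "2 petabytes" means exactly 2 x 10^15 bytes and that the shared high-bandwidth memory is split evenly across the 9,600 chips, and Google has stated neither. The helper names and the 1,000 kWh job are hypothetical.

```python
# Back-of-envelope arithmetic on Google's quoted TPU 8t superpod figures.
# ASSUMPTIONS (not stated by Google): "2 petabytes" is read as exactly
# 2 * 10**15 bytes, and the shared HBM is divided evenly across all chips.
# Function names below are hypothetical, chosen for illustration only.

SUPERPOD_CHIPS = 9_600            # chips per TPU 8t superpod (quoted)
SUPERPOD_HBM_BYTES = 2 * 10**15   # "2 petabytes" of shared HBM (quoted, likely rounded)
PERF_PER_WATT_GAIN = 2.0          # "up to twice" Ironwood's performance per watt (best case)


def hbm_per_chip_gb(total_bytes: int, chips: int) -> float:
    """Average HBM per chip in decimal gigabytes, assuming an even split."""
    return total_bytes / chips / 10**9


def energy_for_same_work(baseline_energy: float, perf_per_watt_gain: float) -> float:
    """Energy needed for a fixed amount of work after a performance-per-watt gain.

    Performance per watt is work per joule, so the energy for the same
    amount of work scales by 1 / gain.
    """
    return baseline_energy / perf_per_watt_gain


if __name__ == "__main__":
    per_chip = hbm_per_chip_gb(SUPERPOD_HBM_BYTES, SUPERPOD_CHIPS)
    print(f"~{per_chip:.0f} GB of HBM per chip")  # ~208 GB
    # Hypothetical job that used 1,000 kWh on Ironwood, at the best-case 2x gain.
    tpu8t_energy = energy_for_same_work(1_000.0, PERF_PER_WATT_GAIN)
    print(f"{tpu8t_energy:.0f} kWh for the same work")  # 500 kWh
```

Under those assumptions, the average comes to roughly 208 GB per chip. A job that used 1,000 kWh on Ironwood would need about 500 kWh if the full doubling applied. Google's totals are rounded and "up to twice" is a ceiling, so treat both numbers as rough bounds rather than measured figures.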