Google splits TPUs
- Google is separating its TPU line into distinct chips for training and for inference rather than one design.
- The new split targets an 8th‑generation TPU family slated to launch later in 2026.
- The move is explicitly framed as a response to rising agent workloads and to compete with Nvidia in AI acceleration (x.com).

Google is splitting its next Tensor Processing Unit family into two chips, one for training AI models and one for running them. (blog.google) Google said Wednesday its eighth-generation line will include TPU 8t for training and TPU 8i for inference, with launch timing set for later in 2026. Sundar Pichai said the “dual chip approach” is part of Google Cloud’s next wave of infrastructure for enterprise customers. (blog.google 1) (blog.google 2)

Training is the step where a model learns from vast datasets; inference is the step where that trained model answers prompts and serves users. Google said those jobs now have different bottlenecks, so the company stopped trying to optimize one chip for both. (cloud.google.com)

Google has been moving toward that split for more than a year. In April 2025, it introduced Ironwood, its seventh-generation Tensor Processing Unit and its first chip built specifically for inference. (cloud.google.com) (techcrunch.com)

The new design is aimed at “agentic” AI, Google’s term for systems that take actions across multiple steps instead of returning one answer. Google said TPU 8i is tuned for large-scale inference and reinforcement learning, while TPU 8t is built for frontier-model training and embedding-heavy workloads. (blog.google) (cloud.google.com)

Google tied the chip split directly to competition with Nvidia, whose graphics processors still dominate much of the market for AI training and inference. At Nvidia’s GTC conference in March, Google Cloud was still highlighting expanded support for Nvidia systems even as it prepared its own next TPU push. (blog.google) (cloud.google.com)

Google said TPU 8t scales to 9,600 chips and 2 petabytes of shared high-bandwidth memory in one superpod. The company said TPU 8i is built for “high-speed inference” and for multi-step reasoning chains that keep chips busy longer than older chatbot workloads did. (blog.google) (cloud.google.com)

The company is also pairing the new TPUs with Arm-based Axion central processors to reduce delays in data preparation and orchestration before work reaches the accelerators. Google said that host-side bottleneck had become a bigger problem as models and agent workflows grew more complex. (cloud.google.com)

The result is a cleaner break inside Google’s AI hardware roadmap: one chip to build bigger models, another to keep them serving and reasoning at scale. The next test is whether cloud customers buy enough of both to make Google a stronger counterweight to Nvidia later this year. (blog.google 1) (blog.google 2)