Google debuts TPU 8t (training) and TPU 8i (inference) accelerators at Cloud Next

- Google Cloud used Cloud Next on April 22 to unveil TPU 8t for training and TPU 8i for inference, splitting its eighth-generation AI chips.
- Google said TPU 8t scales to 9,600 chips and 2 petabytes of shared high-bandwidth memory, while TPU 8i targets low-latency serving.
- The move extends Google’s push to sell a full AI stack, alongside Nvidia systems in its cloud. (techcrunch.com)

Google Cloud used Cloud Next on April 22, 2026, to introduce two separate eighth-generation Tensor Processing Units: TPU 8t for training and TPU 8i for inference. (blog.google) A training chip is built to teach a model from huge datasets; an inference chip is built to answer prompts quickly after the model is deployed. Google said those jobs now diverge enough that one design no longer fits both. (cloud.google.com)

Google said TPU 8t is tuned for pre-training and embedding-heavy workloads, and that a single superpod can link 9,600 chips. Sundar Pichai said that system can expose 2 petabytes of shared high-bandwidth memory. (cloud.google.com) (blog.google)

For serving models, Google said TPU 8i is built for post-training, reinforcement learning, and low-latency inference. Its product page says it delivers an 80% performance-per-dollar gain over prior generations for low-latency inference on large mixture-of-experts models. (cloud.google.com)

Google tied both chips to its broader “AI Hypercomputer” pitch, which bundles accelerators, networking, storage, and software into one cloud system. At the same event, Thomas Kurian described that as a unified stack of chips, models, data, agents, and security. (cloud.google.com 1) (cloud.google.com 2)

The company also said the eighth-generation TPU systems are hosted on its Axion Arm-based central processors, a change Google says reduces host-side bottlenecks in data preparation and orchestration. (cloud.google.com 1) (cloud.google.com 2)

Google is not dropping Nvidia from its cloud lineup. TechCrunch reported that Google still plans to offer Nvidia’s Vera Rubin later this year and is working with Nvidia on networking efficiency inside Google Cloud. (techcrunch.com) That leaves the launch as less of a clean replacement story than a control story: Google wants more of the economics of training, post-training, and serving to sit inside its own infrastructure.

Pichai said more than half of Google’s machine learning compute investment in 2026 is expected to go to the Cloud business. (blog.google) Google said both TPU 8t and TPU 8i are coming later this year, and it is already taking customer interest. The immediate message from Cloud Next was that Google now wants to sell separate silicon for building models and for running them at scale. (blog.google)
