Google's new AI chips
What happened
- Google unveiled two new AI chips, one for training and one for inference, to capture a larger share of spending on AI compute.
- The announcement splits the TPU 8 architecture into separate training and inference chips, which Google claims are faster and cheaper than prior versions.
- The launch is a hedge against dependence on Nvidia, even though Google still offers Nvidia hardware in its cloud listings. (techcrunch.com)
Why it matters
Google used Cloud Next on April 22 to unveil two new eighth-generation artificial intelligence chips, splitting its Tensor Processing Unit line into one part for training and another for inference. (cloud.google.com) Training is the phase where a model learns from huge data sets; inference is the phase where it answers prompts after training is done.
Google named the new chips TPU 8t for training and TPU 8i for inference. (blog.google) Google said TPU 8t is built for frontier-model training and TPU 8i is built for large-scale inference and reinforcement learning, a method where models improve through trial and error. The company said the split reflects different infrastructure needs for pre-training, post-training and real-time serving. (cloud.google.com)
The company also said the chips will sit inside its AI Hypercomputer system, which combines processors, networking, software and data-center design into one package. Google said the eighth-generation family will be hosted for the first time on its own Arm-based Axion processors. (cloud.google.com)
Google framed the launch around a shift in the artificial intelligence business: fewer companies are only training giant models, and more are paying to run them continuously for chatbots, search, coding tools and software agents. In a Cloud Next post, Sundar Pichai said Google’s first-party models now process more than 16 billion tokens per minute through direct customer API use, up from 10 billion last quarter. (blog.google) That demand helps explain why Google is now selling separate chips for separate jobs instead of one general design. In its announcement, Google said TPU 8i is aimed at “massive inference throughput,” while TPU 8t is tuned for large memory pools needed to train complex models. (blog.google)
The move also widens Google’s effort to reduce how much of the AI hardware stack depends on Nvidia, even as Google continues to sell Nvidia systems in its cloud. Last month, Google Cloud announced expanded support for Nvidia products including RTX PRO 6000 Blackwell Server Edition systems and planned support for Nvidia Vera Rubin NVL72. (cloud.google.com)
Google is not abandoning its earlier chips as it makes that shift. At Cloud Next in April 2025, the company introduced Ironwood, its seventh-generation TPU, as its first TPU designed specifically for inference; by April 2026, Google said Ironwood was generally available to cloud customers. (blog.google) (cloud.google.com)
The new split suggests Google now sees training and inference as distinct businesses inside cloud computing, with different cost, speed and memory demands. The next test is whether cloud customers buy enough TPU 8t and TPU 8i capacity to pull more AI spending onto Google’s own silicon instead of Nvidia’s. (techcrunch.com)
Key numbers
- Two chips: TPU 8t for training and TPU 8i for inference, both part of Google's eighth TPU generation. (blog.google)
- More than 16 billion tokens per minute: the volume Sundar Pichai said Google's first-party models now process through direct customer API use, up from 10 billion last quarter. (blog.google)
- April 22: the date Google unveiled the chips at Cloud Next. (cloud.google.com)
- One generation back: Ironwood, the seventh-generation TPU introduced in April 2025, which Google said was generally available to cloud customers by April 2026. (blog.google) (cloud.google.com)
What happens next
- The company said the chips will sit inside its AI Hypercomputer system, which combines processors, networking, software and data-center design into one package. (cloud.google.com)
- Google said the eighth-generation family will be hosted for the first time on its own Arm-based Axion processors. (cloud.google.com)
- The next test is whether cloud customers buy enough TPU 8t and TPU 8i capacity to pull more AI spending onto Google's own silicon instead of Nvidia's. (techcrunch.com)