Google's Ironwood TPU
- Google announced the Ironwood TPU, rated at 4.6 petaFLOPS per chip, at its Cloud Next event.
- It previewed an eighth‑generation split between training and inference chips, planned for TSMC's 2nm node in late 2027.
- The move signals deeper custom‑silicon competition beyond Nvidia and more specialised compute for distinct AI workloads. (thenextweb.com)
A Tensor Processing Unit is Google’s in-house chip for artificial intelligence, built to do the matrix math that powers chatbots, image models, and recommendation systems faster than general-purpose processors. On April 9, 2025, Google used its Cloud Next conference in Las Vegas to introduce Ironwood, its seventh-generation TPU. (cloud.google.com) Google said Ironwood delivers 4,614 teraFLOPS, or about 4.6 petaFLOPS, of peak compute per chip and scales up to 9,216 chips in one pod. Google also said the largest Ironwood pod reaches 42.5 exaFLOPS of compute. (blog.google)

Inference is the stage where a trained model answers a prompt, generates an image, or ranks a search result, and Google pitched Ironwood as its first TPU designed primarily for that job. In its Cloud Next materials, Google said Ironwood offers five times the peak compute capacity and six times the high-bandwidth memory of the prior generation. (blog.google)

Memory matters because large models stall when chips wait for data, and Google said Ironwood raises high-bandwidth memory to 192 gigabytes per chip with 7.2 terabytes per second of bandwidth. Google also said the chip adds a stronger SparseCore, a specialized block for ranking and recommendation workloads. (blog.google)

Google’s public documentation now describes TPU7x as the first release in the Ironwood family and says it is designed for both large-scale training and inference, not only serving finished models. The same documentation says TPU7x shares a 9,216-chip footprint with TPU v5p and supports dense models, mixture-of-experts systems, pre-training, sampling, and decode-heavy inference. (cloud.google.com)

That puts Ironwood in the middle of a cloud market where providers are trying to reduce dependence on Nvidia by building more of their own silicon. Google, Amazon Web Services, and Microsoft have each been expanding custom chip programs while still offering Nvidia graphics processing units alongside them. (techcrunch.com)

Google tied Ironwood to its broader “AI Hypercomputer” stack, which packages chips, networking, storage, and software together for cloud customers. The company said developers can use Ironwood through JAX and PyTorch tooling, and that access runs through Google Kubernetes Engine and TPU Cluster Director. (cloud.google.com)

The company has also been making an efficiency case, not just a speed case. In an April 2026 update, Google said Ironwood showed about a 3.7-times improvement in compute carbon intensity over TPU v5p, based on measurements from January 2026. (cloud.google.com)

Google said at Cloud Next that Ironwood would be available later in 2025, and its release notes later listed TPU7x in preview on November 24, 2025. The launch shows Google treating AI chips less like one-size-fits-all hardware and more like separate engines for separate workloads. (blog.google) (cloud.google.com)
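
The pod-scale figures quoted above can be checked against the per-chip numbers with simple multiplication. The short Python sketch below does that arithmetic using only the values Google stated (4,614 teraFLOPS and 192 gigabytes per chip, 9,216 chips per pod). The function names are illustrative, and the conversion assumes decimal units. These are peak ratings, so the result says nothing about sustained performance on a real workload.

```python
# Per-chip figures as quoted in the article (Google's stated peak numbers).
PEAK_TFLOPS_PER_CHIP = 4_614   # peak teraFLOPS per Ironwood chip
HBM_GB_PER_CHIP = 192          # high-bandwidth memory per chip, in gigabytes
CHIPS_PER_POD = 9_216          # chips in the largest pod configuration


def pod_peak_exaflops(tflops_per_chip: float, chips: int) -> float:
    """Aggregate peak compute in exaFLOPS (assumes 1 exaFLOPS = 1e6 teraFLOPS)."""
    return tflops_per_chip * chips / 1e6


def pod_hbm_petabytes(gb_per_chip: float, chips: int) -> float:
    """Aggregate HBM in petabytes (assumes decimal units, 1 PB = 1e6 GB)."""
    return gb_per_chip * chips / 1e6


if __name__ == "__main__":
    exaflops = pod_peak_exaflops(PEAK_TFLOPS_PER_CHIP, CHIPS_PER_POD)
    petabytes = pod_hbm_petabytes(HBM_GB_PER_CHIP, CHIPS_PER_POD)
    print(f"Peak pod compute: {exaflops:.1f} exaFLOPS")
    print(f"Total pod HBM:    {petabytes:.2f} PB")
```

The compute result rounds to the 42.5 exaFLOPS that Google cites for a full pod, which suggests the headline number is the per-chip peak multiplied by the chip count.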