Scale of TPU-8t pods
- A thread described Google's TPU-8t pod concept as roughly 9,600 chips per pod with about 121 exaflops peak. (x.com)
- The same analysis noted these pods can scale into million-plus chip clusters using model-parallelism techniques. (x.com)
- Those raw scale numbers explain why training very large models demands specialized interconnect, memory tiers, and parallel strategies.
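To put the figures above in per-chip terms, here is a rough back-of-envelope sketch. It assumes the quoted peak divides evenly across chips and that a million-chip cluster is built from whole pods; both are simplifications, and the helper name `pods_needed` is only illustrative.

```python
# Back-of-envelope arithmetic from the quoted pod figures.
# Assumption: peak compute is spread evenly across all chips in a pod.
CHIPS_PER_POD = 9_600
POD_PEAK_FLOPS = 121e18  # about 121 exaflops (1 exaflop = 1e18 FLOP/s)

per_chip_flops = POD_PEAK_FLOPS / CHIPS_PER_POD
per_chip_petaflops = per_chip_flops / 1e15  # roughly 12.6 petaflops peak per chip


def pods_needed(total_chips: int, chips_per_pod: int = CHIPS_PER_POD) -> int:
    """Whole pods required to reach total_chips (ceiling division)."""
    return -(-total_chips // chips_per_pod)


# A million-chip cluster would need on the order of a hundred pods.
pods_for_million = pods_needed(1_000_000)

print(f"Peak per chip: {per_chip_petaflops:.1f} petaflops")
print(f"Pods for 1,000,000 chips: {pods_for_million}")
```

These are peak numbers only; sustained training throughput depends on how well the interconnect and software keep the chips busy.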
A Google training pod now spans 9,600 TPU 8t chips and reaches 121 exaflops of peak FP4 compute in one superpod. (cloud.google.com) Tensor Processing Units, or TPUs, are Google’s custom chips for machine learning, built to do the matrix math behind large models faster than general-purpose processors. Google said on April 22 that TPU 8t is its eighth-generation training chip, while TPU 8i is a separate design for inference. (cloud.google.com, blog.google)

Inside one TPU 8t superpod, those 9,600 chips share 2 petabytes of high-bandwidth memory, the fast local memory attached to accelerators. Google said the system delivers three times Ironwood’s processing power and up to 2x more performance per watt. (blog.google, cloud.google.com)

The scale matters because modern model training is often limited by how fast chips can exchange data, not just by raw arithmetic speed. Google kept a 3D torus interconnect for TPU 8t, a layout that links each chip to six neighbors so work can be split across the pod with fewer bottlenecks. (cloud.google.com, cloud.google.com)

Google paired that pod design with Virgo Network, a data-center fabric that can link 134,000 TPU 8t chips in one fabric with up to 47 petabits per second of non-blocking bisection bandwidth. Google also said JAX and Pathways can scale a single training cluster past 1 million TPU chips. (cloud.google.com, cloud.google.com)

That software layer is the other half of the story. Google’s TPU scaling documentation describes model sharding, or slicing a model across many chips, so different parts of the network, activations, and optimizer state can live on different devices and run in parallel. (docs.cloud.google.com, docs.cloud.google.com)

Google has been climbing toward this size for several generations. TPU v5p, introduced in December 2023, scaled to 8,960 chips per pod, and Ironwood, announced in 2025, scaled to 9,216 chips and 42.5 exaflops per pod. (cloud.google.com, blog.google)

The jump from those systems to TPU 8t helps explain why frontier-model training now depends on stacked memory, custom networking, and distributed software as much as on the chip itself. At this size, a “pod” is less a box of processors than a coordinated machine spread across thousands of accelerators. (cloud.google.com, cloud.google.com)
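
The 3D torus layout mentioned earlier, in which each chip links to six neighbors, can be illustrated with a small sketch. The 4x4x4 grid and the function names are hypothetical, chosen only to keep the example readable; they do not reflect the actual dimensions or addressing of a TPU 8t pod.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the six neighbors of a chip in a 3D torus with wraparound links."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # The modulo makes edge chips wrap around to the opposite face.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum number of link hops between two chips, using wraparound."""
    return sum(min((x - y) % d, (y - x) % d) for x, y, d in zip(a, b, dims))


# Hypothetical 4x4x4 grid, for illustration only.
DIMS = (4, 4, 4)
corner_neighbors = torus_neighbors((0, 0, 0), DIMS)
corner_to_corner = torus_hops((0, 0, 0), (3, 3, 3), DIMS)

print(corner_neighbors)
print(corner_to_corner)
```

Because of the wraparound links, even a corner chip has six neighbors, and opposite corners of this toy grid are 3 hops apart rather than the 9 a plain mesh would need.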