Google's TPU v7 Claims Blackwell Parity, Skips Benchmark
Google's new TPU v7 'Ironwood' accelerator, with a reported 4,614 TFLOPS of FP8 compute per chip, is being called "on par with Blackwell" for certain AI workloads. However, Google's decision not to submit the chip to the MLPerf Inference v6.0 benchmarks has raised questions about transparency and third-party validation.
- On a per-chip basis, TPU v7 offers 4.6 PFLOPS of FP8 compute, 192 GB of HBM3e memory, and 7.4 TB/s of memory bandwidth, which is competitive with NVIDIA's B200 GPU that features 4.5 PFLOPS, 192 GB of HBM, and 8 TB/s of bandwidth.
- Google's strategy focuses on massive scale-up systems called "pods," connecting up to 9,216 TPU v7 chips in a 3D torus topology, which differs from NVIDIA's NVL72 rack-scale systems that connect 72 GPUs with its NVLink fabric.
- The move is part of a broader trend among hyperscalers like AWS, Microsoft, and Meta, who are increasingly developing custom ASICs to optimize performance-per-watt for their specific AI workloads and reduce dependency on merchant silicon providers.
- From a total cost of ownership (TCO) perspective, Google's vertical integration gives it a significant advantage; the internal TCO for a full TPU v7 system is estimated to be around 44% lower than that of an NVIDIA GB200-based server.
- While Google is skipping the MLPerf Inference v6.0 benchmark for TPU v7, it has a history of setting performance records in previous MLPerf training rounds with its TPU v3 and v4 generations.
- The design of TPU v7 is heavily optimized for AI inference, a market segment projected to grow significantly, with major AI labs like Anthropic reportedly committing to use hundreds of thousands of TPUs for future workloads based on favorable economics.
- Each TPU v7 chip features four Inter-Chip Interconnect (ICI) links providing 9.6 Tbps of aggregate bidirectional bandwidth, compared to the 14.4 Tbps of NVLink on an NVIDIA B200.
- The custom silicon trend is projected to accelerate, with global data center AI ASIC shipments expected to triple between 2024 and 2027, eventually surpassing data center GPU shipments.
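
To make the per-chip and system-scale comparisons easier to check, the arithmetic behind them can be sketched in a few lines of Python. This is an illustrative calculation only: the inputs are the figures quoted in this article, the dictionary keys and function names are made up for the example, and the aggregate values are theoretical peaks obtained by simple multiplication, not measured or delivered performance.

```python
# Per-chip figures as quoted in this article (not independently verified).
# Keys are hypothetical names chosen for this sketch.
TPU_V7 = {"fp8_pflops": 4.614, "hbm_gb": 192, "hbm_tb_per_s": 7.4, "link_tbit_per_s": 9.6}
B200 = {"fp8_pflops": 4.5, "hbm_gb": 192, "hbm_tb_per_s": 8.0, "link_tbit_per_s": 14.4}

POD_CHIPS = 9216   # maximum TPU v7 pod size cited above
NVL72_GPUS = 72    # GPUs in one NVL72 rack-scale system


def ratio(metric: str) -> float:
    """TPU v7 value divided by B200 value for one per-chip metric."""
    return TPU_V7[metric] / B200[metric]


def aggregate_exaflops(chips: int, pflops_per_chip: float) -> float:
    """Theoretical peak compute of a system in EFLOPS (1 EFLOPS = 1000 PFLOPS)."""
    return chips * pflops_per_chip / 1000


def tbit_to_tbyte(tbit_per_s: float) -> float:
    """Convert terabits per second to terabytes per second."""
    return tbit_per_s / 8


if __name__ == "__main__":
    for metric in ("fp8_pflops", "hbm_tb_per_s", "link_tbit_per_s"):
        print(f"{metric}: TPU v7 / B200 = {ratio(metric):.3f}")

    pod = aggregate_exaflops(POD_CHIPS, TPU_V7["fp8_pflops"])
    rack = aggregate_exaflops(NVL72_GPUS, B200["fp8_pflops"])
    print(f"Peak FP8, full pod: {pod:.2f} EFLOPS")
    print(f"Peak FP8, NVL72:    {rack:.3f} EFLOPS")

    print(f"ICI per chip:    {tbit_to_tbyte(TPU_V7['link_tbit_per_s']):.1f} TB/s")
    print(f"NVLink per chip: {tbit_to_tbyte(B200['link_tbit_per_s']):.1f} TB/s")
```

On these inputs, the two chips are within a few percent of each other on FP8 compute, while the TPU v7 trails on memory bandwidth and on interconnect bandwidth (about two-thirds of the NVLink figure). A full 9,216-chip pod works out to roughly 42.5 EFLOPS of peak FP8 compute, which reflects the difference in scale-up domain size rather than any per-chip advantage.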