Google's TPU v7 Claims Blackwell Parity, Skips Benchmark

Google's new TPU v7 'Ironwood' accelerator, with a reported 4,614 TFLOPS of peak FP8 compute per chip, is being called "on par with Blackwell" for certain AI workloads. However, Google's decision not to submit the new chip to the MLPerf Inference v6.0 benchmarks has raised questions about transparency and third-party validation.

- On a per-chip basis, TPU v7 offers 4.6 PFLOPS of FP8 compute, 192 GB of HBM3e memory, and 7.4 TB/s of memory bandwidth. That makes it competitive with NVIDIA's B200 GPU, which offers 4.5 PFLOPS of FP8 compute, 192 GB of HBM3e, and 8 TB/s of bandwidth.
- Google's strategy centers on massive scale-up systems called "pods," which connect up to 9,216 TPU v7 chips in a 3D torus topology. This differs from NVIDIA's NVL72 rack-scale systems, which connect 72 GPUs over the NVLink fabric.
- The move is part of a broader trend among hyperscalers such as AWS, Microsoft, and Meta, which are increasingly developing custom ASICs to improve performance-per-watt for their specific AI workloads and to reduce their dependence on merchant silicon vendors.
- On total cost of ownership (TCO), Google's vertical integration gives it a significant advantage: the internal TCO of a full TPU v7 system is estimated to be about 44% lower than that of an NVIDIA GB200-based server.
- Although Google is skipping MLPerf Inference v6.0 for TPU v7, it has a history of setting performance records in earlier MLPerf Training rounds with its TPU v3 and v4 generations.
- TPU v7 is heavily optimized for AI inference, a market segment projected to grow significantly. Major AI labs such as Anthropic have reportedly committed to using hundreds of thousands of TPUs for future workloads because of their favorable economics.
- Each TPU v7 chip has four Inter-Chip Interconnect (ICI) links providing 9.6 Tbps of aggregate bidirectional bandwidth, compared with 14.4 Tbps of NVLink bandwidth on an NVIDIA B200.
- The custom silicon trend is projected to accelerate, with global data center AI ASIC shipments expected to triple between 2024 and 2027 and eventually surpass data center GPU shipments.
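
The spec comparison above is easier to read as ratios. The following is a minimal back-of-envelope sketch in Python that uses only the reported figures quoted in this briefing. The dictionary keys and helper names are hypothetical. The numbers are peak spec-sheet figures rather than measured throughput, and the 44% TCO estimate is treated as a simple proportional reduction against a normalized GB200 baseline.

```python
# Back-of-envelope arithmetic on the reported per-chip figures quoted in this briefing.
# Peak spec-sheet numbers only; they say nothing about measured (e.g. MLPerf) throughput.
# Key and function names are hypothetical, chosen for illustration.

TPU_V7 = {"fp8_pflops": 4.614, "hbm_gb": 192.0, "hbm_tb_s": 7.4, "link_tbps": 9.6}
B200 = {"fp8_pflops": 4.5, "hbm_gb": 192.0, "hbm_tb_s": 8.0, "link_tbps": 14.4}

POD_CHIPS = 9_216  # largest TPU v7 pod, as reported
NVL72_GPUS = 72    # GPUs in one NVL72 NVLink domain


def ratio(a: dict[str, float], b: dict[str, float], key: str) -> float:
    """How many times larger a's figure is than b's for the given spec key."""
    return a[key] / b[key]


def tbps_to_tbytes_per_s(tbps: float) -> float:
    """Convert terabits per second to terabytes per second (8 bits per byte)."""
    return tbps / 8.0


def pod_peak_exaflops(chips: int, pflops_per_chip: float) -> float:
    """Sum of per-chip peaks in EFLOPS (1 EFLOPS = 1,000 PFLOPS); ignores scaling losses."""
    return chips * pflops_per_chip / 1_000.0


def tco_after_reduction(baseline_tco: float, reduction: float) -> float:
    """Apply a fractional TCO reduction (0.44 for '44% lower') to a baseline cost."""
    return baseline_tco * (1.0 - reduction)


if __name__ == "__main__":
    # Per-chip ratios, TPU v7 relative to B200.
    for key in TPU_V7:
        print(f"{key}: TPU v7 / B200 = {ratio(TPU_V7, B200, key):.2f}")
    # Interconnect figures converted from Tbps to TB/s.
    print(f"ICI per chip: {tbps_to_tbytes_per_s(TPU_V7['link_tbps']):.1f} TB/s; "
          f"NVLink per B200: {tbps_to_tbytes_per_s(B200['link_tbps']):.1f} TB/s")
    # Size of one scale-up domain, in chips.
    print(f"Pod vs NVL72 chip count: {POD_CHIPS // NVL72_GPUS}x")
    print(f"Pod peak FP8: {pod_peak_exaflops(POD_CHIPS, TPU_V7['fp8_pflops']):.1f} EFLOPS")
    # GB200 TCO normalized to 1.0.
    print(f"TPU v7 TCO per 1.00 of GB200 TCO: {tco_after_reduction(1.0, 0.44):.2f}")
```

Run as a script, this shows TPU v7 at roughly 1.03x the B200's FP8 compute, equal HBM capacity, about 0.93x the memory bandwidth, and two-thirds of the interconnect bandwidth (1.2 TB/s versus 1.8 TB/s). A full pod holds 128 times as many chips as one NVL72 domain, and its summed peak is about 42.5 EFLOPS. These are not like-for-like system comparisons, which is part of why independent benchmarks matter.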

Shared from Scout.