Phoronix: EPYC Turin vs Nvidia Grace
Phoronix benchmarks show AMD’s EPYC Turin server CPUs outperforming Nvidia’s GH200 Grace CPU in most raw-performance tests, suggesting Nvidia’s upcoming Vera CPUs would need a big uplift just to match two‑year‑old Turin silicon. It’s a notable datapoint if you care about compute choices for in‑house AI tooling or build rigs. (phoronix.com) (x.com)
Michael Larabel published Phoronix’s GH200 vs. EPYC Turin CPU comparison on November 7, 2024, reporting targeted CPU-only benchmarks run with remote access to a GH200 system provided by GPTshop.ai. (phoronix.com) The GH200 test system used a 72-core Arm Neoverse-V2 Grace CPU clocked at 3.47 GHz with 480 GB of LPDDR5-6400 memory, while the Turin side consisted of single-socket EPYC 9575F (64-core @ 3.30 GHz), EPYC 9655 (96-core @ 2.60 GHz) and EPYC 9755 (128-core @ 2.70 GHz) configurations under Ubuntu 24.04 LTS with GCC 13.2. (openbenchmarking.org)
In raw performance, the GH200 trailed the Turin EPYC parts on CoreMark, ClickHouse database server tests, Memcached, and large code-compile workloads, with the higher-core-count or higher-frequency EPYC SKUs leading many CPU-bound benchmarks. (phoronix.com) The GH200 was competitive or ahead in certain storage and key-value workloads, matching the EPYC 9655 on RocksDB and performing well in the Speedb key-value tests, and Phoronix recorded it delivering the best CPU performance-per-watt in RocksDB. (phoronix.com) Larabel also measured lower CPU power draw on the GH200 during compilation and other tests, which translated into a performance-per-watt advantage for the Grace CPU over the Turin samples in several workloads. (phoronix.com)
Phoronix flagged several test caveats in the article: the GH200 run used a remote system with different NVMe storage, the EPYC side required a patched kernel for Zen 5 power monitoring, and several HPC benchmarks remain less optimized for AArch64 than for x86_64. (phoronix.com)
Nvidia publicly unveiled its Vera CPU at GTC on March 16, 2026 as an 88-core Arm server processor with up to ~1.5 TB of memory per socket, alongside liquid-cooled racks that can aggregate 256 Vera chips; Nvidia’s release touted up to “twice the efficiency and 50% faster” than traditional rack CPUs. (investor.nvidia.com) Independent early benchmarks and partner tests published after Vera’s unveiling showed substantial gains in some streaming and analytics workloads: Redpanda reported up to 73% higher throughput versus AMD Turin in its streaming tests, illustrating that third-party Vera numbers can differ significantly by workload. (redpanda.com)
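For readers who want to see how a chip can lose on raw throughput yet lead on performance-per-watt, as described for the GH200 above, here is a minimal sketch in Python. The names `perf_per_watt`, `cpu_a` and `cpu_b`, and every number in it, are hypothetical placeholders chosen for illustration; none of them are Phoronix measurements, and the sketch assumes a "higher is better" benchmark score.

```python
def perf_per_watt(score: float, avg_power_watts: float) -> float:
    """Return a higher-is-better benchmark score divided by average power draw."""
    if avg_power_watts <= 0:
        raise ValueError("average power must be positive")
    return score / avg_power_watts


# Hypothetical placeholder figures, NOT taken from any published benchmark.
runs = {
    "cpu_a": {"score": 1000.0, "avg_power_watts": 250.0},
    "cpu_b": {"score": 1200.0, "avg_power_watts": 400.0},
}

# Efficiency per system: score per watt.
efficiency = {name: perf_per_watt(**run) for name, run in runs.items()}

# The raw-performance leader and the efficiency leader can be different systems.
fastest = max(runs, key=lambda name: runs[name]["score"])
most_efficient = max(efficiency, key=efficiency.get)

print(f"fastest: {fastest}, most efficient: {most_efficient}")
```

For "lower is better" results such as compile time, one common convention is to convert to a rate first (for example, builds per hour) before dividing by power; whether a given published chart does this is worth checking in its notes.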