AMD EPYC CPUs Outperform Rivals in AI Inference

A new benchmark from Signal65 found that AMD EPYC host CPUs outperformed competitors in AI inference throughput, time-to-first-token (TTFT), and latency. The tests were conducted across several models, including GPT-OSS-120B and DeepSeek-R1. The results emphasize the importance of host CPU efficiency in the total cost of ownership (TCO) for AI systems.

The host CPU acts as the traffic cop for AI servers, feeding the GPUs that do the heavy lifting. In this role, high-frequency cores and substantial memory bandwidth are critical for keeping the expensive accelerators fully utilized. The Signal65 benchmark isolated this variable by using identical GPUs, software, and networking. The only difference was the host CPU: a high-frequency AMD EPYC CPU against a top-tier Intel Xeon processor.

AMD's architectural choices appear to give it an edge in feeding these data-hungry AI models. EPYC processors feature a 12-channel DDR5 memory architecture, delivering significantly more memory bandwidth than the 8-channel design found in competing Intel Xeon processors. This advantage in data throughput is a key factor in reducing TTFT, the time it takes for a model to begin generating a response and a critical user experience metric.

The performance gains translate directly to TCO improvements for AI infrastructure. A throughput increase of up to 14.64% means a single server can handle more concurrent users, potentially reducing the number of servers required. For large-scale deployments where power, space, and the cost of the accelerators themselves are major expenses, even a single-digit percentage improvement in host CPU efficiency can lead to significant capital and operational savings.

While AMD and Intel compete for dominance in the general-purpose server market, the biggest cloud providers (Google, Microsoft, and AWS) are increasingly investing in their own custom silicon. Chips like Google's Axion CPU, Microsoft's Cobalt CPU and Maia AI accelerator, and AWS's Graviton series are designed to optimize performance, power consumption, and cost for their specific internal workloads.

This "build vs. buy" trend creates a dual-track market. Hyperscalers use custom ASICs and Arm-based CPUs to drive down TCO for their own massive services, like search and social media feeds. However, they continue to offer the latest x86 CPUs from AMD and Intel to their cloud customers, who require the broad software compatibility, flexibility, and high single-threaded performance that these processors provide for a wide range of enterprise and AI workloads.
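To make the earlier server-count argument concrete, the arithmetic behind "more throughput per server means fewer servers" can be sketched in a few lines of Python. This is only an illustration: the per-server throughput and the fleet-wide demand below are hypothetical round numbers, not figures from the Signal65 benchmark, and the function name `servers_needed` is invented for this example. Only the 14.64% uplift comes from the article, and it is the best case reported.

```python
import math


def servers_needed(target_tokens_per_s: float, per_server_tokens_per_s: float) -> int:
    """Smallest whole number of servers that meets the target throughput."""
    return math.ceil(target_tokens_per_s / per_server_tokens_per_s)


# Hypothetical inputs chosen for illustration, not benchmark results.
BASELINE_TOKENS_PER_S = 10_000.0   # assumed per-server throughput with the slower host CPU
TARGET_TOKENS_PER_S = 1_000_000.0  # assumed fleet-wide demand

# Best-case throughput gain cited in the article ("up to 14.64%").
UPLIFT = 0.1464

baseline_fleet = servers_needed(TARGET_TOKENS_PER_S, BASELINE_TOKENS_PER_S)
improved_fleet = servers_needed(TARGET_TOKENS_PER_S, BASELINE_TOKENS_PER_S * (1 + UPLIFT))

print(f"baseline fleet: {baseline_fleet} servers")
print(f"improved fleet: {improved_fleet} servers")
print(f"servers saved:  {baseline_fleet - improved_fleet}")
```

Under these assumed inputs the fleet shrinks from 100 servers to 88. The real saving depends on how close a given workload gets to the best-case uplift and on how much of each server's cost is the accelerators rather than the host CPU.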
