AMD posts MLPerf win

AMD reported its Instinct MI355X GPUs exceeded one million tokens per second in MLPerf Inference v6.0, citing ROCm software scalability for AI workloads. (x.com) The vendor presented the benchmark as evidence of performance at scale for data-center inference. (x.com)

Artificial intelligence inference is the step where a trained model turns a prompt into an answer, and AMD said its Instinct MI355X systems cleared 1 million tokens a second in the MLPerf Inference v6.0 benchmark round released April 1. (mlcommons.org)

MLPerf is the industry benchmark run by MLCommons to measure how fast systems process inputs and return results under standard rules, datasets, latency limits, and quality targets. Its datacenter suite is designed to compare hardware and software in reproducible tests rather than vendor demos. (mlcommons.org) In this round, MLCommons said five of the eleven datacenter tests were new or updated, including a new open-weight GPT-OSS 120B model, an expanded DeepSeek-R1 reasoning test, DLRMv3 for recommendations, a text-to-video benchmark, and a new vision-language benchmark built from Shopify catalog data. (mlcommons.org)

AMD’s submission used Instinct MI355X systems in single-node and multi-node runs, with 8 graphics processors per platform and larger clusters of 87 or 94 graphics processors for some large-language-model tests. AMD said it submitted Llama 2 70B, GPT-OSS 120B, and Wan 2.2 text-to-video workloads in this round. (rocm.blogs.amd.com) The company tied the result to ROCm, its Radeon Open Compute software stack, which AMD said in version 7.2 added topology-aware communication, lower-level math-kernel tuning, and support for low-precision formats such as FP8 and FP4 to raise throughput and cut latency. (rocm.blogs.amd.com)

Low-precision formats matter because they store each number in fewer bits, like shrinking every box in a warehouse so more of them fit on the same shelves. AMD’s MI355X is built for that tradeoff, with 288 gigabytes of high-bandwidth memory and up to 8 terabytes a second of memory bandwidth per chip. (vultr.com)

MLCommons says its results are organized by scenario, division, and availability category, with “Available” reserved for systems customers can buy or rent now. AMD said nine partners submitted Instinct-platform results in that Available category for v6.0. (mlcommons.org, rocm.blogs.amd.com) That availability point has become more concrete over the past six months. Oracle said on October 14, 2025 that it had made MI355X bare-metal instances generally available, calling itself the first hyperscaler to publicly offer the chip, while Vultr announced worldwide availability on September 9, 2025 and Crusoe now lists MI355X capacity on its cloud. (blogs.oracle.com, blogs.vultr.com, crusoe.ai)

MLPerf does not reduce the market to one winner, because results vary by model, scenario, cluster size, software stack, and whether a submission is in the Closed or Open division. But the April 1 release gives AMD a fresh third-party data point as cloud providers start selling MI355X capacity for large-model inference. (mlcommons.org)
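
The low-precision tradeoff described earlier can be made concrete with rough arithmetic. The Python sketch below is illustrative only and is not taken from AMD or MLCommons: it estimates how much memory a model's weights alone would occupy at 16, 8, and 4 bits per weight and compares that with the 288 gigabytes of memory cited for one MI355X. The parameter counts are nominal figures read off the model names (70 billion and 120 billion), the function names are hypothetical, and the estimate ignores the key-value cache, activations, runtime overhead, and any per-block scaling data that real FP4 formats carry.

```python
# Back-of-envelope estimate of weight storage at different precisions.
# Assumptions (not from the article): nominal parameter counts, weights only,
# no KV cache, no activations, no scaling-factor overhead for FP4/FP8 formats.

BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP4": 4}
HBM_PER_GPU_GB = 288  # MI355X memory capacity cited in the article


def weight_footprint_gb(num_params: float, bits: int) -> float:
    """Return gigabytes (10^9 bytes) needed to store num_params weights."""
    return num_params * bits / 8 / 1e9


def footprint_table(num_params: float) -> dict:
    """Map each precision name to the estimated weight footprint in GB."""
    return {
        fmt: weight_footprint_gb(num_params, bits)
        for fmt, bits in BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    # Nominal sizes inferred from the model names; actual counts may differ.
    models = [("70B-parameter model", 70e9), ("120B-parameter model", 120e9)]
    for name, params in models:
        for fmt, gb in footprint_table(params).items():
            share = gb / HBM_PER_GPU_GB
            print(f"{name} at {fmt}: {gb:.0f} GB ({share:.0%} of one GPU's memory)")
```

Under these assumptions, a nominal 120-billion-parameter model needs about 240 GB for weights at 16 bits, 120 GB at 8 bits, and 60 GB at 4 bits, which is why narrower formats leave more of each chip's memory free for serving requests.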
