AMD narrows MLPerf gap
Social posts show AMD’s MI355X made measurable progress in MLPerf 6.0, narrowing its performance gap with Nvidia on selected benchmarks by roughly 10–30%. That closer competition on specific workloads suggests vendors beyond Nvidia are making targeted hardware gains, even if the incumbent still leads on many of the new tests. (x.com)
If you want to understand this story, start with what these chips actually do. An artificial intelligence inference chip is the machine that takes a trained model and turns it into answers, images, or video for users, and MLPerf is the industry benchmark run by MLCommons to measure how fast that happens under fixed rules. (mlcommons.org) MLPerf matters because chip vendors can’t just post a peak speed from a lab demo. MLCommons says the suite is built to be architecture-neutral, representative, and reproducible, which is why cloud buyers and server makers use it as a common scoreboard. (mlcommons.org)
The new results landed on April 1, 2026, and this was not a routine update. MLCommons called version 6.0 the biggest revision yet, with 5 of the 11 datacenter tests new or updated. (mlcommons.org) Those new tests pushed the benchmark closer to the models companies are actually deploying now. Version 6.0 added or expanded workloads for GPT-OSS 120B, DeepSeek-R1, DLRMv3 recommendations, a Shopify-based vision-language model, and the suite’s first text-to-video benchmark. (mlcommons.org)
That change is why the AMD story is more nuanced than “won” or “lost.” Nvidia said its Blackwell Ultra systems were the only platform submitted on all newly added models and scenarios, and that they delivered the highest throughput across the widest range of tests. (developer.nvidia.com)
AMD’s angle was narrower and more interesting. In its own MLPerf Inference 6.0 submission, AMD centered the Instinct MI355X on selected open-division tests and said the chip showed improved performance on standard and pruned models, rather than trying to match Nvidia everywhere at once. (rocm.blogs.amd.com) That is where the “gap narrowed” claim comes from. Third-party coverage of the official results said AMD got close on a few inference tests and even edged a Blackwell partner submission by about 4% on a single-node Llama 2 70B run, while still trailing on much of the broader suite. (forbes.com)
AMD also used this round to show scale, not just one-box speed. Coverage of AMD’s submission said MI355X crossed 1 million tokens per second on multinode runs, including an 11-node, 87-GPU Llama 2 70B setup with reported scale-out efficiency between 93% and 98%. (storagereview.com) The software stack is part of the story too. AMD’s reproducibility post said its version 6.0 results were built on the ROCm software stack and included multinode runs using the MXFP4 data type, which shows the company is trying to close the tooling gap as well as the hardware gap. (rocm.blogs.amd.com)
Nvidia still looks stronger when the benchmark gets broader and harder. Nvidia said Blackwell Ultra posted records on the new tests, including DeepSeek-R1 Interactive, Qwen3-VL-235B-A22B, GPT-OSS-120B, WAN-2.2-T2V-A14B, and DLRMv3, and it did that with 14 partners submitting on the platform. (developer.nvidia.com)
So the real takeaway is not that Nvidia was displaced in April 2026. It is that MLPerf 6.0 showed AMD’s MI355X moving from “alternative” territory toward “credible option” territory on specific workloads, which is exactly how competition usually returns to a market one benchmark at a time. (mlcommons.org)
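A note on the scale-out efficiency figure cited above: the coverage does not spell out how the 93% to 98% range was computed, but the term usually means measured multi-unit throughput divided by what perfectly linear scaling would deliver. The sketch below assumes that common definition. The function name and the numbers in the example are hypothetical and are not taken from any MLPerf submission.

```python
def scale_out_efficiency(per_unit_tps: float, measured_tps: float, units: int) -> float:
    """Return measured throughput as a fraction of ideal linear scaling.

    per_unit_tps: tokens/second achieved by one unit (a node or a GPU) on its own.
    measured_tps: tokens/second achieved by the full multi-unit run.
    units:        number of nodes or GPUs in the multi-unit run.

    Assumes the common definition: efficiency = measured / (per_unit * units).
    """
    if per_unit_tps <= 0 or units <= 0:
        raise ValueError("per_unit_tps and units must be positive")
    ideal_tps = per_unit_tps * units  # throughput if scaling were perfectly linear
    return measured_tps / ideal_tps


if __name__ == "__main__":
    # Hypothetical illustration only: one unit sustains 100,000 tokens/s,
    # and ten units together sustain 950,000 tokens/s.
    efficiency = scale_out_efficiency(100_000, 950_000, 10)
    print(f"scale-out efficiency: {efficiency:.0%}")
```

Under this reading, a result in the mid-90s means only a few percent of the theoretical throughput is lost to communication and coordination between units as the run grows.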