Apple M3 vs M5 benchmark flip

- Tech-Practice posted a May 6 YouTube retest saying Apple’s newer M5 Max does not automatically beat an M3 Max in local LLM inference.
- The reversal hinges on workload shape: once the model size, context window, and sustained run length change, the older chip can stay ahead.
- That matters because Apple marketed M5 Max as up to 4x faster for AI, but real on-device AI depends on fit.

Apple’s M5 Max is supposed to be the easy answer for local AI on a Mac. More bandwidth, new GPU-side neural accelerators, newer architecture. Done. But a May 6 video from Tech-Practice lands on a messier point: in side-by-side llama.cpp testing, an M3 Max can still come out ahead for some local LLM jobs. (youtube.com)

That sounds backwards until you remember what these machines are actually doing. Local AI is not one benchmark. It is a pile of different bottlenecks: model size, quantization, context length, memory pressure, prompt ingestion, token generation, and how long the laptop has to hold speed without sagging. Change the shape of the job, and the “faster” chip can stop looking faster. (youtube.com)

### Why did people expect the M5 Max to win?

Because Apple pitched it that way. When Apple launched the March 3, 2026 MacBook Pro refresh, it said M5 Pro and M5 Max were built “from the ground up for AI,” added neural accelerators to each GPU core, raised unified memory bandwidth, and promised up to 4x AI performance versus the previous generation. (apple.com)

### What changed in this retest?

The useful part is not that Tech-Practice found one weird result. It is that the channel publicly reversed an earlier, simpler assumption. The new video says the M5 Max purchase was meant to make local LLM work faster, but side-by-side tests with the same m[…]ction to the usual “newer chip wins” framing. (youtube.com)

### What kind of workloads flip the result?

The flip shows up in on-device inference workloads where memory behavior matters as much as raw compute. If a model or context window pushes hard on unified memory, bandwidth and footprint start acting like traffic lanes, not horsepower. A chip can have stronger peak AI hardware and still lose time […]. That is basically what this whole debate is about. (youtube.com)

### Why would an older chip ever stay ahead?

Because sustained throughput is not the same thing as burst speed. Think of two highways: one has a higher speed limit, but the merge points clog up under load. The other has a lower headline limit, but traffic keeps moving. For local LLMs, once you sit in a long generation run, the winner can be t[…]one with the flashier spec sheet. The video points at exactly those practical constraints: memory footprint, bandwidth, thermal behavior, and sustained runs. (youtube.com)

### Does this mean Apple’s claim was wrong?

Not necessarily. Apple’s “up to 4x” language is broad and benchmark-shaped. It can be true for selected AI workloads without meaning every local model, every quant, and every context length gets a 4x jump. The gap here is between vendor AI claims and the very specific reality of running local LLMs o[…]her test. (apple.com)

### So what should buyers take from this?

Do not buy an Apple Silicon upgrade for “AI” in the abstract. Buy for your model sizes, your context windows, your runtime, and your tolerance for paying a lot more for gains that may only show up in some jobs. The M5 Max still looks stronger on pa[…] comparison charts, but this retest is a reminder that local AI performance is workload-specific, not linear across generations. (notebookcheck.net)

### Bottom line?

The surprise is not that M3 Max can occasionally beat M5 Max. The surprise is that local AI on Macs is now complicated enough that “newest” is no longer a sufficient benchmark answer. (youtube.com)
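If you want to check the burst-versus-sustained distinction on your own hardware, one rough way is to log tokens per second at regular intervals during a single long generation run and compare the start of the run with the end. The snippet below is a minimal sketch of that idea, not the method used in the video: the function name, the window size, and every number in the sample lists are hypothetical placeholders, not measurements of any Apple chip.

```python
from statistics import mean


def summarize_run(tokens_per_sec, window=3):
    """Compare early ("burst") and late ("sustained") throughput in one run.

    tokens_per_sec: tokens/s samples logged at regular intervals (assumed input).
    window: how many samples count as the start and the end of the run.
    """
    if len(tokens_per_sec) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    burst = mean(tokens_per_sec[:window])        # average of the first samples
    sustained = mean(tokens_per_sec[-window:])   # average of the last samples
    sag_pct = 100 * (burst - sustained) / burst  # how far speed dropped, in percent
    return {"burst": burst, "sustained": sustained, "sag_pct": sag_pct}


# Made-up samples for two hypothetical machines; replace with your own logs.
machine_a = [60, 60, 60, 50, 40, 40, 40]  # faster at first, then sags
machine_b = [50, 50, 50, 50, 48, 48, 48]  # slower at first, holds speed

for name, samples in (("A", machine_a), ("B", machine_b)):
    print(name, summarize_run(samples))
```

With these placeholder numbers, machine A wins on burst speed and machine B wins on sustained speed, which is the shape of result the retest describes. Whether your own workload behaves this way depends on the model, quantization, and context length you actually run.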
