M5 Max runs Qwen3.6 2x faster

- A May 19 YouTube demo showed Qwen3.6-27B running more than twice as fast on Apple’s M5 Max using MTPLX, a native MTP runtime for macOS.
- The benchmark cited a 2.24x decode-speed gain, from 28 tokens per second to 63, using Qwen3.6-27B on a MacBook Pro M5 Max. (youtube.com)
- MTPLX’s GitHub repository and website list current releases, install steps and benchmark notes for Apple Silicon users. (github.com)

A YouTube demo published on May 19 showed Qwen3.6-27B running at more than double its prior decode speed on Apple’s M5 Max when paired with MTPLX, a native multi-token prediction runtime for Apple Silicon. The video described the setup as “Native MTP on Apple Silicon” and said the gain came from software rather than a new chip. MTPLX’s GitHub repository and website put the cited improvement at about 2.24x on Qwen3.6-27B, moving from 28 tokens per second to 63 on a MacBook Pro M5 Max. (youtube.com) (github.com)

The result matters because it shifts attention from the processor alone to the inference stack wrapped around it. MTPLX’s documentation says the runtime is MLX-native, uses speculative decoding, and preserves Qwen3.6’s built-in MTP path without relying on an external drafter. A related Hugging Face model card says many pre-converted MLX versions strip out those MTP weights during sanitization, which would block the acceleration path entirely. (youtube.com)

### Why did the same M5 Max suddenly look much faster?

MTPLX says the gain came from “native MTP speculative decoding” on Apple Silicon rather than from a hardware change. The project describes the method as using Qwen3.6’s own multi-token prediction weights and “math-correct rejection sampling” to speed decode while keeping outputs exact.

That distinction matters because decode speed is often treated as a fixed property of a machine-model pairing. In this case, the reported jump came after a runtime change that better exposed capabilities already present in the model and in Apple’s local inference stack. (github.com)

The GitHub page says the multiplier is “hardware-independent,” meaning the technique is presented as a software-layer gain rather than an M5-only quirk. (github.com)

### What exactly is “Native MTP” doing here?

Qwen3.6 includes multi-token prediction weights, and MTPLX is built to use them directly on Apple Silicon, according to the project’s documentation.
Instead of generating one token at a time in a plain autoregressive loop, speculative decoding proposes multiple tokens and then verifies them, which can raise throughput when the runtime and kernels are tuned for that path. The Hugging Face model card adds an implementation detail with practical consequences: preserving the MTP weights is necessary for this acceleration to work. (github.com) It says common conversion flows can remove those weights, leaving users with a model that appears compatible on paper but cannot use the faster decode path in practice.

### Why does this change how Apple Silicon benchmarks should be read?

The 2.24x figure suggests Apple Silicon AI results can swing materially with runtime maturity, kernel work and model packaging, not just with memory bandwidth or core counts. (github.com) A benchmark taken before MTP support, or with a sanitized model conversion, may understate what the same hardware can do a few weeks later.

That does not make every benchmark obsolete. It does mean comparisons across Macs, GPUs and local runtimes depend heavily on whether the software stack is using the model’s full inference path. (huggingface.co) In this case, the named participants are not only Apple and Qwen, but also MLX, MTPLX and the conversion pipeline around the model.

### Does this only matter for one demo?

MTPLX’s website frames the result as part of a broader Apple Silicon workflow, with install instructions through Homebrew and a setup flow aimed at local chat use on macOS. (github.com) The repository shows active updates in the last week, including release notes and benchmark files, which indicates the software is still moving quickly.

The next place to watch is the MTPLX GitHub repository and site, where the developer has posted releases, benchmark updates and integration notes for Apple Silicon users. (youtube.com) (github.com)
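
For readers who want to see the propose-then-verify idea concretely, the step that keeps speculative decoding faithful to the full model can be sketched in a few lines of Python. This is a toy illustration of the widely used speculative-sampling acceptance rule, not MTPLX's code: the function names, the plain-list probability vectors and the tiny vocabularies are all assumptions made for the example, and a real runtime would get these distributions from the model's MTP head and main forward pass.

```python
import random


def accept_or_resample(token, p_target, q_draft, rng):
    """Verify one drafted token (hypothetical helper, not an MTPLX API).

    p_target: the full model's probabilities over the vocabulary.
    q_draft:  the draft head's probabilities the token was sampled from.
    Returns (token_to_emit, accepted_flag).
    """
    p, q = p_target[token], q_draft[token]
    # Accept the drafted token with probability min(1, p/q).
    if q > 0 and rng.random() < min(1.0, p / q):
        return token, True
    # On rejection, resample from the leftover mass max(0, p - q).
    residual = [max(0.0, pt - qd) for pt, qd in zip(p_target, q_draft)]
    if sum(residual) <= 0.0:
        # Degenerate case (p and q identical): sample the target directly.
        residual = list(p_target)
    replacement = rng.choices(range(len(residual)), weights=residual)[0]
    return replacement, False


def verify_draft(draft_tokens, target_dists, draft_dists, rng):
    """Keep drafted tokens until the first rejection, then stop.

    The rejected position is replaced by a resampled token, so every
    call emits at least one token when the draft is non-empty.
    """
    emitted = []
    for tok, p, q in zip(draft_tokens, target_dists, draft_dists):
        chosen, accepted = accept_or_resample(tok, p, q, rng)
        emitted.append(chosen)
        if not accepted:
            break
    return emitted


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.7, 0.2, 0.1]  # toy target distribution
    q = [0.3, 0.3, 0.4]  # toy draft distribution
    drafted = [rng.choices(range(3), weights=q)[0] for _ in range(4)]
    print(verify_draft(drafted, [p] * 4, [q] * 4, rng))
```

The point of the acceptance rule is that tokens emitted this way follow the target distribution regardless of how good the draft is; a better draft only raises how many tokens are accepted per verification pass, which is where a throughput gain would come from. The sketch leaves out the extra token a full implementation usually samples from the target when every drafted token is accepted.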
