Apple Silicon signals accelerate
Signals in developer channels suggest Apple is preparing an M5-series rollout: high-memory Mac mini and Mac Studio configurations now show “currently unavailable” instead of months-long shipping estimates. Meanwhile, hobbyists continue porting ML workloads to Apple silicon. A DFlash speculative-decoding port and MLX work reportedly let Qwen3-4B reach ~186 tokens/s on a MacBook, and a developer used AI agents to migrate a legacy Mac app to native Apple silicon. (x.com/i/status/2042974376493224427) (x.com/i/status/2043471459121521117) (x.com/i/status/2043335131151188050)
Apple’s latest Mac mini and Mac Studio shortages are feeding two stories at once: supply is tightening now, and developers are already building for the next Apple silicon cycle. (macrumors.com) As of April 11, several higher-memory configurations in Apple’s United States online store had flipped from long waits to “currently unavailable,” including Mac mini models with 32 gigabytes or 64 gigabytes of memory and Mac Studio models with 128 gigabytes or 256 gigabytes. Other configurations were still orderable, but Apple was quoting delivery windows of one to three months. (macrumors.com) 9to5Mac reported that the change came after some of the same machines had already slipped to four- or five-month estimates, and noted that “currently unavailable” often appears before a configuration disappears from Apple’s configurator entirely. Apple had already removed the Mac Studio’s 512-gigabyte memory option in March. (9to5mac.com)
Apple has not announced new Mac mini or Mac Studio models, and current reporting points to a memory squeeze as one likely cause of the shortages. MacRumors said the unavailable machines are the ones with larger memory pools, which fits a broader memory-market crunch tied to demand for artificial intelligence servers. (macrumors.com)
That hardware backdrop matters because Apple has spent the past two years turning its Mac chips into a serious platform for local artificial intelligence. Apple’s MLX project is an open-source machine learning framework for Apple silicon, and in November 2025 Apple said MLX could use the new “Neural Accelerators” in the M5 chip of the 14-inch MacBook Pro. (github.com, machinelearning.apple.com) For readers outside the field, “local inference” means running a language model on your own laptop instead of sending each prompt to a remote data center. Apple says MLX is built around the Mac’s shared (unified) memory, so the central processor and graphics processor can work on the same model data without copying it back and forth. (machinelearning.apple.com)
One of the projects built on that stack is dflash-mlx, a GitHub port of DFlash speculative decoding to MLX. Speculative decoding uses a small draft model to guess several upcoming tokens, which a larger model then verifies in a single pass, keeping the guesses it agrees with. In a 4-bit test on a MacBook Pro with an M4 Max and 36 gigabytes of memory, the project’s README reports that a Qwen3.5-4B setup reached 161.9 tokens per second with DFlash plus MLX, versus 119.4 for MLX alone and 76.4 for llama.cpp. (github.com)
A separate January 2026 paper on vllm-mlx reflects the same effort to make Apple laptops better at serving models natively. Its authors reported 21 percent to 87 percent higher text throughput than llama.cpp across models ranging from Qwen3-0.6B to Nemotron-30B, and up to 525 tokens per second on text workloads in tests on an M4 Max. (arxiv.org)
The software push also extends beyond model benchmarks to older Mac apps that still rely on Intel-era code. Apple’s developer documentation recommends shipping a universal binary that contains both arm64 and x86_64 code, so one app runs natively on Apple silicon and Intel Macs, and presents Rosetta as a temporary translation layer rather than a long-term substitute. (developer.apple.com, developer.apple.com)
So the immediate signal is simple: some high-memory Macs are no longer for sale in Apple’s online store, even before any M5 desktop announcement. At the same time, the developer tools and hobby projects around Apple silicon are getting faster, more native, and more tightly focused on running artificial intelligence workloads on the Mac itself. (macrumors.com, machinelearning.apple.com, github.com)
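The draft-and-verify idea behind dflash-mlx can be illustrated with a toy loop. This is a minimal sketch of generic greedy speculative decoding, not the dflash-mlx code and not DFlash's own drafter: the two "models" are hypothetical integer functions, and the names target_model, draft_model and speculative_decode are invented for illustration. It shows why the technique is lossless under greedy decoding (the output matches plain greedy decoding exactly) while needing fewer sequential rounds of the large model.

```python
from typing import Callable, List, Tuple

# A "model" here is any function that maps a token prefix to its greedy next token.
# Both toy models below are hypothetical stand-ins for real networks.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Hypothetical large model: next token is the sum of the last two, mod 10."""
    return (prefix[-1] + prefix[-2]) % 10


def draft_model(prefix: List[int]) -> int:
    """Hypothetical small model: matches the target except when len(prefix) is a multiple of 5."""
    guess = target_model(prefix)
    return (guess + 1) % 10 if len(prefix) % 5 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-then-verify loop. Returns (tokens, number of verification rounds)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        rounds += 1
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each drafted position. A real system scores all k
        #    positions in one batched forward pass; this toy loops for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # keep the target's token at the first mismatch
                break
            accepted.append(tok)
        else:
            # Every draft matched, so the same pass yields one extra "bonus" token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], rounds


prompt = [1, 1]
fast, rounds = speculative_decode(target_model, draft_model, prompt, n_new=12)
assert fast == greedy_decode(target_model, prompt, 12)  # identical output to plain greedy
print(rounds)  # → 3 verification rounds instead of 12 sequential target steps
```

The speedup in real systems depends on how often the draft agrees with the target and on how cheap the draft is relative to one verification pass; the toy's 3-versus-12 count only illustrates the mechanism, not any measured result.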
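The dflash-mlx README figures are easier to compare as ratios and per-token latencies. The sketch below only rearranges the three numbers quoted above (161.9, 119.4 and 76.4 tokens per second from the README's 4-bit Qwen3.5-4B test on an M4 Max); the helper names are illustrative, and a single configuration says nothing about other models or machines.

```python
# Decode throughput in tokens per second, as quoted in the dflash-mlx README for a
# 4-bit Qwen3.5-4B run on an M4 Max MacBook Pro with 36 GB of memory.
THROUGHPUT = {
    "DFlash + MLX": 161.9,
    "MLX alone": 119.4,
    "llama.cpp": 76.4,
}


def speedup(faster: float, slower: float) -> float:
    """Throughput ratio; 1.0 means no improvement."""
    return faster / slower


def ms_per_token(tokens_per_second: float) -> float:
    """Average time to emit one token, in milliseconds."""
    return 1000.0 / tokens_per_second


for name, tps in THROUGHPUT.items():
    print(f"{name:>13}: {tps:6.1f} tok/s = {ms_per_token(tps):5.2f} ms/token")

for baseline in ("MLX alone", "llama.cpp"):
    ratio = speedup(THROUGHPUT["DFlash + MLX"], THROUGHPUT[baseline])
    print(f"DFlash + MLX vs {baseline}: {ratio:.2f}x")
```

Run as written, this works out to about 1.36x over MLX alone and 2.12x over llama.cpp, or roughly 6.18 ms per token versus 8.38 ms and 13.09 ms.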
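Apple's universal-binary guidance comes down to a checkable property: does an app's executable contain both an arm64 slice and an x86_64 slice? On a Mac, `lipo -info` or `file` reports this directly. The sketch below reads the Mach-O header bytes itself, assuming the classic 32-bit fat header layout from Apple's `<mach-o/fat.h>`; it deliberately skips the 64-bit fat variant and thin 32-bit files, and the function names mach_o_archs and is_universal are hypothetical.

```python
import struct
from typing import List

# Constants from Apple's <mach-o/fat.h>, <mach-o/loader.h> and <mach/machine.h>.
FAT_MAGIC = 0xCAFEBABE      # universal ("fat") header; fields are big-endian
MH_MAGIC_64 = 0xFEEDFACF    # thin 64-bit Mach-O header, little-endian on Intel and Apple silicon
CPU_ARCH_ABI64 = 0x01000000
CPU_NAMES = {
    7 | CPU_ARCH_ABI64: "x86_64",   # CPU_TYPE_X86_64
    12 | CPU_ARCH_ABI64: "arm64",   # CPU_TYPE_ARM64
}
FAT_ARCH_SIZE = 20  # cputype, cpusubtype, offset, size, align: five 32-bit fields


def mach_o_archs(data: bytes) -> List[str]:
    """List the CPU architectures named in a Mach-O executable's header bytes."""
    (magic_be,) = struct.unpack_from(">I", data, 0)
    if magic_be == FAT_MAGIC:
        # fat_header is (magic, nfat_arch), followed by nfat_arch fat_arch records.
        (count,) = struct.unpack_from(">I", data, 4)
        archs = []
        for i in range(count):
            cputype, _sub, _offset, _size, _align = struct.unpack_from(
                ">iiIII", data, 8 + FAT_ARCH_SIZE * i)
            archs.append(CPU_NAMES.get(cputype, hex(cputype)))
        return archs
    # Thin 64-bit file: magic is followed directly by cputype.
    magic_le, cputype = struct.unpack_from("<Ii", data, 0)
    if magic_le == MH_MAGIC_64:
        return [CPU_NAMES.get(cputype, hex(cputype))]
    raise ValueError("not a Mach-O layout this sketch handles")


def is_universal(data: bytes) -> bool:
    """True if the file carries both Apple silicon and Intel slices."""
    return {"arm64", "x86_64"} <= set(mach_o_archs(data))


# Synthetic two-slice header, only to exercise the parser.
header = struct.pack(">II", FAT_MAGIC, 2)
header += struct.pack(">iiIII", 7 | CPU_ARCH_ABI64, 3, 16384, 0, 14)
header += struct.pack(">iiIII", 12 | CPU_ARCH_ABI64, 0, 32768, 0, 14)
print(mach_o_archs(header))  # → ['x86_64', 'arm64']
print(is_universal(header))  # → True
```

To check a real app, one would read the first few kilobytes of the executable inside the bundle's Contents/MacOS folder and pass those bytes in; an Intel-only app would report just x86_64 and, on Apple silicon, run under Rosetta.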