Qwen3 tested locally
- Developers are stress-testing Alibaba’s quantized Qwen3.6-35B-A3B on 32GB Apple Silicon Macs to run local agentic coding workflows (startupfortune.com).
- The model is a mixture-of-experts (MoE) design, and the experiments run it inside 32GB of unified memory (startupfortune.com).
- The goal is to make local agentic coding feasible without cloud inference, which would tie developer tooling choices to what desktop hardware can handle (startupfortune.com).
Developers are trying to run Alibaba’s new Qwen3.6-35B-A3B coding model on 32GB Apple Silicon Macs instead of renting cloud inference. (qwen.ai)

Qwen released the open-weight model on April 14, 2026, describing it as a mixture-of-experts system with 35 billion total parameters and 3 billion active parameters per token. The weights are also posted on Hugging Face under an Apache 2.0 license. (qwen.ai) (huggingface.co)

A mixture-of-experts model works like a team where only a few specialists answer each prompt, instead of waking up every worker for every task. Qwen says that setup lets this model target “agentic coding” jobs such as repository reasoning and tool use while keeping active compute closer to a 3 billion-parameter model. (qwen.ai)

Quantization is the compression step that makes these local tests possible: it stores model weights in fewer bits, trading some precision for a smaller memory footprint. Apple’s MLX-LM toolkit explicitly supports quantizing models and running them on Apple Silicon. (github.com)

The hardware angle matters because Apple Silicon uses unified memory, which means the central processor and graphics processor share one memory pool instead of copying data back and forth. Apple says MLX takes advantage of that design so operations can run on the central processor or graphics processor “without needing to move memory around.” (machinelearning.apple.com)

That makes a 32GB Mac a useful test case for local coding agents: the machine has one shared memory budget for the model, the codebase, and the editor at the same time. Apple’s developer documentation also tells software makers to tune code paths specifically for Apple Silicon performance. (machinelearning.apple.com) (developer.apple.com)

Qwen is pitching this model directly at coding workloads, not just chat. In its own benchmark table, Qwen reports a 73.4 score on SWE-bench Verified and 51.5 on Terminal-Bench 2.0, both tests tied to software engineering tasks. (qwen.ai)

MLX-LM has become part of the appeal for Mac users because it is built for text generation and fine-tuning on Apple Silicon and can pull models from Hugging Face with a single command. That gives developers a relatively direct path from an open model release to a laptop-based test rig. (github.com)

The open question is not whether Qwen3.6-35B-A3B can run on a Mac, but whether a quantized version can stay fast and stable enough for full coding loops on 32GB machines. If those tests hold up, more of the software stack for coding assistants could move from rented servers back onto developers’ desks. (qwen.ai) (github.com)
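
The memory trade-off behind those tests can be put into rough numbers. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes every one of the 35 billion parameters has to sit in memory even though only about 3 billion are active per token, that all weights use the same bit width, and that 32GB can be treated as 32 billion bytes. It ignores quantization metadata, the context cache, and everything else running on the machine. The function name and the bit widths chosen are illustrative, not taken from Qwen or MLX-LM.

```python
# Rough weight-only memory estimate for a quantized model.
# Illustrative assumptions: uniform bit width, no quantization metadata,
# no context cache, and 1 GB counted as 10**9 bytes.

TOTAL_PARAMS = 35e9      # total parameters reported for the model
MEMORY_BUDGET_GB = 32    # unified memory on the test machines


def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Return the approximate size of the weights alone, in GB."""
    bytes_needed = num_params * bits_per_weight / 8
    return bytes_needed / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size_gb = weight_footprint_gb(TOTAL_PARAMS, bits)
        headroom_gb = MEMORY_BUDGET_GB - size_gb
        verdict = "fits" if headroom_gb > 0 else "does not fit"
        print(f"{bits:>2}-bit weights: {size_gb:5.1f} GB, "
              f"headroom {headroom_gb:6.1f} GB ({verdict})")
```

Under these assumptions, 16-bit weights come to about 70 GB and 8-bit weights to about 35 GB, both over the budget, while 4-bit weights come to about 17.5 GB. That leaves roughly 14.5 GB for the context cache, the codebase, the editor, and the operating system, which is why the open question is speed and stability rather than whether the model loads at all.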