Qwen 3.5 runs fully on iPhone 17 Pro via MLX

Benchmarks show Alibaba’s Qwen 3.5 (2B, 6-bit) running fully on-device on an iPhone 17 Pro using MLX optimizations, reportedly matching larger models on some reasoning and vision tasks. This is a concrete signal that Apple Silicon plus the MLX toolchain can host serious LLM workloads locally. If you ship on-device ML features, revisit the CPU/GPU/Neural Engine benchmarks you run for them.

Alibaba’s Qwen team published the small Qwen3.5 family (0.8B, 2B, 4B and 9B parameter checkpoints) to Hugging Face and ModelScope on March 2, 2026 (github.com). The MLX toolchain used for Apple Silicon quantization and Metal-backed inference is credited to Apple’s Machine Learning Research group on community MLX model pages and MLX-quantized releases (huggingface.co).

An open benchmarking project, qwenbench-mlx, benchmarks Qwen3.5 sizes (0.8B–35B/MoE) and reports generation tokens/s, prompt tokens/s, and peak unified memory (GB) as first-class metrics for MLX runs on Apple hardware (github.com). Community model releases package 8-bit and 6-bit MLX quantizations of Qwen3.5 for Apple Silicon, and LM Studio and Hugging Face repos explicitly mark MLX-optimized artifacts for download (huggingface.co).

Hands-on guides and install walkthroughs report material MLX speedups on M-series devices: community writeups cite roughly 2× throughput improvements, and other extension benchmarks cite gains of up to ~70% in some Apple-centric workflows (dev.to).

Qwen’s research notes position Qwen3.5 as a native multimodal family, with the 397B flagship released Feb 15, 2026, and claim strong reasoning, coding, and visual understanding across their benchmark suites. That flagship work lays the foundation for how the compact small-model line behaves on-device (qwen.ai).

Cross-platform MLX/Exo community documentation and examples now cover iOS, iPadOS and macOS deployment options and explicitly target unified CPU/GPU memory strategies on Apple Silicon for on-device LLMs, with practical recipes for packaging and inference tuning (petronellatech.com).
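To see why bit width matters for the peak unified memory figures above, here is a rough back-of-envelope sketch of weight memory for the small Qwen3.5 checkpoints at 6-bit and 8-bit. It is an estimate under stated assumptions, not a measurement. The function name `estimate_weight_gb` is hypothetical, and the parameter counts are the nominal sizes from the model names. The per-group overhead (a 16-bit scale and 16-bit bias per group of 64 weights) is an assumption about group-wise quantization, and not every layer is necessarily quantized. KV cache, activations and runtime overhead are not counted, so real peak memory will be higher.

```python
# Rough weight-memory estimate for group-wise quantized checkpoints.
# Assumptions (not from the source): nominal parameter counts, every weight
# quantized, and one 16-bit scale plus one 16-bit bias stored per group.

def estimate_weight_gb(params_billion, bits, group_size=64,
                       overhead_bits_per_group=32):
    """Return approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    # Spread the per-group scale/bias cost across the weights in the group.
    effective_bits = bits + overhead_bits_per_group / group_size
    total_bytes = params_billion * 1e9 * effective_bits / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    nominal_sizes = [0.8, 2.0, 4.0, 9.0]  # billions of parameters, from model names
    for size in nominal_sizes:
        for bits in (6, 8):
            gb = estimate_weight_gb(size, bits)
            print(f"Qwen3.5 {size}B @ {bits}-bit: ~{gb:.2f} GB of weights")
```

Under these assumptions the 2B model at 6-bit needs roughly 1.6 GB for weights alone, which gives a sense of the headroom a phone needs before context length and vision inputs are considered.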

