Qwen3.5 Models Now Run Locally on Apple Devices

The Qwen3.5 family of AI models is now available for local inference on iOS and macOS through an update to MLX-Swift-LM. The update advances the trend toward on-device AI, letting developers build sophisticated AI applications that run directly on Apple hardware without relying on the cloud.

The Qwen large language model family, developed by Alibaba Group, has evolved rapidly, with its Qwen2 series showing competitive performance against other open-source models. The flagship Qwen2-72B model, for instance, demonstrates superior or comparable results to models like Llama-3-70B in benchmarks covering natural language understanding, coding, and mathematics. Qwen models are built on the Transformer architecture and incorporate enhancements such as SwiGLU activation and grouped-query attention (GQA), which shrinks the key-value cache by sharing key and value heads across groups of query heads. Training for the Qwen2 series involved up to 7 trillion tokens of multilingual data, with a specific focus on improving code generation and mathematical reasoning.

On-device inference is powered by Apple's MLX framework, a machine learning library optimized for Apple Silicon. MLX is designed to run AI models efficiently on iPhones, iPads, and Macs by taking advantage of their unified memory architecture, which lets the CPU and GPU work on the same data without copying it, and by running computation on the GPU through Metal.

The integration via MLX-Swift-LM is part of a broader industry push and a core component of Apple's AI strategy, which prioritizes on-device processing. This approach enhances user privacy by keeping data on the device, reduces latency by eliminating network round-trips, and ensures features work offline. For developers, it removes much of the backend infrastructure cost and complexity associated with hosting large models. It allows more responsive and secure AI-powered features to be built natively into iOS and macOS applications, from advanced text generation to sophisticated in-app assistants.

The latest instruction-tuned Qwen models support extended context lengths, with some versions handling up to 128,000 tokens. This enables more complex on-device tasks that require understanding large amounts of text, such as document summarization or in-depth Q&A, without relying on external cloud services.
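
To give a rough sense of why grouped-query attention matters for long contexts on memory-constrained devices, the following is a minimal back-of-the-envelope sketch in Python. It assumes the standard key-value cache estimate (one key vector and one value vector per layer, per KV head, per token) and 16-bit cache values. The layer count, head counts, and head dimension are hypothetical placeholders chosen for illustration, not the configuration of any particular Qwen model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV-cache size in bytes.

    The leading factor of 2 accounts for storing both a key and a value
    vector for every layer, KV head, and token in the context.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration (assumption for illustration only).
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32   # with plain multi-head attention, KV heads == query heads
NUM_KV_HEADS = 8       # with GQA, several query heads share one KV head
HEAD_DIM = 128
CONTEXT_TOKENS = 128_000

mha = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, CONTEXT_TOKENS)
gqa = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, CONTEXT_TOKENS)

GIB = 1024 ** 3
print(f"Multi-head cache:    {mha / GIB:.1f} GiB")
print(f"Grouped-query cache: {gqa / GIB:.1f} GiB")
print(f"Reduction factor:    {mha // gqa}x")
```

Under these assumed numbers the cache shrinks by the ratio of query heads to KV heads, here a factor of four. Real deployments would also need to account for the model weights themselves, quantization, and whatever memory the operating system reserves.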

