Alibaba's Qwen3.5 AI Beats Larger Models
Alibaba's new open-source AI model family, Qwen3.5, is resetting performance expectations. Its 397B-parameter model uses a Mixture-of-Experts (MoE) approach to activate only 17B parameters per token and outperforms much larger models on reasoning and coding, while a sub-1B version can run video inference directly on mobile phones.
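To put the two parameter counts in perspective, the short calculation below works out what share of the model is active for a single token. It uses only the 397B and 17B figures quoted above. The variable names are illustrative, and the result says nothing about memory use, since the full set of weights generally still has to be stored.

```python
# Figures quoted in the article for the largest Qwen3.5 model.
total_params = 397e9   # total parameters
active_params = 17e9   # parameters activated per token

# Share of the network that does work for any one token.
active_fraction = active_params / total_params

print(f"Active share per token: {active_fraction:.1%}")
```

In other words, only about 4% of the parameters take part in each token's computation.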
The Qwen2 family includes five model sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and the flagship Qwen2-72B. This range lets developers choose a model that fits their computational resources, from mobile devices to large-scale server deployments. All models in the series are available on the Hugging Face and ModelScope platforms.

The Mixture-of-Experts (MoE) architecture allows a model to grow very large without a proportional increase in computational cost during inference. In an MoE model, the network is divided into specialized "expert" sub-networks, and a "gating network" routes each input token to a small subset of these experts. Because of this sparse activation, only a fraction of the model's total parameters is used for any given prediction. In the Qwen2 lineup, the 57B-A14B model follows this design, with roughly 14B of its 57B parameters active per token.

On performance benchmarks, the instruction-tuned Qwen2-72B model has matched or exceeded other leading open-source models, such as Meta's Llama 3-70B, in areas like coding and mathematics. It has also shown performance comparable to some proprietary models across a range of benchmarks testing language understanding, generation, and reasoning.

Beyond English and Chinese, Qwen2 was trained on data from an additional 27 languages, giving it strong multilingual capabilities for global applications. The larger instruction-tuned models, Qwen2-7B-Instruct and Qwen2-72B-Instruct, can also handle context lengths of up to 128,000 tokens, which is crucial for tasks involving long documents like research papers or legal contracts.

For developers and students looking to experiment, most of the Qwen2 models, including the 0.5B, 1.5B, 7B, and 57B-A14B versions, are released under the Apache 2.0 license. The models are compatible with a variety of tools and frameworks for local execution, such as llama.cpp, Ollama, and MLX for Apple Silicon, making them accessible for hands-on projects.
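The gating-and-routing idea behind Mixture-of-Experts can be illustrated with a toy example. The sketch below is not Qwen's implementation. It assumes a simple softmax gate with top-2 selection, uses tiny stand-in "experts" that just scale their input, and every name in it (route_token, moe_forward, make_expert) is hypothetical. It uses only the Python standard library.

```python
import math
import random


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, gate_weights, top_k=2):
    """Pick the top_k experts for one token and renormalize their weights."""
    # One score per expert: dot product of the token with that expert's gate row.
    scores = [sum(t * w for t, w in zip(token, row)) for row in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(token, gate_weights, experts, top_k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    routes = route_token(token, gate_weights, top_k)
    output = [0.0] * len(token)
    for index, weight in routes:
        expert_output = experts[index](token)  # unselected experts never run
        output = [o + weight * e for o, e in zip(output, expert_output)]
    return output, routes


def make_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    return lambda vector: [scale * v for v in vector]


random.seed(0)
DIM, NUM_EXPERTS = 4, 8
gate_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(i + 1) for i in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, routes = moe_forward(token, gate_weights, experts, top_k=2)
print("Experts used:", [index for index, _ in routes], "out of", NUM_EXPERTS)
```

Here only 2 of the 8 experts run for the token, which is the source of the efficiency gain: compute scales with the number of experts selected, not with the total number available.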