New Open LLM 'Qwen 3.5' Enters the Fray

A new open-source model, Qwen 3.5, has been detailed, boasting significant architectural upgrades over its predecessor. The model claims improved context length, reasoning, and multi-modal capabilities, positioning it as a new competitor in the open LLM space.

Alibaba's Qwen2 model series, the successor to Qwen1.5, offers a range of sizes from a mobile-friendly 0.5 billion parameters up to a 72 billion parameter flagship. The architecture is based on the Transformer design and incorporates SwiGLU activation, grouped query attention (GQA), and other optimizations to improve inference speed and reduce memory usage. This makes even the larger models more efficient to run.

The flagship Qwen2-72B model demonstrates significant performance gains over its predecessor and is competitive with other leading open-source models like Llama-3-70B. In benchmarks, it shows strong performance in language understanding, coding, mathematics, and reasoning. For instance, the base Qwen2-72B model scored 84.2 on MMLU, 64.6 on HumanEval, and 89.5 on GSM8K.

A key enhancement in Qwen2 is its expanded multilingual coverage, with training data spanning 27 additional languages beyond English and Chinese. This broader linguistic training improves its ability to understand and generate content across a diverse range of languages.

For tasks that require working with large amounts of information, the Qwen2-7B-Instruct and Qwen2-72B-Instruct models support a context length of up to 128,000 tokens. This is achieved through techniques like YaRN (Yet another RoPE extensioN method), allowing the models to process and recall information from lengthy documents.

The Qwen2 series also includes a Mixture-of-Experts (MoE) model, Qwen2-57B-A14B, which activates only a subset of its 57 billion parameters (14 billion) for each token. This MoE architecture allows for a very large model size while keeping the computational cost of inference manageable.

The models are openly available on platforms like Hugging Face and ModelScope, with resources provided for fine-tuning, quantization, and deployment to encourage community development and research. This open approach facilitates broader access and innovation within the AI community.
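
To make the memory benefit of grouped query attention mentioned above more concrete, here is a rough back-of-the-envelope sketch in Python. It estimates the size of the key/value cache for one long sequence with and without GQA. The layer count, head counts, head dimension, and 16-bit precision are illustrative assumptions chosen for the example, not official Qwen2 figures, and the function name kv_cache_bytes is hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate bytes needed to cache keys and values for one sequence."""
    # The leading factor of 2 accounts for storing both a key and a value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Assumed, illustrative configuration (not taken from the article).
LAYERS = 80          # transformer layers
QUERY_HEADS = 64     # attention heads used for queries
KV_HEADS = 8         # shared key/value heads under GQA
HEAD_DIM = 128       # dimension of each head
SEQ_LEN = 128_000    # tokens in the context window

# Standard multi-head attention keeps one key/value head per query head.
mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# GQA lets groups of query heads share a smaller set of key/value heads.
gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"Multi-head attention cache: {mha / 2**30:.1f} GiB")
print(f"Grouped query attention cache: {gqa / 2**30:.1f} GiB")
print(f"Reduction factor: {mha // gqa}x")
```

Under these assumptions the cache shrinks by the ratio of query heads to key/value heads, which is why GQA matters most at long context lengths, where the cache grows linearly with the number of tokens.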
