Alibaba Releases 397B-Parameter Qwen3.5 Model
Alibaba has released Qwen3.5, a 397-billion-parameter multimodal Mixture-of-Experts (MoE) model. Despite its size, the model activates only 17 billion parameters per token, and users have reported usable speeds on consumer-grade multi-GPU setups. The model is released under a commercially friendly license and, according to a post from the development team, received day-zero support in the vLLM inference engine.
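To make the gap between total and active parameters concrete, here is a minimal top-k routing sketch. It assumes a generic MoE layer in which a router scores every expert and only the `top_k` highest-scoring experts run for each token. The function names, the toy experts, the expert count, and the renormalized softmax gating are hypothetical illustrations, not Qwen3.5's actual router, expert count, or any shared-expert scheme it may use.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x: Vector, router: Sequence[Vector], experts: Sequence[Expert],
              top_k: int) -> Tuple[Vector, List[int]]:
    """Send one token vector x through only its top_k highest-scoring experts (hypothetical sketch)."""
    # Router: one logit per expert (dot product of x with that expert's router row).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    # Keep the top_k experts; every other expert is skipped for this token.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize gate weights over the chosen experts only (an assumed gating scheme).
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for g, i in zip(gates, chosen):
        y = experts[i](x)  # only top_k expert calls happen per token
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    num_experts, top_k = 8, 2
    # Hypothetical toy router (one 2-d row per expert) and toy scaling experts.
    router = [[math.cos(i), math.sin(i)] for i in range(num_experts)]
    experts = [lambda x, s=i + 1: [s * xi for xi in x] for i in range(num_experts)]
    out, chosen = moe_layer([1.0, 0.5], router, experts, top_k)
    print(f"experts used: {len(chosen)} of {num_experts}")  # experts used: 2 of 8
    print(f"active fraction for 17B of 397B: {17 / 397:.1%}")  # prints 4.3%
```

Because unchosen experts are never called, per-token compute scales with the active parameters (17B, roughly 4.3% of 397B), while memory still has to hold every expert.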
- The model's architecture combines a sparse MoE design with Gated Delta Networks, a form of linear attention, to improve inference efficiency.
- Compared with its predecessor, Qwen3.5 is 8.6 times faster on standard workflows and 19 times faster at long-context decoding, while a native FP8 pipeline cuts memory requirements by 50%.
- It is a native vision-language model, jointly trained on text, images, and UI screenshots, which lets it handle tasks such as document understanding and interacting with on-screen elements.
- The model has a native context length of 262,144 tokens, extendable to more than 1 million tokens, and the number of supported languages has grown from 119 to 201.
- Alibaba Cloud offers a hosted API version, Qwen3.5-Plus, which provides a 1-million-token context window by default and includes official built-in tools.
- Qwen models are released under the Apache 2.0 license, which has fostered a large developer ecosystem with more than 100,000 derivative models built on Qwen.
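The efficiency claim for Gated Delta Networks can be illustrated with a single-head, pure-Python sketch of the gated delta rule used by Gated DeltaNet-style layers, as we understand it: the state is updated as `S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T` and read out as `o_t = S_t q_t`. This is a simplified sketch under that assumption; it omits the projections, normalization, and chunk-parallel kernels a production layer would use, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gated_delta_step(S: Matrix, q: Vector, k: Vector, v: Vector,
                     alpha: float, beta: float) -> Tuple[Matrix, Vector]:
    """One recurrent step: S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T, o_t = S_t q.

    S is a d_v x d_k state; alpha in (0, 1] decays old memory, beta in [0, 1] sets write strength.
    """
    Sk = matvec(S, k)  # value the state currently associates with key k
    new_S = [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]
    return new_S, matvec(new_S, q)


def run(seq, d_k: int, d_v: int) -> Tuple[Matrix, List[Vector]]:
    """Process (q, k, v, alpha, beta) tuples one token at a time with a fixed-size state."""
    S = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for q, k, v, alpha, beta in seq:
        S, o = gated_delta_step(S, q, k, v, alpha, beta)
        outputs.append(o)
    return S, outputs


if __name__ == "__main__":
    k = [1.0, 0.0]  # unit-norm key, so beta = 1 fully overwrites the old value for this key
    seq = [(k, k, [5.0, 7.0], 1.0, 1.0), (k, k, [1.0, 2.0], 1.0, 1.0)]
    S, outs = run(seq, d_k=2, d_v=2)
    print(outs)  # [[5.0, 7.0], [1.0, 2.0]]
    print(len(S), len(S[0]))  # 2 2
```

The second write to the same key replaces the stored value rather than adding to it (plain linear attention would return `[6.0, 9.0]`), and the state stays `d_v x d_k` however many tokens are processed. That fixed-size state is why, for these layers, per-token decoding cost does not grow with context length the way a softmax-attention KV cache does.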
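The FP8 memory figure can be sanity-checked with weights-only arithmetic. This sketch assumes a BF16 baseline of 2 bytes per parameter versus 1 byte for FP8, and it ignores KV cache or recurrent state, activations, and framework overhead, so it only shows why storing weights in FP8 halves their footprint. The `BYTES_PER_PARAM` table and the `weight_memory_gb` helper are hypothetical names for illustration.

```python
# Assumed bytes per parameter: BF16 baseline vs. FP8 (an assumption, not a published spec).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}

TOTAL_PARAMS = 397e9   # every expert must be resident in memory
ACTIVE_PARAMS = 17e9   # parameters actually used for each token


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("bf16", "fp8"):
        total = weight_memory_gb(TOTAL_PARAMS, dtype)
        active = weight_memory_gb(ACTIVE_PARAMS, dtype)
        print(f"{dtype}: {total:.0f} GB of weights, {active:.0f} GB touched per token")
    # bf16: 794 GB of weights, 34 GB touched per token
    # fp8: 397 GB of weights, 17 GB touched per token
```

Even at FP8, roughly 397 GB of weights must be spread across devices, which is why the reported consumer setups are multi-GPU, while only about 17 GB of weights is read per token.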