Alibaba Releases 397B-Parameter Qwen3.5 Model

Alibaba has released Qwen3.5, a 397-billion-parameter multimodal Mixture-of-Experts (MoE) model. Despite its size, the model activates only 17 billion parameters at a time, and there are reports of usable speeds on consumer-grade multi-GPU setups. The model carries a commercial-friendly license and received day-zero support in the vLLM inference engine, according to a post from the development team.

- The model's architecture combines a sparse Mixture-of-Experts (MoE) design with Gated Delta Networks, a form of linear attention, to improve inference efficiency.
- Compared to its predecessor, Qwen3.5 is 8.6 times faster for standard workflows and 19 times faster for decoding long-context tasks, while a native FP8 pipeline cuts memory requirements by 50%.
- It is a native vision-language model, jointly trained on text, images, and UI screenshots, allowing it to handle tasks like document understanding and interacting with on-screen elements.
- The model features a native context length of 262,144 tokens, which can be extended to over 1 million tokens, and its supported languages have been expanded from 119 to 201.
- Alibaba Cloud offers a corresponding hosted API version named Qwen3.5-Plus, which provides a 1 million token context window by default and includes official built-in tools.
- The Qwen family of models is released under the Apache 2.0 license, which has led to a large developer ecosystem with over 100,000 derivative models built on the platform.
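
For a rough sense of scale, the figures quoted above can be turned into back-of-the-envelope arithmetic. The sketch below is illustrative only: it counts weight storage alone (no KV cache, activations, or runtime overhead), and it assumes a 16-bit baseline of 2 bytes per parameter against 1 byte per parameter for FP8. The article does not say which baseline the 50% memory figure is measured against, and the function name `weight_memory_gb` is hypothetical.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the article.
# Assumption: weights only; KV cache, activations, and overhead are ignored.

TOTAL_PARAMS = 397e9      # total parameters (from the article)
ACTIVE_PARAMS = 17e9      # parameters active at once (from the article)
NATIVE_CONTEXT = 262_144  # native context length in tokens (from the article)


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumed baseline: 16-bit weights (2 bytes each). FP8 uses 1 byte each.
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2)
fp8_gb = weight_memory_gb(TOTAL_PARAMS, 1)

# Share of the model that participates in computation at any one time.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"16-bit weights: {bf16_gb:.0f} GB")
print(f"FP8 weights:    {fp8_gb:.0f} GB ({fp8_gb / bf16_gb:.0%} of 16-bit)")
print(f"Active share:   {active_fraction:.1%} of total parameters")
print(f"Native context: 2**{NATIVE_CONTEXT.bit_length() - 1} tokens")
```

Under these assumptions, halving the bytes per parameter halves weight storage, which is consistent with the 50% figure, and only about 4% of the parameters are active at once, which is why a sparse model of this size can run faster than its total parameter count suggests.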
