Alibaba releases Qwen3.5 multimodal model

Alibaba has released Qwen3.5, a large-scale, natively multimodal Mixture-of-Experts (MoE) model. The model has 397 billion total parameters, of which 17 billion are active for any given input, along with a 262k-token context window and a hybrid Mamba-attention design. According to its developers, it has leading capabilities in search and document understanding, and it is available for commercial use with day-zero support in the vLLM inference library.
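The gap between total and active parameters comes from expert routing: a small router scores the experts for each input and only the top few are run. The following is a minimal sketch of top-k routing in plain Python, not Qwen's actual implementation. The toy experts, the router scores, and the choice of `k=2` are illustrative assumptions; only the 397-billion and 17-billion figures come from the announcement.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical toy setup: 8 scalar "experts", each scaling its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # made-up scores

selected = route_top_k(router_logits, k=2)
output = moe_forward(1.0, experts, router_logits, k=2)

# Share of parameters active per input, using the figures quoted above.
active_fraction = 17 / 397
print(selected, output, f"{active_fraction:.1%}")
```

With the figures quoted above, roughly 4% of the parameters are used for any single input, which is where the efficiency argument for MoE over a dense model of similar size comes from.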

- Alibaba's Qwen2.5 claims superior performance over several market leaders, including OpenAI's GPT-4o, Anthropic's Claude-3.5-Sonnet, and Meta's Llama-3.1-405B in areas like reasoning and coding. The model was pretrained on over 20 trillion tokens of multilingual text and domain-specific data.
- The Mixture-of-Experts (MoE) architecture allows for a massive parameter count while only activating a fraction of "expert" sub-networks for any given input, which can lead to greater efficiency compared to dense models of a similar size. Qwen2.5's architecture also incorporates features like Grouped Query Attention (GQA) to improve inference efficiency.
- For enterprise applications, AI agents are moving beyond simple automation to handle complex, multi-step workflows by reasoning, adapting to new information, and making decisions. These agents can be applied to functions like real-time inventory management, quality control analysis in manufacturing, and accelerating healthcare prior authorizations.
- The vLLM inference library offers high-throughput serving for large language models by using techniques like PagedAttention, which manages the memory of attention keys and values more efficiently, and continuous batching of incoming requests. It supports various forms of parallelism, including tensor and expert parallelism, to distribute inference across multiple GPUs.
- In the UK's programmatic advertising market, which is nearing saturation for digital display, significant growth is now concentrated in connected TV (CTV) and digital out-of-home (DOOH). Programmatic ad spending in the UK is projected to have a compound annual growth rate of 14.37% between 2025 and 2035.
- London's tech startup scene saw a record $3.5 billion in VC funding for AI startups in 2024, positioning it as Europe's leading AI hub. In the first quarter of 2025, London startups raised a total of £2.69 billion, more than France, Germany, and Spain combined.
- For those aspiring to a CTO role in a B2B SaaS company, the position's demands evolve from being a hands-on technical contributor in the early stages to a strategic leader focused on scaling the engineering organization, managing budgets, and aligning technology with business objectives as the company grows. A successful CTO must effectively bridge the gap between technology and business needs.
- In Formula 1, teams are preparing for the 2026 season which will feature new regulations that have drawn mixed reactions from drivers, with some, like Max Verstappen, being critical of the changes. Pre-season testing has been underway in Barcelona and Bahrain.
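The PagedAttention idea mentioned in the vLLM item above can be illustrated with a toy block table: instead of reserving one contiguous region per request, the key/value cache is split into fixed-size blocks that are handed out on demand and returned when a request finishes. This is a simplified sketch of the concept only, not vLLM's code or API; the class name `PagedKVCache`, its methods, and the block sizes are hypothetical.

```python
class PagedKVCache:
    """Toy block-table allocator for attention key/value storage."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size                # tokens per block
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}                      # seq_id -> [block ids]
        self.seq_lens = {}                          # seq_id -> token count

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (block_id, offset)."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % self.block_size == 0:
            # Current block is full (or this is the first token):
            # take a new block from the shared free pool.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

# Two concurrent requests sharing a pool of 4 blocks of 2 tokens each.
cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")          # 3 tokens need 2 blocks
cache.append_token("b")              # 1 token needs 1 block
blocks_used_a = len(cache.block_tables["a"])
free_before = len(cache.free_blocks)

cache.free("a")                      # request "a" finishes
free_after = len(cache.free_blocks)
```

Because blocks are only claimed as tokens arrive and are reusable as soon as a request ends, memory that a contiguous layout would hold in reserve stays available for other requests, which is also what makes continuous batching practical.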

Shared from Scout - Be the smartest in the room.