Alibaba launches Qwen3.5‑Omni
Alibaba debuted Qwen3.5‑Omni, a native multimodal model that handles text, audio, images and video. The launch positions it against GPT‑4o and Gemini and underscores that multimodality is becoming the new baseline. The model is not open source, which highlights a competitive push by large cloud vendors to own their multimodal stacks. (i10x.ai)
Alibaba published Qwen3.5‑Omni on March 30, 2026, listing both offline API and real‑time API endpoints for deployment. (qwen.ai) The release comes in three model sizes (Plus, Flash and Light), and the Plus variant reportedly achieved 215 SOTA results across audio, audio‑video understanding, reasoning and interaction benchmarks. (buildfastwithai.com)
Qwen3.5‑Omni uses a Thinker–Talker design with Hybrid‑Attention Mixture‑of‑Experts components and advertises a 256K‑token context window for extended multimodal inputs. (qwen.ai) Alibaba's blog states the model was pretrained on massive multimodal corpora, claiming over 100 million hours of audio‑visual data. (qwen.ai) Alibaba also claims native voice features, including voice cloning and real‑time text‑to‑speech, plus speech recognition in 113 languages and the ability to ingest more than 10 hours of audio or roughly 400 seconds of 720p video sampled at 1 FPS. (aihola.com)
In early evaluations cited by third‑party coverage, the Plus variant outperformed Google's Gemini 3.1 Pro on multiple audio, reasoning and translation benchmarks. (decrypt.co)
The Information reports that Qwen3.5‑Omni is being released as a proprietary model rather than as an open‑source package, a shift from earlier Qwen3‑Omni editions, which were released under permissive licenses. (theinformation.com) The rollout coincides with reports that Alibaba Cloud plans to raise AI compute prices by up to 34% starting April 18, 2026; analysts note the timing could increase enterprise costs for large‑context multimodal workloads. (cntechpost.com)