Alibaba opens Qwen3.6‑27B
- Alibaba released Qwen3.6‑27B under an Apache 2.0 license, positioning it for broad use and self‑hosting.
- The model reportedly outperforms Alibaba's 397B MoE on several coding-agent benchmarks; the claim that it runs on a single 24GB GPU likely depends on quantized builds rather than the full-precision weights.
- Launch chatter puts Qwen3.6‑27B at about 92% cheaper than Claude Opus 4.5, a figure Alibaba's model card does not document, suggesting cheaper, compact models could accelerate self‑hosted agents (x.com).
Alibaba has released Qwen3.6-27B as open weights under an Apache 2.0 license, putting one of its newest coding models on GitHub and Hugging Face. (github.com) (huggingface.co) Qwen says the 27 billion-parameter model is the first dense open-weight variant in the Qwen3.6 line, following the hosted Qwen3.6-Plus model launched on April 2 and the open Qwen3.6-35B-A3B model launched on April 14. (alibabacloud.com) (qwen.ai) (github.com)

Large language models are software that predicts the next token, or chunk of text. Developers usually run the biggest ones through paid application programming interfaces because the hardware bill is high. Qwen3.6-27B is a smaller dense model, meaning all 27 billion parameters are active on each pass instead of routing work across a giant mixture-of-experts system. (huggingface.co) (alibabacloud.com)

Alibaba’s model card says Qwen3.6-27B beats the older Qwen3.5-397B-A17B mixture-of-experts model on several coding-agent tests, including SWE-bench Pro, SkillsBench Avg5, NL2Repo, Claw-Eval Pass^3, and QwenClawBench. On SWE-bench Verified, the card also puts Qwen3.6-27B narrowly ahead of the larger model, 77.2 to 76.2, while Claude Opus 4.5 remains ahead on most coding-agent benchmarks in Alibaba’s table. (huggingface.co)

The release also pushes the economics of self-hosting into the story. Anthropic lists Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens, so a local model that avoids per-call API fees can look materially cheaper for teams running heavy coding and agent workloads. (anthropic.com) The “92% cheaper” comparison in the launch chatter appears to line up with Anthropic’s published output-token price if the alternative costs about $2 per million tokens, because $2 is 8% of $25. Alibaba’s public model card for Qwen3.6-27B does not show that price math, so the cost claim should be read as a comparison circulating around the release rather than a figure documented in the official card. (anthropic.com) (huggingface.co) (x.com)

Alibaba says the model was tuned for “agentic coding” and “thinking preservation,” its term for keeping reasoning context from earlier messages so long coding sessions do not lose the thread. The model card lists a native context of 262,144 tokens and says it can be extended to 1,010,000 tokens. (github.com) (huggingface.co)

The company’s own materials do not document the “single 24GB GPU” claim for the full-precision release. The posted artifacts describe a 27B model distributed in standard weights for frameworks including Transformers, vLLM, SGLang, and KTransformers. In practice, fitting a model of this size into 24 gigabytes of video memory usually depends on quantized versions made by the community, not the original full-precision files. (huggingface.co) (github.com)

That leaves the release in a familiar 2026 pattern: frontier labs still lead many benchmarks, but open models keep getting smaller, cheaper, and easier to run outside the cloud. Alibaba’s latest move gives developers another model they can download, modify, and deploy without asking a closed-model vendor for access. (anthropic.com) (github.com)
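
Two of the figures in this story, the “92% cheaper” comparison and the “single 24GB GPU” claim, come down to simple arithmetic. The Python sketch below reproduces both under stated assumptions. The $2 per million tokens alternative price is the hypothetical from the comparison above, not a figure Alibaba has published. The memory estimate assumes roughly 27 billion parameters, treats the full-precision release as 16-bit weights, and counts the weights only, ignoring the KV cache, activations, and runtime overhead. The function names are illustrative and do not come from any library.

```python
# Back-of-the-envelope checks for the cost and memory claims.
# All names here are illustrative; nothing below comes from Alibaba or Anthropic tooling.

OPUS_OUTPUT_PRICE = 25.0  # USD per million output tokens, as listed by Anthropic
ASSUMED_ALT_PRICE = 2.0   # USD per million tokens: hypothetical, not a published price

PARAMS = 27e9             # approximate parameter count of the dense model
GPU_MEMORY_GB = 24        # the consumer-GPU budget named in the launch chatter


def percent_cheaper(baseline: float, alternative: float) -> float:
    """Return how much cheaper `alternative` is than `baseline`, in percent."""
    return (1 - alternative / baseline) * 100


def weight_size_gb(params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    saving = percent_cheaper(OPUS_OUTPUT_PRICE, ASSUMED_ALT_PRICE)
    print(f"Assumed alternative is {saving:.0f}% cheaper on output tokens")

    # 16-bit stands in for the full-precision release; 8 and 4 are common quantization widths.
    for bits in (16, 8, 4):
        size = weight_size_gb(PARAMS, bits)
        verdict = "fits" if size < GPU_MEMORY_GB else "does not fit"
        print(f"{bits}-bit weights: {size:.1f} GB, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions the weights alone come to about 54 GB at 16 bits, 27 GB at 8 bits, and 13.5 GB at 4 bits, so only a build quantized to around 4 bits leaves headroom on a 24GB card. Real memory use will be higher once context and runtime overhead are included.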