Qwen3.6‑Plus shows coding chops

A recent demo of Qwen3.6‑Plus reported strong coding ability, with the model scoring about 78.8% on SWE‑bench Verified, a benchmark of repository fixes drawn from real GitHub issues, alongside results on terminal operations and full app generation from descriptions. The results highlight how models are getting better at end‑to‑end developer tasks, not just single snippets. (x.com)

Alibaba’s Qwen team says its new flagship model, Qwen3.6-Plus, is unusually good at real software work. In a launch post published on April 1, the team said the model scored 78.8 on SWE-bench Verified, a widely watched test built from real GitHub issues, and 61.6 on Terminal-Bench 2.0, which measures whether an AI agent can actually operate inside a terminal instead of merely suggesting code. (qwen.ai)

That distinction matters because coding benchmarks have been changing. Older tests often rewarded a model for producing a plausible function in one shot. SWE-bench asks for something harder: read a repository, understand an issue, edit the right files, and generate a patch that passes. The Verified version uses a human-filtered set of 500 instances and reports the share of problems the model truly resolves. (swebench.com)

Qwen’s result is strong because it lands near the top tier on that repo-fix task while doing even better on terminal-heavy work. In Qwen’s own comparison table, Claude Opus 4.5 still leads on SWE-bench Verified at 80.9, but Qwen3.6-Plus edges ahead on Terminal-Bench 2.0, 61.6 to 59.3. That is a useful split. It suggests the model is not just good at writing patches in a controlled harness. It is also getting better at the messier sequence of shell commands, file operations, environment setup, and debugging steps that make up actual engineering work. (qwen.ai)

Terminal-Bench makes that plain. Its sample tasks are not toy puzzles. One asks an agent to build a Linux kernel from source, modify startup code, package an initramfs, and boot the result in QEMU. Another asks it to configure a Git-backed web server so a pushed file appears over HTTP. These are the kinds of chores that break brittle agents fast. A model that can survive them is crossing from autocomplete into operations. (tbench.ai)

Qwen is leaning into that framing. The company describes Qwen3.6-Plus as a model for “real world agents,” not just chat, and says the release is aimed at frontend development, repository-level problem solving, tool use, and long-horizon execution. It also ships with a default 1 million token context window, which is the sort of feature that matters more for codebases than for casual conversation. A model cannot fix what it cannot keep in view. (qwen.ai)

The surrounding tooling shows the same ambition. Qwen maintains an open-source terminal agent called Qwen Code on GitHub, where the project describes itself as an AI agent that lives in the terminal and helps users understand large codebases and automate tedious work. That is the ecosystem this benchmark story belongs to. The point is no longer to impress developers with a neat snippet. The point is to stay inside the workflow long enough to finish the job. (github.com)
