Qwen3.6 matches top models in reverse-engineering test

- A YouTube benchmark this week showed Alibaba’s Qwen3.6-35B-A3B completing an LTE modem reverse-engineering task, with the video’s creator comparing the results against Gemma 4 and Claude Sonnet.
- Qwen’s official April release says the model has 35 billion total parameters, 3 billion of them active per token, and scores 73.4 on SWE-bench Verified and 51.5 on Terminal-Bench 2.0. (qwen.ai)
- The result strengthens the case that open models can compete in security work and code auditing.

Reverse engineering means taking a compiled binary, machine code distributed without its source code, and working backward to figure out what it does. That is the skill tested in a new YouTube benchmark built around an LTE modem crawler task. (youtube.com)

In the video, the creator says Qwen3.6-35B-A3B solved the challenge and frames the result as a head-to-head against Google’s Gemma 4 and Anthropic’s Claude Sonnet. The test centers on building a working crawler from modem code rather than answering textbook questions about assembly. (youtube.com)

Alibaba’s Qwen team released Qwen3.6-35B-A3B in mid-April 2026 as an open-weight mixture-of-experts model with 35 billion total parameters and 3 billion active parameters. The company says the model is available on Qwen Studio, through its application programming interface (API), and as downloadable weights. (qwen.ai) (github.com)

A mixture-of-experts model is like a workshop that keeps many specialists on staff but calls only a few to the bench for each job. With 3 billion of its 35 billion parameters active, roughly 9 percent of the model’s weights do the work for any given token. That design lets Qwen claim bigger-model behavior while using far less active compute per token. (qwen.ai) (alibabacloud.com)

Alibaba’s own benchmark table places Qwen3.6-35B-A3B at 73.4 on SWE-bench Verified, 51.5 on Terminal-Bench 2.0, and 68.7 on Claw-Eval (average). In the same table, Gemma 4-31B posts 52.0 on SWE-bench Verified and 42.9 on Terminal-Bench 2.0, putting Qwen ahead by 21.4 and 8.6 points respectively. (qwen.ai) (alibabacloud.com)

Google introduced Gemma 4 on March 31, 2026, in 31B dense and 26B A4B variants under the Apache 2.0 license. Anthropic introduced Claude Sonnet 4.6 on February 17, 2026, calling it its most capable Sonnet model and offering a 1 million-token context window in beta. (ai.google.dev) (blog.google) (anthropic.com)

The YouTube result is not a controlled academic benchmark, and the video page does not publish a full methods section, prompt log, or reproducibility package. That makes the comparison useful as a field test, but weaker evidence than a standardized evaluation with fixed scaffolding and public runs. (youtube.com)

The practical appeal is clear: firmware triage, driver inspection, and malware analysis often start with stripped binaries, which lack function names and debug information, and which humans must decode function by function. A model that can map those binaries to understandable logic can shorten the first pass for security teams and researchers. (youtube.com) (qwen.ai)

What changed in April is that an open-weight model is now being presented as competitive in a task usually associated with top closed coding systems. Qwen’s own release notes already cast the model as rivaling larger dense models such as Gemma 4-31B, and the video extends that claim to a reverse-engineering workflow. (qwen.ai) (youtube.com)

The video’s headline is narrower than “AI can reverse engineer anything,” but broader than a toy demo. It shows how quickly code models are moving from autocomplete to reading binaries that were never meant to be read by people. (youtube.com)
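The workshop analogy for mixture-of-experts models earlier in this piece maps onto a simple routing rule: a gating function scores every expert for each token, and only the highest-scoring few are actually run. The Python sketch below illustrates generic top-k routing as it is commonly described for mixture-of-experts layers. It is not Qwen’s implementation: the expert count (16), the experts-per-token setting (2), the scalar “experts,” and the random gate scores are all hypothetical placeholders. The last line only restates the 3-billion-of-35-billion ratio from Alibaba’s release.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(gate_logits, top_k):
    """Pick the top_k experts for one token and renormalize their weights.

    Returns (expert_index, weight) pairs, highest weight first; weights sum to 1.
    """
    probs = softmax(gate_logits)
    # Rank experts by gate probability and keep only the k highest.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


def moe_layer(x, experts, gate_logits, top_k):
    """Blend outputs of the selected experts only; the others are never called."""
    out = 0.0
    for i, w in route_token(gate_logits, top_k):
        out += w * experts[i](x)
    return out


if __name__ == "__main__":
    NUM_EXPERTS = 16  # hypothetical; not Qwen's real configuration
    TOP_K = 2         # hypothetical number of experts "called to the bench"
    random.seed(0)
    # Toy experts: each just scales its input. Real experts are neural sub-networks.
    experts = [lambda x, s=s: s * x for s in range(NUM_EXPERTS)]
    # Random stand-ins for the scores a learned gating network would produce.
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    print(route_token(logits, TOP_K))
    print(moe_layer(1.0, experts, logits, TOP_K))
    # Share of parameters active per token, from the figures Alibaba reports.
    print(f"active share: {3 / 35:.1%}")  # → active share: 8.6%
```

Because unselected experts are never called, the work done per token grows with the number of experts chosen rather than the number kept on staff, which is the trade-off behind a 35B-total, 3B-active design.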