Alibaba Qwen3.7-Max 35-hour run

- Alibaba’s Qwen team said on May 19 that Qwen3.7-Max completed a roughly 35-hour autonomous coding run without human intervention.
- Alibaba said the run performed 432 kernel evaluations across 1,158 tool calls while writing, compiling, profiling and revising code on its own.
- Qwen3.7-Max is available through Qwen and Alibaba Cloud channels, with the company pointing developers to Model Studio and Qwen Code.


Alibaba’s Qwen team said on May 19 that its new Qwen3.7-Max model completed a roughly 35-hour autonomous coding run, a claim that drew wider attention after an X post in the past 48 hours highlighted the experiment and its activity logs. Qwen said the model operated through more than 1,000 tool calls in a kernel-optimization task, without human intervention during the run. The company described the test as evidence that the model can sustain long-horizon work rather than just answer prompts one turn at a time. The X thread did not introduce new numbers, but it helped surface a result Alibaba had already published in its launch materials.

### Where did the 35-hour claim come from?

Qwen published the claim in a May 19 research post titled “Qwen3.7: The Agent Frontier.” In that post, the team said Qwen3.7-Max “sustains coherent reasoning across extremely long horizons” and cited “a 35-hour, fully autonomous kernel optimization run comprising over 1,000 tool calls.” Alibaba Cloud repeated the same description in a May 21 community post summarizing the release. (qwen.ai)

Alibaba Cloud gave the fuller accounting in that community post. It said that over about 35 hours of continuous autonomous execution, the model performed 432 kernel evaluations across 1,158 tool calls. The company said the model wrote, compiled, profiled and iteratively improved an “Extend Attention Kernel” on its own. (qwen.ai)

### What, exactly, did the model do during the run?

Alibaba said the task was not a chat demo but a software-engineering loop. The company said Qwen3.7-Max diagnosed compilation failures, fixed correctness bugs and identified performance bottlenecks through profiling, then revised the code and tested again. The company also said the run ended with an average 10x speedup over a reference implementation. (alibabacloud.com)

That figure appears in third-party coverage summarizing the launch, but the underlying description of the loop itself comes from Alibaba’s own materials.
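
The loop described above (propose code, compile and test it, profile it, revise) can be illustrated with a small sketch. This is not Alibaba's harness, which has not been published in this form: `optimize`, `EvalResult`, `RunLog` and the toy stand-ins are hypothetical names, the "model" is just a version counter, and counting two tool calls per round is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EvalResult:
    compiled: bool
    correct: bool
    runtime_ms: Optional[float]  # None when the candidate failed before timing


@dataclass
class RunLog:
    tool_calls: int = 0
    evaluations: int = 0
    best_runtime_ms: Optional[float] = None
    history: List[EvalResult] = field(default_factory=list)


def optimize(
    propose: Callable[[Optional[EvalResult]], int],
    evaluate: Callable[[int], EvalResult],
    max_evaluations: int,
) -> RunLog:
    """Run a propose/evaluate loop, feeding each result back into the next proposal."""
    log = RunLog()
    feedback: Optional[EvalResult] = None
    for _ in range(max_evaluations):
        candidate = propose(feedback)  # hypothetical tool call: write or revise code
        log.tool_calls += 1
        result = evaluate(candidate)   # hypothetical tool call: compile, test, profile
        log.tool_calls += 1
        log.evaluations += 1
        log.history.append(result)
        # Only candidates that compile and pass correctness checks count as timings.
        if result.compiled and result.correct and result.runtime_ms is not None:
            if log.best_runtime_ms is None or result.runtime_ms < log.best_runtime_ms:
                log.best_runtime_ms = result.runtime_ms
        feedback = result
    return log


def make_toy_proposer() -> Callable[[Optional[EvalResult]], int]:
    """Toy stand-in for the model: ignores feedback and bumps a version number."""
    version = -1

    def propose(feedback: Optional[EvalResult]) -> int:
        nonlocal version
        version += 1
        return version

    return propose


def toy_evaluate(version: int) -> EvalResult:
    """Toy stand-in for the harness: v0 fails to compile, v1 is wrong, later ones get faster."""
    if version == 0:
        return EvalResult(compiled=False, correct=False, runtime_ms=None)
    if version == 1:
        return EvalResult(compiled=True, correct=False, runtime_ms=None)
    return EvalResult(compiled=True, correct=True, runtime_ms=100.0 / (version - 1))


if __name__ == "__main__":
    log = optimize(make_toy_proposer(), toy_evaluate, max_evaluations=5)
    print(log.evaluations, log.tool_calls, log.best_runtime_ms)
```

The point of the sketch is the shape of the run: failed compiles and wrong outputs are logged as evaluations too, and only passing candidates compete on runtime.
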

### Why are people focusing on 432 tests and 1,158 tool calls?

The 432-test and 1,158-call figures are the most concrete part of Alibaba’s account because they describe the mechanics of the run rather than a benchmark score. (alibabacloud.com)

In practice, those numbers indicate repeated cycles of code generation, compilation, profiling and evaluation over a long period, instead of a single successful pass. The X post that circulated this weekend pointed to those same numbers and framed them as an example of an agent closing its own feedback loop with automated tests. Alibaba’s launch materials support that narrower factual point: the model was repeatedly testing and revising its own work inside an external tool harness. (alibabacloud.com)

### Was this Alibaba’s own agent system or a broader model claim?

Qwen said Qwen3.7-Max “generalizes across agent scaffolds,” naming Claude Code, OpenClaw and Qwen Code among the frameworks where it said the model performed consistently. That wording matters because Alibaba is presenting the result as a model capability that can work across external harnesses, not only inside a single in-house demo environment. (alibabacloud.com)

Alibaba’s product pages describe Qwen3.7-Max as a proprietary model built for “the agent era,” with coding, office workflow automation and long-horizon execution as the main use cases. The company says developers can access it through Qwen and Alibaba Cloud Model Studio, and Qwen Code is positioned as an open-source terminal agent optimized for Qwen models. (qwen.ai)

### What can be verified, and what remains a company claim?

The dates, the 35-hour duration, the 432 kernel evaluations and the 1,158 tool calls can be verified from Alibaba’s own published release materials. The X thread appears to be amplifying those figures rather than reporting a separate experiment.
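
For a sense of scale, the published totals can be turned into rough averages. The short calculation below is only a back-of-the-envelope sketch: it treats "roughly 35 hours" as exactly 35, and the variable names are illustrative. Averages say nothing about how the work was actually distributed across the run.

```python
# Figures as published by Alibaba; the duration is "roughly 35 hours",
# treated as exactly 35 here for the sake of the estimate.
HOURS = 35
TOOL_CALLS = 1158
KERNEL_EVALS = 432

calls_per_hour = TOOL_CALLS / HOURS
evals_per_hour = KERNEL_EVALS / HOURS
calls_per_eval = TOOL_CALLS / KERNEL_EVALS
minutes_per_eval = HOURS * 60 / KERNEL_EVALS

print(f"{calls_per_hour:.1f} tool calls per hour")
print(f"{evals_per_hour:.1f} kernel evaluations per hour")
print(f"{calls_per_eval:.1f} tool calls per evaluation")
print(f"{minutes_per_eval:.1f} minutes per evaluation")
```

On those assumptions the run averaged about 33 tool calls and 12 kernel evaluations per hour, or one evaluation roughly every five minutes.
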

The stronger conclusion, that this proves a step-change in autonomous agents, is not something Alibaba’s materials independently establish. (qwen.ai) What the company has documented is a single long-running kernel-optimization task, published on May 19 and reposted through Alibaba Cloud on May 21, with developers directed to Qwen3.7-Max in Model Studio and Qwen Code for follow-on testing. (qwen.ai)