ASI‑Evolve’s leap
A new system called ASI-Evolve is reported to automate self-improvement research and find architectures humans missed. Posts on X say it discovered more than 100 neural architectures and beat human designs by roughly threefold. (x.com) The posts also say it improved data pipelines and invented better reinforcement-learning routines. That matters because automated model R&D can materially shorten experiment cycles and shift where teams invest compute and talent. (x.com)
A paper posted to arXiv on March 31 describes a system called ASI-Evolve that tries to automate one of the hardest parts of AI work: not using models, but improving how models themselves are built. The authors say the system can run a full research loop on its own. It reads prior work, proposes changes, launches experiments, analyzes results, and feeds those lessons back into the next round of search. That is the real claim here. Not that it solved one benchmark, but that it turned model R&D into something closer to an iterative machine process (arxiv.org, github.com).

The team behind it comes from the Generative Artificial Intelligence Research Lab, or GAIR, which says it is part of the Shanghai Innovation Institute and works jointly with Shanghai Jiao Tong University. The code was released publicly on GitHub alongside the paper, which matters because this is not just a teaser thread on X. There is an actual framework, with experiments and a paper trail, even if the headline results have not yet been peer reviewed (github.com, github.com, arxiv.org).

What makes the paper notable is the scope. Most automated research systems go after narrow problems with fast feedback. ASI-Evolve goes after three slow, expensive ones that sit near the center of modern AI development: neural architecture design, pretraining data curation, and reinforcement-learning algorithm design. The paper says the system is built around a learn-design-experiment-analyze loop, with a “cognition base” that injects human priors and an analyzer that turns messy results into reusable lessons for later rounds. In other words, it is not just generating ideas. It is trying to remember what it learned (arxiv.org).

The flashiest result is in architecture search. The authors say ASI-Evolve found 105 state-of-the-art linear-attention architectures. Its best one beat DeltaNet, a recent human-designed linear-attention model, by 0.97 points on the paper’s evaluation setup.
The authors frame that as nearly three times the gain achieved by recent human improvements. That is where the “threefold” claim comes from. It does not mean the whole model is three times better. It means the improvement margin over the prior design was about three times larger than the margin from recent human-led advances in that line of work (arxiv.org, arxiv.org).

That distinction matters because the rest of the paper is less about one dramatic leap than about repeated, practical gains. In data curation, the authors say the evolved pretraining pipeline improved average benchmark performance by 3.96 points, with gains above 18 points on MMLU. In reinforcement learning, the discovered algorithms beat GRPO on several reasoning benchmarks, including gains of 12.5 points on AMC32 and 11.67 on AIME24. Those are large deltas for systems that are supposed to be tuning the plumbing rather than replacing the model family outright (arxiv.org, arxiv.org).

The deeper story is that ASI-Evolve treats AI progress as a search problem over code, data pipelines, and training rules. That idea is not new. Evolutionary search has been creeping back into AI research for years, and newer agent systems have been pushing toward longer research loops. What is new here is the attempt to unify those threads into one framework and aim it directly at frontier model development, where experiments are costly and feedback is noisy. The paper even includes early transfer experiments beyond the AI stack, in mathematics and biomedicine, as a way of arguing that the method is not tied to one lab trick (arxiv.org, sakana.ai).

The paper does not prove that autonomous AI research has arrived. It does show something more concrete. A public system, released in late March, appears to have produced a long list of new model architectures, improved a pretraining pipeline, and invented reinforcement-learning variants that beat a strong baseline.
The repository was updated within the last day, and the README still pitches the same idea in plain terms: let the AI run the research loop, while the human keeps the insight (github.com, github.com).
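
For readers who want a concrete picture of the learn-design-experiment-analyze loop described in the paper, here is a toy sketch in Python. It is not ASI-Evolve's code. Every name in it (`CognitionBase`, `Lesson`, `learn`, `design`, `experiment`, `analyze`, `run_loop`) is hypothetical, the "experiment" is a one-line toy objective standing in for an expensive training run, and random perturbation stands in for the model-driven proposal and analysis steps the paper describes. It only illustrates the shape of the loop: start from human priors, propose variants, score them, and store the lessons so the next round starts from what was learned.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lesson:
    """One analyzed experiment, kept for later rounds (hypothetical structure)."""
    candidate: float
    score: float
    note: str


@dataclass
class CognitionBase:
    """Toy stand-in for the "cognition base": human priors plus stored lessons."""
    priors: List[float]
    lessons: List[Lesson] = field(default_factory=list)

    def best(self) -> Optional[Lesson]:
        # Highest-scoring lesson so far, or None before any experiment has run.
        return max(self.lessons, key=lambda lesson: lesson.score, default=None)


def learn(base: CognitionBase, rng: random.Random) -> float:
    """Learn: start from the best known lesson, else from a human prior."""
    best = base.best()
    return best.candidate if best is not None else rng.choice(base.priors)


def design(anchor: float, rng: random.Random, n: int = 4) -> List[float]:
    """Design: propose small variations around the current anchor."""
    return [anchor + rng.uniform(-1.0, 1.0) for _ in range(n)]


def experiment(candidate: float) -> float:
    """Experiment: toy objective standing in for an expensive training run."""
    return -(candidate - 3.0) ** 2  # assumed optimum at 3.0, illustration only


def analyze(candidate: float, score: float, prev_best: Optional[Lesson]) -> Lesson:
    """Analyze: turn a raw result into a short, reusable note."""
    improved = prev_best is None or score > prev_best.score
    note = "improved on best so far" if improved else "no gain; deprioritize"
    return Lesson(candidate, score, note)


def run_loop(rounds: int = 20, seed: int = 0) -> CognitionBase:
    """Run the four-step loop for a fixed number of rounds."""
    rng = random.Random(seed)
    base = CognitionBase(priors=[0.0, 1.0])  # assumed priors, illustration only
    for _ in range(rounds):
        anchor = learn(base, rng)
        for candidate in design(anchor, rng):
            score = experiment(candidate)
            # Lessons persist across rounds, so later rounds build on earlier ones.
            base.lessons.append(analyze(candidate, score, base.best()))
    return base


if __name__ == "__main__":
    result = run_loop()
    print(result.best())
```

The design choice worth noticing is that the stored lessons, not the individual proposals, carry the search forward: each round's `learn` step reads from the same memory that `analyze` writes to. In the system the paper describes, the proposals and the analysis would come from a model and the experiments would be real training runs, which is where the cost and the noise come from.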