ASI-EVOLVE outperforms human training

- Researchers from SII-GAIR posted ASI-Evolve, a March 31 paper and code release describing an agentic system that redesigns model architectures, data pipelines, and training algorithms.
- In reported tests, ASI-Evolve found 105 state-of-the-art linear-attention architectures, beat DeltaNet by 0.97 points, and lifted one data-curation benchmark by 18 MMLU points.
- The project targets expensive, long-horizon AI research loops now being automated by agents. (arxiv.org)

Training an artificial intelligence model usually means humans keep tweaking three things: the model design, the data it sees, and the rules used to update it. A new paper called ASI-Evolve says an agent can now search across all three. (arxiv.org)

The paper was posted on March 31 by researchers including Weixian Xu, Tiantian Mi, Yixiu Liu and Pengfei Liu, with code released on GitHub by the GAIR-NLP team. They describe ASI-Evolve as a loop that learns from prior results, proposes a new experiment, runs it, and writes down what it learned for the next round. (arxiv.org) (github.com)

That matters because most previous "AI for AI" systems worked on short, cheap tasks with fast feedback. The authors argue real model research is slower and messier, because each failed run can cost substantial compute and produce ambiguous results. (arxiv.org)

ASI-Evolve tries to handle that by splitting the work among three software roles: a Researcher agent that proposes changes, an Engineer agent that executes them, and an Analyzer agent that turns outcomes into reusable lessons. It also keeps two memories, one for human prior knowledge and one for experiment history, so it does not restart from zero each round. (arxiv.org) (github.com)

The team tested the system on neural architecture design, pretraining-data curation, and reinforcement-learning algorithm design. Those are three of the costliest parts of building frontier models, because they shape what the model is, what it learns from, and how it improves. (arxiv.org)

In architecture search, the paper says ASI-Evolve discovered 105 state-of-the-art linear-attention architectures. Its best model beat DeltaNet by 0.97 points, which the authors say is nearly three times the gain from recent human-designed improvements. (arxiv.org) (github.com)

In data curation, the system evolved a pipeline that selected cleaner pretraining data and improved average results by 3.96 points, including an 18-point gain on MMLU in the paper's reported setup. In reinforcement learning, the paper says it produced a new optimization method that scored 12.5 points higher than Group Relative Policy Optimization on AMC32. (arxiv.org) (github.com)

The paper also includes a biomedical test in drug-target interaction, where the authors report a 6.94 AUROC gain for cold-start generalization. That result is meant to show the search loop can move beyond pure language-model tuning into other research domains. (arxiv.org) (github.com)

The central claim is not that ASI-Evolve trains a model once and wins, but that it automates repeated trial-and-error over long research cycles. The tradeoff is visible in the paper itself: more automation can reduce manual tuning, but it also means running many more experiments to discover what works. (arxiv.org)

For model labs, the immediate question is whether that extra search cost is cheaper than keeping expert teams in the loop for every tuning decision. ASI-Evolve's answer is a research prototype, not a product launch, but it pushes the training loop one step further away from manual engineering. (arxiv.org)
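
For readers who want a concrete picture of the Researcher, Engineer, and Analyzer loop with its two memories, here is a minimal toy sketch in Python. It is not the authors' code and does not use the GAIR-NLP repository's API. Every name in it (`Memory`, `researcher`, `engineer`, `analyzer`, `evolve`) is hypothetical, the "experiment" is a one-line stand-in function rather than a training run, and the proposal step is plain random hill-climbing, which is an assumption made only to keep the example short.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Two stores, loosely mirroring the article's description (hypothetical names)."""
    prior_knowledge: dict = field(default_factory=dict)  # human-supplied hints
    history: list = field(default_factory=list)          # (candidate, score, lesson)


def researcher(memory, rng):
    """Propose a candidate: start from prior knowledge, then refine the best run."""
    if not memory.history:
        low, high = memory.prior_knowledge["search_range"]
        return rng.uniform(low, high)
    best_candidate, _, _ = max(memory.history, key=lambda rec: rec[1])
    return best_candidate + rng.gauss(0.0, 1.0)


def engineer(candidate):
    """Run the 'experiment'. A toy objective stands in for an expensive training run."""
    return -(candidate - 3.0) ** 2


def analyzer(score, memory):
    """Turn the outcome into a short lesson that later rounds can read."""
    best_so_far = max((rec[1] for rec in memory.history), default=float("-inf"))
    return "improved" if score > best_so_far else "no gain"


def evolve(rounds=200, seed=0):
    """Repeat propose -> run -> analyze, appending each result to the history."""
    rng = random.Random(seed)
    memory = Memory(prior_knowledge={"search_range": (-10.0, 10.0)})
    for _ in range(rounds):
        candidate = researcher(memory, rng)
        score = engineer(candidate)
        lesson = analyzer(score, memory)
        memory.history.append((candidate, score, lesson))
    return memory


if __name__ == "__main__":
    result = evolve()
    best = max(result.history, key=lambda rec: rec[1])
    print(f"best candidate: {best[0]:.3f}, score: {best[1]:.4f}")
```

The point of the sketch is the shape of the loop: because each round reads the accumulated history instead of starting fresh, later proposals build on earlier results. In a real system, the costly step would be the `engineer` call, which is where the search-cost tradeoff discussed above comes from.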
