ASI‑Arch finds 106 attention architectures

- ASI-Arch, an autonomous research system from Shanghai Jiao Tong University and collaborators, reported discovering 106 new state-of-the-art linear attention designs.
- The run mattered because it was not a toy sweep: 1,773 autonomous experiments, more than 20,000 GPU hours, and open-sourced code plus architectures.
- The bigger claim is that model discovery itself may scale with compute, not just model training.

Neural network architecture search usually means humans define the menu and software picks from it. ASI-Arch is trying something more aggressive: letting an AI system propose new ideas, write the code, run the experiments, judge the results, and loop again. That is the real news here. Not just 106 new attention variants, but a claim that architecture discovery itself can be turned into a compute-scaled process.

### What did the team actually build?

ASI-Arch is a multi-agent research loop for linear attention models, the class of architectures meant to keep long-context modeling cheaper than standard Transformers. The system does the full cycle: it reads prior work and past experiments, proposes a new architecture, implements it, trains it, evaluates it, stores the result, and uses that history to decide what to try next. The paper was posted to arXiv on July 24, 2025, and the code plus discovered architectures are on GitHub.

### Why focus on linear attention?

Because ordinary attention gets expensive fast as sequence length grows. Linear attention is one of the main attempts to fix that, but the design space is messy and still very open. People have hand-designed families like Mamba-style or DeltaNet-style sequence models, but there is no settled best recipe. That makes the area perfect for automated search: lots of plausible combinations, weak theory, and expensive trial-and-error. (arxiv.org)

### What changed in this run?

The system carried out 1,773 autonomous experiments over more than 20,000 GPU hours and reported 106 state-of-the-art linear attention architectures. That is the headline number, but the more interesting part is the workflow: the search was not restricted to a tiny fixed catalog of modules. The authors frame this as a move from classic neural architecture search, which mostly optimizes inside human-defined boxes, to automated innovation, where the machine keeps generating new boxes too. (arxiv.org)

### Why call it an “AlphaGo moment”?

The analogy is AlphaGo’s Move 37, the kind of non-obvious move that looks strange to humans and then turns out to be strong. The authors are saying some of these discovered architectures work like that: they combine components in ways researchers did not already converge on, then outperform human-designed baselines. That does not mean the field has had its single decisive AlphaGo breakthrough. But it does mean the paper is making a bigger claim than “we ran a large hyperparameter sweep.” (arxiv.org)

### Is the 106 number the main point?

Not really. The main point is the claimed scaling law. The paper says architectural breakthroughs rose with added compute, which suggests discovery itself might become more industrialized: more GPUs, more experiments, more chances to hit useful designs. Basically, the bottleneck shifts away from human researchers manually inventing every new block. If that pattern holds outside this one domain, AI research could speed up in a very recursive way. (arxiv.org)

### What is the catch?

The catch is scope. These are linear attention architectures, not a replacement for all model design. And “state of the art” inside a narrow benchmark slice is not the same as proving broad practical superiority across cost, robustness, and deployment constraints. The paper is exciting because it shows a method that may generalize, but the result itself is still one domain-specific demonstration. (arxiv.org)

### Why does this matter beyond one paper?

Because AI has mostly used more compute to train bigger models, not to automate the act of research itself. ASI-Arch points at a different loop: AI systems helping invent the next generation of AI systems. If that loop gets reliable, model progress stops depending so heavily on a small number of human architecture intuitions and starts looking more like a scalable search problem. That is the real reason people are paying attention. (arxiv.org)
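
A footnote for readers who want the linear attention cost argument made concrete. The sketch below is not one of the ASI-Arch architectures and is not taken from the paper or its repository. It is a generic, non-causal kernelized attention written in plain Python, with an assumed feature map `phi` (elu(x) + 1) and hypothetical helper names. It shows the one idea the whole family leans on: reordering the matrix products so the n x n score matrix is never built.

```python
import math
import random


def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice for this sketch)."""
    return x + 1.0 if x > 0 else math.exp(x)


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def feature_map(m):
    return [[phi(x) for x in row] for row in m]


def quadratic_attention(q, k, v):
    """Builds the full n x n score matrix, so cost grows with n squared."""
    fq, fk = feature_map(q), feature_map(k)
    scores = matmul(fq, transpose(fk))  # n x n
    out = matmul(scores, v)
    return [[x / sum(s) for x in o] for o, s in zip(out, scores)]


def linear_attention(q, k, v):
    """Same result, reassociated: keys and values are summarized once."""
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)  # d x d_v summary, independent of n
    k_sum = [sum(col) for col in zip(*fk)]  # length-d normalizer summary
    out = matmul(fq, kv)
    norms = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[x / z for x in o] for o, z in zip(out, norms)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d = 6, 4  # toy sequence length and head dimension
q, k, v = (random_matrix(n, d, rng) for _ in range(3))
slow = quadratic_attention(q, k, v)
fast = linear_attention(q, k, v)
max_gap = max(abs(a - b) for rs, rf in zip(slow, fast) for a, b in zip(rs, rf))
print(max_gap < 1e-9)
```

The two functions should agree up to floating-point rounding, because the second only regroups the same products. The difference is in cost: the first touches every query-key pair, while the second compresses all keys and values into a small summary whose size does not depend on sequence length. Real designs add causal masking, gating, and decay terms on top of this, which is where the messy design space comes from.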
