ASI‑Arch runs 1,773 experiments
- ASI-Arch is a research agent that did something older AutoML systems usually cannot do: it searched for new model designs, not just better settings.
- In one reported run, the system executed 1,773 experiments over more than 20,000 GPU hours and surfaced 106 linear-attention architectures that beat hand-designed baselines.
- The bigger claim is that architecture discovery may now scale with compute. The catch is reproducibility, cost, and whether these results generalize beyond linear attention.

Neural architecture search has been around for years. But most of it works inside a box humans drew first. You define the search space, the algorithm shuffles pieces around, and maybe it finds a better variant. ASI-Arch is claiming a bigger jump: that an AI system can propose the pieces too, write the code, run the training jobs, read the results, and keep iterating with minimal human steering. That is why this paper landed with so much noise. It is not just “we found a better model.” It is “we may be automating part of research itself.”

### What actually changed?

The paper, posted to arXiv on July 24, 2025, presents ASI-Arch as a fully autonomous system for architecture discovery. The authors say it ran 1,773 experiments, used more than 20,000 GPU hours, and discovered 106 new state-of-the-art linear-attention architectures. The GitHub repo is public and includes the pipeline plus the 106 discovered designs, which makes this more concrete than a vague agent demo. (arxiv.org)

### Why is “no human-defined search space” the big deal?

Because that is the ceiling on most older architecture search. Traditional NAS systems are good at optimization inside a menu humans already wrote. ASI-Arch is pitched as automated innovation instead: the system forms hypotheses about new architectural motifs, implements them as executable code, trains them, evaluates them, and uses prior results to decide what to try next. Basically, the claim is that the machine is not only choosing from Lego bricks. It is inventing new bricks while building.

### Why linear attention?

Linear attention is a useful test bed because transformer attention is powerful but expensive, and researchers have spent years trying to make it cheaper without wrecking performance. That makes the space crowded, technical, and full of tradeoffs, exactly the kind of place where a system that can test huge numbers of odd ideas might find patterns people missed.
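
To make "cheaper attention" concrete, here is a minimal sketch of the generic kernel trick behind linear attention. It is not one of the ASI-Arch designs, and the function names and the `elu(x) + 1` feature map are illustrative assumptions. The point is only that the same output can be computed either by scoring every query against every key, or by summarizing the keys and values once and reusing that summary for each query.

```python
import math

def phi(x):
    # Positive feature map elu(x) + 1 (an assumed, commonly used choice).
    return x + 1.0 if x > 0 else math.exp(x)

def quadratic_attention(Q, K, V):
    """Kernelized attention the expensive way: score every query-key pair."""
    out = []
    for q in Q:
        fq = [phi(x) for x in q]
        # One similarity weight per key, so cost grows as (queries x keys).
        weights = [sum(a * phi(b) for a, b in zip(fq, k)) for k in K]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    """Same result, but keys and values are summarized once and then reused."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # sum over keys of phi(k) * v^T
    z = [0.0] * d                       # sum over keys of phi(k)
    for k, v in zip(K, V):
        fk = [phi(x) for x in k]
        for i in range(d):
            z[i] += fk[i]
            for j in range(dv):
                S[i][j] += fk[i] * v[j]
    out = []
    for q in Q:
        fq = [phi(x) for x in q]
        norm = sum(a * b for a, b in zip(fq, z))
        # Each query only touches the fixed-size summary, not every key.
        out.append([sum(fq[i] * S[i][j] for i in range(d)) / norm
                    for j in range(dv)])
    return out
```

Both functions return the same numbers up to floating-point rounding. The difference is cost: the second one never builds the full query-by-key score table, which is why this family of designs is attractive for long sequences and why there are so many variants to search over.
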
The authors frame their wins specifically in this domain, not as a proof that all model design is now solved. (arxiv.org)

### What does the “AlphaGo moment” line mean?

It is an analogy to Move 37, the famous AlphaGo play that looked strange to humans and then turned out to be brilliant. The paper argues that some of ASI-Arch’s discovered designs have that flavor: unusual combinations that outperform familiar human baselines and reveal design principles researchers were not already using. That is a strong claim. But it is also a rhetorical one. It means “surprising and useful,” not “we have proven machine superintelligence.” (arxiv.org)

### Did the system just brute-force this with money?

Partly, yes, and that matters. More than 20,000 GPU hours is not a toy run. The paper’s more ambitious claim is that discovery itself shows a scaling law, meaning more compute produced more breakthroughs in a fairly regular way. If that holds up, the implication is huge: architecture research starts to look less like artisanal insight and more like an engineering pipeline you can scale. But that also means the labs with the most compute get an even bigger edge. (arxiv.org)

### So should we trust it?

Trust the result as an interesting research claim, not as settled fact. This is an arXiv paper, not a peer-reviewed journal result, and the headline numbers come from the authors’ own benchmark setup. The open repo helps. Releasing the discovered architectures helps. But reproducibility here is hard by definition: rerunning 1,773 experiments over 20,000 GPU hours is expensive, and small implementation details can matter a lot in model benchmarking. (arxiv.org)

### What is the real takeaway?

The real story is not that ASI-Arch found 106 better linear-attention variants, though that is impressive.
It is that the paper sketches a loop where AI tools generate ideas, test them, store what they learned, and keep searching without waiting for a human researcher to handcraft the next hypothesis. If that loop proves robust outside this one domain, the bottleneck in AI progress shifts again — away from human idea generation and toward compute, evaluation quality, and who can run the biggest autonomous research systems. (arxiv.org)
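
For readers who think in code, the loop described above can be sketched in a few lines. This is a toy illustration under heavy assumptions, not the ASI-Arch pipeline: `propose`, `train_and_evaluate`, and `run_loop` are hypothetical names, the "design" is just two numbers, the "training run" is a made-up scoring function, and the idea generator is plain random mutation of the best past result rather than an AI system writing new architecture code.

```python
import random

def propose(archive, rng):
    """Hypothetical idea step: tweak the best design seen so far."""
    if archive:
        parent = max(archive, key=lambda r: r["score"])["design"]
    else:
        parent = [0.0, 0.0]  # arbitrary starting point (assumption)
    return [x + rng.gauss(0, 0.3) for x in parent]

def train_and_evaluate(design):
    """Toy stand-in for a training run: higher is better, best at (1, -2)."""
    return -((design[0] - 1.0) ** 2 + (design[1] + 2.0) ** 2)

def run_loop(n_experiments, seed=0):
    """Generate an idea, test it, store the result, repeat."""
    rng = random.Random(seed)
    archive = []  # memory of every experiment, successful or not
    for _ in range(n_experiments):
        design = propose(archive, rng)
        score = train_and_evaluate(design)
        archive.append({"design": design, "score": score})
    return archive

if __name__ == "__main__":
    results = run_loop(200)
    best = max(results, key=lambda r: r["score"])
    print(len(results), "experiments, best score:", best["score"])
```

Even in this toy, the budget (`n_experiments`) and the quality of the scoring function decide what the loop can find, while no human picks the next design to try. That is the shift in bottleneck the paper is pointing at.
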