Chinese research team says its AI system discovered 106 model architectures
- Researchers from Shanghai Jiao Tong University and collaborators posted ASI-Arch on arXiv in July 2025, describing an autonomous system for neural architecture discovery.
- The paper says ASI-Arch ran 1,773 experiments over 20,000 GPU hours and produced 106 state-of-the-art linear attention architectures across benchmarks.
- The authors present the work as extending neural architecture search beyond human-set design spaces. (arxiv.org)
Neural architecture is the wiring diagram of an artificial intelligence model: which parts pass information, in what order, and with what shortcuts. Most modern systems still start from layouts that humans choose first. (arxiv.org)

A paper dated July 24, 2025, from Yixiu Liu, Yang Nan, Weixian Xu, Xiangkun Hu, Lyumanshan Ye, Zhen Qin and Pengfei Liu says their system, ASI-Arch, can generate and test those layouts on its own. The paper was posted to arXiv under the title “AlphaGo Moment for Model Architecture Discovery.” (arxiv.org)

The team frames ASI-Arch as a shift from classic neural architecture search, where researchers define the menu and software only picks from it. Their claim is that the system now proposes new menu items too, then writes code, trains models and checks the results experimentally. (arxiv.org) (github.com)

In this paper, the target was linear attention, a family of attention methods built to cut the memory and compute costs of standard transformers on long sequences. That makes the search narrower than “all neural networks,” but still focused on a core bottleneck in large-model design. (arxiv.org)

The headline number is 1,773 autonomous experiments over more than 20,000 GPU hours. The authors say that process produced 106 “innovative” linear attention architectures that reached state-of-the-art results on their benchmarks. (arxiv.org) (github.com)

The GitHub repository says the group open-sourced the pipeline, database components and all 106 discovered architectures. It also says the system uses a multi-agent setup and a knowledge base of past experiments and paper summaries to guide later rounds. (github.com)

The authors also claim a scaling law for discovery itself: more compute led to more architecture findings. In plain terms, they argue that research progress in this setting rose with machine time, not only with extra human design work. (arxiv.org)

The paper is an arXiv preprint, which means it is publicly posted but has not been peer reviewed by arXiv. The site describes itself as an open-access archive and says the material it hosts is not peer reviewed by the platform. (arxiv.org 1) (arxiv.org 2)

Shanghai Jiao Tong University appears repeatedly in author and lab pages tied to the project, including Pengfei Liu’s faculty profile and the GAIR lab materials. The repository has been public for months and showed about 1,200 GitHub stars when checked. (sjtu.edu.cn) (plms.ai) (github.com)

What happens next is more ordinary than the paper’s title: other researchers will need to reproduce the benchmarks, test the 106 designs in different settings and see whether the gains hold outside linear attention. Until then, the clearest verified result is that one team built an automated system that generated, coded and evaluated model architectures at scale. (arxiv.org) (github.com)
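For readers who want a concrete picture of why linear attention, the family targeted in the paper, is cheaper on long sequences, the sketch below may help. It is an illustration only and is not taken from the ASI-Arch paper or repository: the function names are hypothetical, the elu(x) + 1 feature map is one common choice from earlier linear attention work, and the toy inputs are made up. It computes the same output two ways, once by building a score for every pair of positions and once by summarizing all keys and values in a small fixed-size state.

```python
import math

def phi(x):
    """Positive feature map elu(x) + 1 (an assumed, commonly used choice)."""
    return x + 1.0 if x > 0 else math.exp(x)

def feature_map(rows):
    return [[phi(x) for x in row] for row in rows]

def quadratic_form(Q, K, V):
    """Scores every query against every key: cost grows with n * n."""
    Qf, Kf = feature_map(Q), feature_map(K)
    out = []
    for q in Qf:
        scores = [sum(a * b for a, b in zip(q, k)) for k in Kf]  # n scores per query
        norm = sum(scores)
        out.append([sum(s * v[j] for s, v in zip(scores, V)) / norm
                    for j in range(len(V[0]))])
    return out

def linear_form(Q, K, V):
    """Same output, but keys and values are folded into a d x d_v state once."""
    Qf, Kf = feature_map(Q), feature_map(K)
    d, dv = len(Kf[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # running sum of phi(k) * v^T
    z = [0.0] * d                       # running sum of phi(k), for normalization
    for k, v in zip(Kf, V):
        for i in range(d):
            z[i] += k[i]
            for j in range(dv):
                S[i][j] += k[i] * v[j]
    out = []
    for q in Qf:
        norm = sum(q[i] * z[i] for i in range(d))
        out.append([sum(q[i] * S[i][j] for i in range(d)) / norm
                    for j in range(dv)])
    return out

if __name__ == "__main__":
    # Toy data: 3 positions, 2 features each (values are arbitrary).
    Q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.9]]
    K = [[0.2, 0.1], [-0.3, 0.8], [0.7, -0.5]]
    V = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
    a, b = quadratic_form(Q, K, V), linear_form(Q, K, V)
    gap = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    print("largest difference between the two forms:", gap)
```

The size of the state in `linear_form` depends only on the feature dimensions, not on how many positions the sequence has, which is the property that makes this family attractive for long inputs. The sketch is non-causal and uses plain Python lists for clarity; the architectures in the paper are considerably more elaborate.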