Solve the Loop attractor model paper

- Jacob Fein-Ashley and Paria Rashidinejad posted “Solve the Loop” on arXiv on May 12, introducing attractor models for language modeling and reasoning. (arxiv.org)
- The paper’s standout claim is that a 27 million-parameter model reached 91.4% on Sudoku-Extreme and 93.1% on Maze-Hard. (arxiv.org)
- Code, configs and pretrained checkpoints are available in Jacob Fein-Ashley’s public GitHub repository for follow-up testing. (github.com)

Jacob Fein-Ashley and Paria Rashidinejad have posted a new arXiv paper describing what they call “Attractor Models,” a neural architecture that iterates toward a fixed point instead of relying only on a standard feed-forward pass. (arxiv.org) The paper, “Solve the Loop: Attractor Models for Language and Reasoning,” was submitted to arXiv on May 12, 2026, according to the archive record. The authors say the approach is meant to make recurrent or looped computation easier to train and deploy by separating an initial proposal from a refinement step that solves for equilibrium. (github.com) They also released code, model configs and pretrained checkpoints in a public GitHub repository.

### What are the authors actually changing in the model?

The paper says Attractor Models use a backbone module to propose output embeddings and then an attractor module to refine those embeddings by solving for a fixed point. The authors write that gradients are computed with implicit differentiation, which they say keeps training memory constant in effective depth while allowing the number of refinement iterations to be chosen by convergence rather than fixed in advance. The GitHub repository describes the implementation as a “fixed-point head + IFT backward,” alongside baseline configurations for standard GPT-style models and Parcae looped models. (arxiv.org) The repository lists model sizes from 140 million to 1.3 billion parameters for language-model experiments.

### What results did the paper report on language modeling?

The arXiv abstract says the authors tested the method in large-scale language-model pretraining and reported what they called a Pareto improvement over standard Transformers and stable looped models. (arxiv.org) The paper’s abstract says perplexity improved by as much as 46.6% and downstream accuracy by as much as 19.7%, while training cost was reduced.
One headline comparison in the abstract says a 770 million-parameter Attractor Model outperformed a 1.3 billion-parameter Transformer that had been trained on twice as many tokens. (github.com) The abstract does not, on its own, provide the full table behind that claim, but it presents the comparison as a central result.

### Why are people focusing on the tiny-model puzzle results?

The most attention-grabbing numbers in the paper come from the reasoning section. The abstract says a model with 27 million parameters and about 1,000 examples reached 91.4% accuracy on Sudoku-Extreme and 93.1% on Maze-Hard. (arxiv.org) The same abstract says those tiny-model results scaled favorably on “challenging reasoning tasks,” and contrasts them with frontier systems including Claude and GPT o3 as well as specialized recursive reasoners. That comparison appears in the authors’ own wording in the abstract and is one reason the paper is circulating quickly. (arxiv.org)

### What is “equilibrium internalization,” and why did the authors highlight it?

The paper says Attractor Models show a phenomenon the authors call “equilibrium internalization.” In the abstract, they say fixed-point training pushes the model’s initial output embedding close enough to equilibrium that the solver can later be removed at inference time with little degradation. (arxiv.org) That claim matters because it suggests the iterative solver is not always needed at serving time, at least in the authors’ experiments. The repository’s pretrained checkpoints and evaluation scripts give outside researchers a way to test that behavior directly. (arxiv.org)

### Where can other researchers inspect or run it now?

The public repository is hosted on GitHub at `jacobfa/Attractor` and names Fein-Ashley and Rashidinejad, with the University of Southern California listed in the README. The codebase includes training launch files, puzzle experiments for Sudoku, Maze and ARC-AGI, and evaluation scripts.
(arxiv.org) The README also lists pretrained Hugging Face checkpoints for 140 million-, 370 million- and 770 million-parameter attractor models. As of May 15, 2026, the paper remains an arXiv preprint, and the next step for outside researchers is straightforward: inspect the repository, run the provided evaluation code and compare the reported results against the public checkpoints and datasets the authors linked. (github.com)
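
For readers who want a concrete picture of the idea before opening the repository, the fixed-point refinement and implicit-differentiation step described in the paper can be illustrated with a toy scalar example. This is only a sketch under stated assumptions, not the authors' implementation: the `tanh` update rule, the function names `attractor_step`, `solve_fixed_point` and `implicit_grad_w`, and the values of `x`, `w` and `z0` are all hypothetical and do not come from the paper or the `jacobfa/Attractor` code. It shows a proposal being refined until convergence, and a gradient computed from the equilibrium alone via the implicit function theorem, then checked against a finite-difference estimate.

```python
import math

def attractor_step(z, x, w):
    """One refinement step: a toy contraction standing in for an attractor module (assumption)."""
    return math.tanh(w * z + x)

def solve_fixed_point(z0, x, w, tol=1e-12, max_iters=1000):
    """Iterate z <- f(z) until the update is below tol; return (z, iterations used)."""
    z = z0
    for i in range(1, max_iters + 1):
        z_next = attractor_step(z, x, w)
        if abs(z_next - z) < tol:
            return z_next, i
        z = z_next
    return z, max_iters

def implicit_grad_w(z_star, w):
    """dz*/dw via the implicit function theorem, using only the equilibrium z*."""
    # At the fixed point z* = tanh(w*z* + x), so tanh'(w*z* + x) = 1 - z*^2.
    sech2 = 1.0 - z_star ** 2
    df_dw = sech2 * z_star   # partial derivative of f with respect to w
    df_dz = sech2 * w        # partial derivative of f with respect to z
    return df_dw / (1.0 - df_dz)

x, w = 0.3, 0.5              # hypothetical input and weight; |w| < 1 keeps the map contractive
z0 = x                       # hypothetical stand-in for a backbone "proposal"
z_star, iters = solve_fixed_point(z0, x, w)
grad_ift = implicit_grad_w(z_star, w)

# Finite-difference check: re-solve the fixed point at w +/- eps.
eps = 1e-6
z_plus, _ = solve_fixed_point(z0, x, w + eps)
z_minus, _ = solve_fixed_point(z0, x, w - eps)
grad_fd = (z_plus - z_minus) / (2 * eps)

print("iterations to converge:", iters)
print("equilibrium z*:", z_star)
print("implicit gradient:", grad_ift, "finite difference:", grad_fd)
```

In this toy, the gradient needs no stored intermediate iterates, which is the general property behind the constant-memory claim. The number of iterations is set by the convergence tolerance rather than fixed in advance. Whether the same behavior holds at scale is what the released checkpoints and evaluation scripts would let outside researchers check.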
