MIT reports Recursive Language Models

- On June 1, social-media posts resurfaced an MIT CSAIL paper describing Recursive Language Models, a framework that lets language models inspect prompts programmatically and recurse. (arxiv.org)
- The paper says RLMs handled inputs up to two orders of magnitude beyond model context windows and reported a 28.3% average gain for RLM-Qwen3-8B. (arxiv.org)
- The paper’s latest arXiv version is dated May 11, 2026, and the authors released code on GitHub, according to arXiv. (arxiv.org)

MIT CSAIL researchers Alex L. Zhang, Tim Kraska and Omar Khattab describe Recursive Language Models, or RLMs, as an inference-time framework that lets a language model treat a long prompt as an external environment rather than a single block of text inside a fixed context window. (arxiv.org) In the paper, posted on arXiv on Dec. 31, 2025 and revised on May 11, 2026, the authors say the model can write code to inspect a prompt, break it into parts and recursively call itself on selected snippets. A June 1 social-media post helped circulate the paper more widely, but the underlying work was already public on arXiv and had also appeared in an MIT CSAIL event listing for Feb. 25, 2026.

### How is this different from just giving a model a bigger context window?

The arXiv paper says RLMs are aimed at “arbitrarily long prompts” by shifting part of the work from raw context length to inference-time computation. Instead of forcing the base model to read everything at once, the framework lets it generate programs that query, slice and revisit parts of the prompt as needed. The MIT CSAIL event page uses similar language, saying the system can “programmatically examine, decompose, and recursively call itself” over prompt snippets. That makes the claim narrower than a new foundation model architecture: the paper presents RLMs as a general inference strategy wrapped around an existing model. (arxiv.org)

### What evidence do the MIT authors say they found?

The authors write on arXiv that RLMs processed inputs “up to two orders of magnitude beyond model context windows.” They also report that, on shorter prompts, the method outperformed what they call vanilla frontier LLMs and common long-context scaffolds across four long-context tasks at comparable cost. (arxiv.org)

One of the paper’s more concrete model-level results involves a smaller system the authors call RLM-Qwen3-8B. The paper says that post-trained model beat the underlying Qwen3-8B by 28.3% on average and “approaches the quality of vanilla GPT-5” on three long-context tasks. (csail.mit.edu)

### What does “recursive” mean here in practice?

The paper says recursion is not just a metaphor. The framework allows the model to issue sub-queries to itself over smaller pieces of the prompt, using generated code to decide what to inspect next and how to combine the results. (arxiv.org)

Figure language quoted in the PDF summary says GPT-5 performance fell as input length and task complexity increased on benchmarks including S-NIAH and OOLONG, while the corresponding RLM maintained stronger performance and handled inputs beyond GPT-5’s listed 272,000-token context window. (arxiv.org) Those are the authors’ benchmark claims, not an independent evaluation.

### Is this a new model or a wrapper around existing models?

The paper presents both. The main contribution is an inference framework that can sit on top of existing large language models, and the authors compare that setup with frontier systems and coding-based scaffolds. (arxiv.org) Separately, they say they post-trained what they describe as the first “natively recursive language model,” RLM-Qwen3-8B.

That distinction matters because the work is framed as a way to extend long-context reasoning without waiting for ever-larger context windows alone. The authors say code for the project is publicly available, which gives outside researchers a path to test the claims. (arxiv.org)

### Where can readers track what happens next?

ArXiv lists the latest version of “Recursive Language Models” as version 3, dated May 11, 2026, under identifier 2512.24601. MIT CSAIL’s event page identifies Alex Zhang as the speaker for the Feb. 25 presentation, and the arXiv record points readers to the project’s code repository for follow-up work and replication. (arxiv.org)
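
For readers who want a concrete picture of the recursion the paper describes, here is a minimal Python sketch. It is not the authors' code and does not reproduce their method: in the paper the model itself writes the code that decides what to inspect, whereas this sketch swaps in a fixed split-in-half rule. The names `toy_model`, `split_in_half`, `recursive_answer` and the `window` parameter are hypothetical, and `toy_model` is a keyword filter standing in for a real language-model call. The point it illustrates is only that no single call ever receives more text than the window allows.

```python
from typing import Callable, Tuple


def toy_model(question: str, snippet: str) -> str:
    """Hypothetical stand-in for an LLM call: keep lines mentioning the keyword."""
    hits = [ln for ln in snippet.splitlines() if question.lower() in ln.lower()]
    return "\n".join(hits)


def split_in_half(text: str) -> Tuple[str, str]:
    """Split on a line boundary when possible, otherwise by characters."""
    lines = text.splitlines()
    if len(lines) > 1:
        mid = len(lines) // 2
        return "\n".join(lines[:mid]), "\n".join(lines[mid:])
    mid = len(text) // 2
    return text[:mid], text[mid:]


def recursive_answer(
    question: str,
    prompt: str,
    model: Callable[[str, str], str],
    window: int = 200,
) -> str:
    """Answer over a prompt longer than `window` characters by recursing on halves."""
    # Base case: the snippet fits, so hand it to the model directly.
    if len(prompt) <= window:
        return model(question, prompt)

    # Recursive case: sub-query each half, then combine the partial answers.
    left, right = split_in_half(prompt)
    partials = [recursive_answer(question, part, model, window) for part in (left, right)]
    combined = "\n".join(p for p in partials if p)

    # If the combined notes fit (or failed to shrink), make one final call
    # on at most `window` characters; otherwise recurse on the notes.
    if len(combined) <= window or len(combined) >= len(prompt):
        return model(question, combined[:window])
    return recursive_answer(question, combined, model, window)


if __name__ == "__main__":
    filler = [f"filler line {i}" for i in range(1000)]
    document = "\n".join(filler[:500] + ["the secret code is 4217"] + filler[500:])
    print(recursive_answer("secret", document, toy_model, window=200))
```

In this toy setup the document is far longer than the 200-character window, yet each call to the stand-in model receives at most 200 characters. A real RLM, as the paper describes it, would replace the fixed halving rule with code the model generates at inference time.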
