Learning, Fast and Slow split-weights paper

- Rishabh Tiwari and co-authors posted “Learning, Fast and Slow: Towards LLMs That Adapt Continually” to arXiv on May 12, outlining Fast-Slow Training. (arxiv.org)
- The paper says Fast-Slow Training is up to 3x more sample-efficient than parameter-only reinforcement learning and cuts KL drift by up to 70%. (arxiv.org)
- The paper lists a video, blog and code alongside the arXiv entry, with authors from UC Berkeley, Mila, UT Austin, Eragon, Periodic Labs and Mirendil. (arxiv.org)

Rishabh Tiwari, Kusha Sareen and seven co-authors posted “Learning, Fast and Slow: Towards LLMs That Adapt Continually” to arXiv on May 12, according to the paper’s abstract page. The authors include researchers affiliated with UC Berkeley, Mila, UT Austin, Eragon, Periodic Labs and Mirendil, the paper says. (arxiv.org)

Their method, called Fast-Slow Training, splits adaptation into two channels: persistent model parameters as “slow” weights and optimized textual context as “fast” weights. The paper says that setup aims to let models absorb task-specific information in context while keeping the underlying model closer to its base behavior. (arxiv.org)

The arXiv abstract frames the work as a response to a trade-off in post-training. Parameter updates can improve downstream performance but can also push a model away from its base behavior, while in-context learning is cheaper and more reversible but usually weaker on its own, the authors wrote. Fast-Slow Training combines both rather than treating them as alternatives, according to the paper.

### What exactly are the “fast” and “slow” parts here?

The paper says the slow component is the model’s parameters, which are expensive to update and persist across tasks. (arxiv.org) The fast component is textual context (prompts, instructions and other task context), which can be changed cheaply and quickly, according to the PDF. The authors describe those contextual updates as fast “weights” that can absorb task-specific information from textual feedback.

The authors wrote that the goal is to separate long-lived behavior from short-lived task information. In their framing, general reasoning behavior should remain in slow weights, while transient or task-local information can move through optimized context. (arxiv.org) That description appears in both the abstract and the paper PDF.

### What results did the authors report?

Fast-Slow Training was “up to 3x more sample-efficient” than slow-learning reinforcement learning alone across reasoning tasks, the abstract says. The paper also says the method reached a higher performance ceiling than either reinforcement learning alone or prompt optimization alone. (arxiv.org)

In the same summary, the authors reported that Fast-Slow-trained models stayed closer to the base model, with “up to 70% less KL divergence.” Figure text reproduced in the HTML version names CodeIO, Math using Polaris, and HoVer-hard as benchmark settings used in the comparison. (arxiv.org) That figure caption says Fast-Slow Training reached reinforcement learning’s peak with fewer samples and that a checkpoint trained this way roughly matched the base model’s ability to learn a second task, while a reinforcement-learning-only checkpoint “barely learns the new task.”

### How does the paper describe the problem it is trying to solve?

The introduction says standard supervised fine-tuning and reinforcement learning both write new information into the same persistent parameter set. (arxiv.org) The authors argue that this creates a bottleneck because reusable reasoning skills, task-specific heuristics and temporary lessons from recent training all compete for space in the same weights. The paper links that to catastrophic forgetting and to loss of plasticity, meaning weaker ability to adapt to later tasks.

The abstract says that in continual-learning settings, where task domains change during training, Fast-Slow Training “continues to acquire each new task while parameter-only RL stalls.” That claim is presented as part of the paper’s central evidence for separating adaptation across two timescales. (arxiv.org)

### Who wrote it, and where can readers find the materials?

The arXiv entry lists Rishabh Tiwari, Kusha Sareen, Lakshya A. Agrawal, Joseph E. Gonzalez, Matei Zaharia, Kurt Keutzer, Inderjit S. Dhillon, Rishabh Agarwal and Devvrit Khatri as authors. The PDF’s front matter also includes links labeled “Video,” “Blog,” and “Code,” indicating supporting materials beyond the paper itself. (arxiv.org)

ArXiv shows the first submission date as May 12, 2026, under identifier 2605.12484. The listing appears in the machine learning category, with artificial intelligence as a secondary subject, according to arXiv’s recent submissions page. (arxiv.org)
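
For readers who want a concrete sense of the KL-divergence figure cited in the results, here is a minimal Python sketch. It is not code from the paper. It computes the KL divergence between a tuned checkpoint's next-token distribution and a base model's, then the relative reduction between two checkpoints. Everything in it is an assumption made for illustration: the function name `kl_divergence`, the four-token vocabulary, the probability values, and the direction KL(tuned || base), since this summary does not say which direction or estimator the authors used.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions.

    Hypothetical helper for illustration only; p and q are lists of
    probabilities over the same outcomes.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # a zero-probability outcome under p contributes nothing
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Made-up next-token distributions over a 4-token vocabulary.
# These numbers are NOT from the paper.
base = [0.50, 0.30, 0.15, 0.05]
checkpoint_a = [0.10, 0.20, 0.30, 0.40]  # drifts far from the base model
checkpoint_b = [0.40, 0.30, 0.20, 0.10]  # stays closer to the base model

drift_a = kl_divergence(checkpoint_a, base)
drift_b = kl_divergence(checkpoint_b, base)

# Relative reduction in drift of checkpoint_b compared with checkpoint_a.
reduction = 1.0 - drift_b / drift_a

print(f"KL(a || base) = {drift_a:.4f} nats")
print(f"KL(b || base) = {drift_b:.4f} nats")
print(f"relative reduction = {reduction:.1%}")
```

A value of zero means the tuned distribution matches the base exactly, and larger values mean more drift. A claim such as “up to 70% less KL divergence” is a statement about a relative reduction of this kind, measured in the paper over model outputs rather than toy vectors.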
