Sakana AI trains conductor to manage LLMs
- Sakana AI researchers reported an ICLR 2026 paper on April 28 describing “The Conductor,” a 7-billion-parameter model trained to coordinate other LLMs.
- The paper says the Conductor beat individual worker models on GPQA and LiveCodeBench, and can even call itself for recursive test-time scaling.
- Sakana AI is already turning the approach into its Fugu API beta, a commercial orchestration product. (sakana.ai)
Large language models are the text engines behind tools like ChatGPT, but companies increasingly use several models at once because different systems are better at different jobs. Sakana AI says it trained a separate model to act like a manager for that mix. (openreview.net)

The paper, “Learning to Orchestrate Agents in Natural Language with the Conductor,” was accepted as a poster at the International Conference on Learning Representations, or ICLR, 2026. The authors are Stefan Nielsen, Edoardo Cetin, Peter Schwendeman, Qi Sun, Jinglue Xu, and Yujin Tang. (openreview.net 1) (openreview.net 2)

Their “Conductor” is a 7-billion-parameter language model trained with reinforcement learning, a trial-and-reward method, to decide which worker models to call, how they should talk to each other, and what instructions each should get. The paper says that means the system learns coordination strategies instead of relying on a fixed hand-written workflow. (openreview.net) (arxiv.org)

In plain terms, the setup works like a team lead assigning work across specialists instead of one generalist doing everything alone. Sakana AI says the Conductor can build targeted communication topologies, meaning it chooses which models exchange information rather than broadcasting every prompt to every model. (openreview.net)

The paper reports gains on reasoning benchmarks including GPQA and LiveCodeBench, and says the 7B Conductor outperformed any single worker model in its pool on those tests. OpenReview’s decision note also says reviewers highlighted state-of-the-art results on AIME and LiveCodeBench. (openreview.net 1) (openreview.net 2)

One detail in the paper is that the Conductor can choose itself as a worker. The authors say that creates recursive topologies, a loop where the manager model re-enters the job, which they describe as a form of dynamic test-time scaling through online iterative adaptation. (openreview.net)

The researchers also say they trained with randomized agent pools, so the Conductor learned to adapt to different combinations of open and closed models. That matters for companies that already use several providers and do not want a system tied to one vendor’s model lineup. (openreview.net 1) (openreview.net 2)

Sakana AI is already packaging the research into a product. On April 24, the company opened early beta applications for Sakana Fugu, an API product it says is based on its ICLR 2026 Conductor and Trinity papers. (sakana.ai)

In that product post, Sakana AI says Fugu is a small model that learns to call other LLMs and can also call itself, echoing the paper’s recursive design. The company positions it as a way to spare users from manually juggling multiple providers, prompts, and API keys. (sakana.ai)

The pitch is straightforward: instead of asking users to handcraft a chain of prompts, train a smaller model to assemble the team and run the meeting. Sakana AI is now trying to turn that research idea into a commercial layer that sits on top of frontier models. (openreview.net) (sakana.ai)
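
For readers who want a concrete picture of the orchestration pattern the paper describes, here is a minimal Python sketch. It is an illustration only, not Sakana AI's code or the Fugu API: the names `Step`, `plan`, and `run`, the worker labels "coder" and "reviewer", and the depth limit are all hypothetical, and the hard-coded `plan` function stands in for the trained 7B policy. It shows the three ideas reported above: picking a worker for each step, passing each worker only selected earlier outputs (a targeted topology), and letting the conductor name itself as a worker (recursion).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A worker takes an instruction plus the outputs it is allowed to see.
# In a real system this would wrap an LLM API call; here it is any callable.
Worker = Callable[[str, List[str]], str]


@dataclass
class Step:
    worker: str                       # which model to call (hypothetical labels)
    instruction: str                  # natural-language subtask for that model
    reads: List[int] = field(default_factory=list)  # earlier steps it may see


def plan(task: str, depth: int) -> List[Step]:
    """Hard-coded stand-in for the trained Conductor policy (an assumption)."""
    steps = [
        Step("coder", f"draft a solution to: {task}"),
        Step("reviewer", "critique the draft", reads=[0]),
    ]
    if depth > 0:
        # The conductor selects itself, creating a recursive topology.
        steps.append(Step("conductor", "refine using the critique", reads=[0, 1]))
    else:
        steps.append(Step("coder", "write the final answer", reads=[0, 1]))
    return steps


def run(task: str, workers: Dict[str, Worker], depth: int = 1) -> str:
    """Execute the plan step by step and return the last step's output."""
    outputs: List[str] = []
    for step in plan(task, depth):
        # Targeted topology: only the listed earlier outputs are passed on.
        context = [outputs[i] for i in step.reads]
        if step.worker == "conductor":
            # Re-enter the orchestration loop with a smaller depth budget.
            sub_task = " | ".join([step.instruction] + context)
            outputs.append(run(sub_task, workers, depth - 1))
        else:
            outputs.append(workers[step.worker](step.instruction, context))
    return outputs[-1]


if __name__ == "__main__":
    def stub(name: str) -> Worker:
        # Placeholder worker that just echoes what it was asked to do.
        return lambda instruction, context: f"{name}: {instruction}"

    pool = {"coder": stub("coder"), "reviewer": stub("reviewer")}
    print(run("sum a list of integers", pool, depth=1))
```

The depth limit in this sketch is a simple guard against unbounded recursion. According to the paper, the actual Conductor learns its coordination strategies through reinforcement learning rather than following a fixed plan like this one.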