PaperOrchestra released
Google researchers published 'PaperOrchestra,' a multi‑agent framework that automates parts of AI research paper writing by coordinating agents for reasoning and tool use. (x.com) The paper demonstrates how specialist agents and orchestration patterns can be applied to complex, multi‑step knowledge tasks. (x.com)
Large language models can draft prose, but turning rough lab notes into a conference paper usually means weeks of outlining, citation hunting, figure making, and revision. On April 6, researchers at Google Cloud AI Research posted PaperOrchestra, a system built to automate that writing stage. (arxiv.org) The paper lists Yiwen Song, Yale Song, Tomas Pfister, and Jinsung Yoon as authors. It says PaperOrchestra takes “unconstrained pre-writing materials” and turns them into submission-ready LaTeX manuscripts with literature reviews, plots, and conceptual diagrams. (arxiv.org)
Rather than asking one model to do everything in a single pass, the system splits the job across five specialist agents. An Outline Agent plans the paper, a Plotting Agent makes visuals, a Literature Review Agent searches for papers, a Section Writing Agent drafts the manuscript, and a Content Refinement Agent revises it in review loops. (yiwen-song.github.io)
The literature step targets a common failure in automated writing: invented citations. According to the project page, the review agent runs targeted web searches and checks each candidate paper's existence and relevance through the Semantic Scholar application programming interface before building a citation graph. (yiwen-song.github.io)
Google’s team tested the system on PaperWritingBench, a new benchmark built from 200 papers: 100 each from the 2025 Conference on Computer Vision and Pattern Recognition and the 2025 International Conference on Learning Representations. The benchmark strips those papers back into idea summaries, experimental logs, templates, and conference guidelines, so the test measures writing rather than running experiments. (yiwen-song.github.io) In side-by-side human evaluations against autonomous baselines, the authors report absolute win-rate margins of 50% to 68% in literature review quality and 14% to 38% in overall manuscript quality.
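To make the five-agent division of labor concrete, here is a minimal Python sketch of how such an orchestration loop might hand one shared draft from specialist to specialist. Everything in it is hypothetical and is not PaperOrchestra's code: the function names, the `Draft` record, the toy relevance test, the 40-character "critic", and the `PAPER_INDEX` dictionary, which stands in offline for a bibliographic service such as the Semantic Scholar API that the project page says the real Literature Review Agent queries. The point it illustrates is narrow: when candidate citations are filtered by existence and relevance before section writing, an invented reference never reaches the manuscript text.

```python
from dataclasses import dataclass, field

# Hypothetical offline stand-in for a bibliographic index such as Semantic Scholar.
# Keys are lowercased titles; values hold the topics each paper covers.
PAPER_INDEX = {
    "attention is all you need": {"topics": {"transformers", "attention"}},
    "deep residual learning for image recognition": {"topics": {"vision", "cnn"}},
}


@dataclass
class Draft:
    """Shared state handed from one (hypothetical) agent to the next."""
    outline: list[str] = field(default_factory=list)
    figures: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)
    sections: dict[str, str] = field(default_factory=dict)
    revisions: int = 0


def outline_agent(materials: dict) -> list[str]:
    # Fixed section plan; a real planner would adapt it to the idea summary.
    return ["Introduction", "Related Work", "Method", "Experiments", "Conclusion"]


def plotting_agent(materials: dict) -> list[str]:
    # One figure placeholder per experiment named in the logs.
    return [f"fig_{name}.pdf" for name in materials["experiment_logs"]]


def literature_review_agent(candidates: list[str], topics: set[str]) -> list[str]:
    # Keep a candidate only if it exists in the index (existence check)
    # and shares at least one topic with the paper (toy relevance check).
    verified = []
    for title in candidates:
        entry = PAPER_INDEX.get(title.lower())
        if entry is not None and entry["topics"] & topics:
            verified.append(title)
    return verified


def section_writing_agent(draft: Draft, materials: dict) -> dict[str, str]:
    # Draft one stub per planned section, citing only verified papers.
    sections = {}
    for name in draft.outline:
        text = f"{name}: {materials['idea_summary']}"
        if name == "Related Work" and draft.citations:
            text += " Cites: " + "; ".join(draft.citations) + "."
        if name == "Experiments" and draft.figures:
            text += " See " + ", ".join(draft.figures) + "."
        sections[name] = text
    return sections


def content_refinement_agent(draft: Draft, max_rounds: int = 3) -> Draft:
    # Review loop: a toy critic flags sections shorter than 40 characters,
    # and each round expands them until nothing is flagged or rounds run out.
    for _ in range(max_rounds):
        flagged = [name for name, text in draft.sections.items() if len(text) < 40]
        if not flagged:
            break
        for name in flagged:
            draft.sections[name] += " (expanded during refinement)"
        draft.revisions += 1
    return draft


def orchestrate(materials: dict) -> Draft:
    # Run the five stages in order, passing the same Draft object along.
    draft = Draft()
    draft.outline = outline_agent(materials)
    draft.figures = plotting_agent(materials)
    draft.citations = literature_review_agent(
        materials["candidate_refs"], materials["topics"]
    )
    draft.sections = section_writing_agent(draft, materials)
    return content_refinement_agent(draft)


if __name__ == "__main__":
    materials = {
        "idea_summary": "Sparse attention for long documents.",
        "experiment_logs": ["ablation", "scaling"],
        "topics": {"attention"},
        "candidate_refs": ["Attention Is All You Need", "A Paper That Does Not Exist"],
    }
    draft = orchestrate(materials)
    print(draft.citations)  # ['Attention Is All You Need']
```

Passing one shared record between narrow functions keeps each stage independently testable, and running the citation check before section writing means the writer only ever sees verified titles. In a real system each function would wrap a model call plus tool use rather than the string templates shown here.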
The project page says the generated drafts were rendered directly in venue-specific formats, including the double-column Conference on Computer Vision and Pattern Recognition layout and the single-column International Conference on Learning Representations format. (arxiv.org) (yiwen-song.github.io)
Google also published a separate April 8 blog post on two related academic-workflow agents: PaperVizAgent for figures and ScholarPeer for automated review. Third-party summaries of the PaperOrchestra paper say ScholarPeer-style review was used to score drafts, with simulated acceptance rates of 84% on Conference on Computer Vision and Pattern Recognition and 81% on International Conference on Learning Representations. (research.google) (marktechpost.com)
The release lands as artificial intelligence labs push “agent” systems that divide long tasks into smaller jobs with checking steps between them. PaperOrchestra applies that pattern to one narrow part of research work: writing up finished results from notes and tables instead of generating the science itself. (arxiv.org 1) (arxiv.org 2) That distinction runs through the paper’s setup. The authors argue earlier autonomous writers were tied to their own experimental pipelines, while PaperOrchestra is meant for researchers who already have results and need help turning them into a polished draft. (arxiv.org)
For now, the public release is a paper, a project page, and example manuscripts rather than a general product. The immediate test is whether researchers treat it as a drafting assistant for tedious paper assembly, or as an early sign that more of the academic workflow is moving into coordinated software agents. (arxiv.org) (yiwen-song.github.io)