Google's five-agent PaperOrchestra

Google showed a multi‑agent pipeline that can turn lab notes into a submission‑ready LaTeX paper in about 40 minutes, splitting the job across five specialist agents rather than one generalist. The setup separates planner, drafter, critic, formatter and verifier roles so failures can be isolated and handoffs instrumented for a multi‑stage workflow. That pattern reinforces specialist decomposition as a practical architecture for complex tasks, not just an academic demo. (nerdleveltech.com)

Most research papers are not hard because of the ideas. They are hard because one result table has to become an abstract, a methods section, a figure caption, a bibliography entry, and finally a LaTeX file that a conference site will accept. (arxiv.org)

Large language models are the text engines behind tools like chatbots, and a single model usually tries to do every step itself. Google’s new PaperOrchestra splits that job into separate workers, the way a newsroom splits reporting, editing, fact-checking, and layout. (research.google)

Google Cloud AI Research posted the PaperOrchestra paper to arXiv on April 6, 2026. The system takes rough idea summaries and raw experiment logs, then turns them into a submission-ready paper written in LaTeX, the typesetting language most computer science conferences require. (arxiv.org)

LaTeX is the code-like format academics use when they need equations, references, and figures to land in exactly the right place. It is closer to compiling a program than typing into a word processor, which is why formatting mistakes can derail an otherwise finished draft. (arxiv.org)

PaperOrchestra uses five specialist agents instead of one generalist. Reporting on the paper says those roles cover planning, drafting, critique, formatting, and verification, so each handoff can be checked instead of hidden inside one giant prompt. (decrypt.co)

That division is not just a style choice. Google Research wrote in January 2026 that agent systems can improve performance on parallelizable tasks but can also get worse on sequential ones, which makes workflow design more important than simply adding more agents. (research.google)

To test the writing system, the authors built PaperWritingBench from 200 top-tier artificial intelligence conference papers. They reverse-engineered the kind of raw materials a researcher would have before writing, then asked PaperOrchestra and other automated systems to turn those materials back into full papers. (arxiv.org)

In side-by-side human evaluations, PaperOrchestra beat autonomous baselines by 50 percent to 68 percent on literature review quality. It also led by 14 percent to 38 percent on overall manuscript quality, which is a smaller gap but covers the whole paper rather than one section. (arxiv.org)

Outside summaries of the paper say the full run averages about 39.6 minutes and roughly 60 to 70 large language model calls per manuscript. That makes the system look less like a magic “write my paper” button and more like an automated production line with a lot of checkpoints. (tamiltech.in)

The important part is what Google chose to automate. PaperOrchestra assumes the human already did the experiments and found the result, and the software handles the conversion from messy notes into the polished package that reviewers actually read. (arxiv.org)

That makes this less about robot scientists and more about specialist decomposition. Google’s own recent agent research has been pushing on prompts, topologies, and role design, and PaperOrchestra is a concrete example of that theory becoming a usable workflow. (research.google)

The catch is that a cleaner paper is not the same thing as a truer paper. PaperOrchestra can verify citations and generate visuals from inputs, but the benchmark still measures writing quality on reconstructed materials, not whether the underlying science deserves acceptance. (arxiv.org)
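
For readers who want to see what "each handoff can be checked" might look like in practice, here is a minimal Python sketch of a five-stage pipeline with a logged check after every stage. It is an illustration only, not Google's implementation: the role names follow the reporting summarized above, every function and class name (`Manuscript`, `Handoff`, `run_pipeline`, and the five stage functions) is hypothetical, and each agent is a plain stub where a real system would call a language model.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Manuscript:
    """Shared state passed from one stage to the next (hypothetical shape)."""
    notes: str
    outline: list[str] = field(default_factory=list)
    draft: str = ""
    issues: list[str] = field(default_factory=list)
    latex: str = ""
    verified: bool = False


@dataclass
class Handoff:
    """One log record per stage, so a failure can be traced to its stage."""
    stage: str
    seconds: float
    ok: bool


# Stub agents: placeholders for what would be model calls in a real system.
def planner(m: Manuscript) -> Manuscript:
    m.outline = ["Introduction", "Method", "Results"]
    return m


def drafter(m: Manuscript) -> Manuscript:
    if m.notes.strip():  # nothing to write from empty notes
        m.draft = "\n\n".join(f"{s}: {m.notes}" for s in m.outline)
    return m


def critic(m: Manuscript) -> Manuscript:
    m.issues = [] if m.draft else ["empty draft"]
    return m


def formatter(m: Manuscript) -> Manuscript:
    m.latex = (
        "\\documentclass{article}\n\\begin{document}\n"
        + m.draft
        + "\n\\end{document}\n"
    )
    return m


def verifier(m: Manuscript) -> Manuscript:
    m.verified = "\\end{document}" in m.latex and not m.issues
    return m


# Each stage pairs an agent with a check that runs at the handoff.
STAGES: list[tuple[str, Callable[[Manuscript], Manuscript],
                   Callable[[Manuscript], bool]]] = [
    ("planner", planner, lambda m: bool(m.outline)),
    ("drafter", drafter, lambda m: bool(m.draft)),
    ("critic", critic, lambda m: not m.issues),
    ("formatter", formatter, lambda m: "\\begin{document}" in m.latex),
    ("verifier", verifier, lambda m: m.verified),
]


def run_pipeline(notes: str) -> tuple[Manuscript, list[Handoff]]:
    """Run stages in order, log every handoff, stop at the first failed check."""
    m = Manuscript(notes=notes)
    log: list[Handoff] = []
    for name, agent, check in STAGES:
        start = time.perf_counter()
        m = agent(m)
        ok = check(m)
        log.append(Handoff(name, time.perf_counter() - start, ok))
        if not ok:
            break
    return m, log


if __name__ == "__main__":
    paper, log = run_pipeline("accuracy improved on the held-out set")
    for h in log:
        print(f"{h.stage:<10} ok={h.ok} {h.seconds:.6f}s")
```

The point of the design is that a bad run stops at a named stage with a log entry, instead of surfacing as one unexplained bad output. Calling `run_pipeline("")` in this sketch, for example, halts at the drafter because there is nothing to write from.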
