X thread analyzes three Nature AI papers
- X user Saikiranchandha posted a thread on May 22-23 analyzing three recent Nature papers on AI systems for scientific discovery and research automation.
- The papers report that Google DeepMind’s Co-Scientist generated and refined hypotheses, while ERA found 40 single-cell methods that beat top human approaches.
- Nature published the papers this week; Google DeepMind and FutureHouse have posted project pages and supporting materials.
X user Saikiranchandha used a May 22-23 thread to tie together three recent Nature papers on AI systems for science: Google DeepMind’s Co-Scientist, DeepMind’s ERA, and FutureHouse’s Robin. The thread’s core claim was that these systems are pushing more of research into workflows that look like search, ranking and iterative optimization. The papers support much of that framing, while also drawing a line between automated research steps and the human work of choosing goals, checking results and deciding what to trust.

### Which three papers was the thread pointing to?

Nature published “Accelerating scientific discovery with Co-Scientist” four days ago, describing a Google DeepMind multi-agent system built to generate, debate and refine scientific hypotheses. The paper says the system is intended as a general-purpose tool for scientific discovery, with initial validation focused on biomedicine.

Nature also published “An AI system to help scientists write expert-level empirical code,” which presents ERA, short for Empirical Research Assistant. (x.com) The paper says ERA automates the design and refinement of scientific software and reports that, in bioinformatics, it discovered 40 new methods for single-cell data analysis that outperformed leading human-developed methods on a public leaderboard.

A third Nature paper, “A multi-agent system for automating scientific discovery,” describes FutureHouse’s Robin. (nature.com) Nature and Nature’s press materials say Robin combines specialized agents that can search literature, propose experiments, interpret results and revise hypotheses.

### What does Co-Scientist actually do?

Google DeepMind said Co-Scientist is a multi-agent system built with Gemini 2.0 that “iteratively generates, debates, and evolves novel hypotheses for complex scientific problems.” Nature’s summary says the system is designed to augment how scientists generate and test ideas rather than replace experimental validation.
(nature.com) Nature’s press material said Co-Scientist and Robin both use multiple specialized agents across the research process. (nature.com) That setup lets the systems generate hypotheses, suggest experiments, interpret findings and refine their proposals based on results, according to the journal’s summary.

### Why did the single-cell result stand out?

ERA’s most concrete benchmark came in single-cell analysis. (deepmind.google) Nature’s paper summary says the system discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard.

Harvard’s School of Engineering and Applied Sciences, describing the same work, said ERA also generated COVID-19 hospitalization models that beat the best U.S. Centers for Disease Control and Prevention models used during the pandemic, and found new methods for integrating single-cell RNA-sequencing datasets. (natureasia.com) That release said the system automates a software design-and-refinement cycle that can otherwise take months or years. (nature.com)

### How close are these systems to running science end to end?

FutureHouse’s Robin paper says the system automates the “key intellectual steps” of the scientific process in a semi-autonomous workflow. In a related announcement, FutureHouse said Robin proposed hypotheses, experiment choices, data analyses and manuscript figures for work on dry age-related macular degeneration, while human researchers carried out the physical lab experiments. (seas.harvard.edu)

Nature’s coverage of Co-Scientist and Robin says both systems show how agent-based AI can streamline research, but the published descriptions still leave experimental execution and real-world validation with human scientists. Saikiranchandha’s thread made the same distinction, arguing that people remain responsible for setting goals, validating outputs and deciding when evidence is strong enough to trust.
(arxiv.org)

### So what is the practical takeaway from the thread?

The three papers describe different layers of the same stack. Co-Scientist focuses on literature-grounded hypothesis generation and critique; Robin links literature search, experiment planning and analysis; ERA targets the coding and benchmarking bottleneck in empirical work.

Nature published the Co-Scientist, Robin and ERA papers this week. The supporting project pages from Google DeepMind, FutureHouse and Harvard are the most direct next stop for readers who want the methods, validations and benchmark details behind the X thread. (nature.com) (deepmind.google) (nature.com)