LLM Feedback Found to Improve Scientific Peer Review

A large-scale randomized controlled trial at the ICLR 2025 conference found that providing human reviewers with LLM-generated feedback significantly enhanced the quality and constructiveness of their peer reviews. The study, published in *Nature Machine Intelligence*, involved over 20,000 reviews and demonstrated a practical application for AI in improving a core scientific process.

- The study's "Review Feedback Agent" was developed by a team of researchers including Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, and James Zou. The agent is publicly available on GitHub for others to use and build upon.
- In the randomized trial, 22,467 reviews were selected to receive LLM feedback, while a control group of 22,364 did not. Of the reviewers who received the optional AI suggestions, 26.6% chose to update their reviews, incorporating over 12,000 specific feedback points.
- The AI feedback led to tangible changes in review quality; reviewers who updated their submissions increased the length of their reviews by an average of 80 words, indicating greater detail.
- The system was designed to provide feedback on vague comments, content misunderstandings, and unprofessional remarks. To ensure the quality of its suggestions, the agent used a series of automated "reliability tests" as guardrails, with over 96% of the generated feedback passing these checks.
- The use of AI in peer review is a growing trend; a December 2025 global survey found that 53% of researchers already use AI tools when evaluating manuscripts. However, a separate study found that between 7% and 17% of reviews at AI conferences in 2023 and 2024 showed signs of substantial LLM modification.
- While the ICLR study showed positive results, the broader application of LLMs in peer review faces challenges. Studies have found that while AI-generated reviews can be structurally sound, they often lack the depth, critical insight, and contextual awareness of human experts.
- Some in the research community have raised specific critiques of the ICLR 2025 study, pointing to the fact that nearly three-quarters of reviewers did not update their review after receiving feedback and questioning whether increased word count is a reliable proxy for review quality.
- The sentiment among researchers regarding AI's role in peer review remains divided. A 2025 survey by IOP Publishing found that while 41% believe generative AI could have a positive impact on the process, 37% view it negatively.
