LLM Feedback Found to Improve Scientific Peer Review

Published February 25, 2026 by The Daily Scout

A large-scale randomized controlled trial at the ICLR 2025 conference found that providing human reviewers with LLM-generated feedback significantly enhanced the quality and constructiveness of their peer reviews. The study, published in *Nature Machine Intelligence*, involved over 20,000 reviews and demonstrated a practical application for AI in improving a core scientific process.

Why it matters

- The study's "Review Feedback Agent" was developed by a team of researchers including Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, and James Zou. It is publicly available on GitHub for others to use and build upon.
- In the randomized trial, 22,467 reviews were selected to receive LLM feedback, while a control group of 22,364 did not. Of the reviewers who received the optional AI suggestions, 26.6% chose to update their reviews, incorporating more than 12,000 specific feedback points.
- The feedback produced measurable changes: reviewers who updated their reviews lengthened them by an average of 80 words, suggesting greater detail.
- The system was designed to flag vague comments, misunderstandings of a paper's content, and unprofessional remarks. As guardrails, the agent checked its suggestions with a series of automated "reliability tests," and more than 96% of the generated feedback passed these checks.
- The use of AI in peer review is a growing trend: a December 2025 global survey found that 53% of researchers already use AI tools when evaluating manuscripts. Meanwhile, a separate study found that between 7% and 17% of reviews at AI conferences in 2023 and 2024 showed signs of substantial LLM modification.
- While the ICLR study showed positive results, the broader use of LLMs in peer review faces challenges. Studies have found that AI-generated reviews, though often structurally sound, tend to lack the depth, critical insight, and contextual awareness of human experts.
- Some in the research community have critiqued the ICLR 2025 study, pointing out that nearly three-quarters of reviewers did not update their reviews after receiving feedback and questioning whether increased word count is a reliable proxy for review quality.
- Researchers remain divided on AI's role in peer review. A 2025 survey by IOP Publishing found that 41% believe generative AI could have a positive impact on the process, while 37% view it negatively.

Key numbers

In the randomized trial, 22,467 reviews were selected to receive LLM feedback, while a control group of 22,364 did not.
Of the reviewers who received the optional AI suggestions, 26.6% chose to update their reviews, incorporating over 12,000 specific feedback points.
