Study Finds LLMs Can Improve Peer Review
A large-scale randomized study conducted for the NeurIPS 2026 conference found that providing reviewers with LLM-generated feedback can improve the quality and politeness of academic peer reviews. The research, published in Nature Machine Intelligence, also noted that this intervention may introduce subtle biases. The findings highlight the growing role of AI in shaping how technical communities evaluate and communicate research.
- The use of AI in peer review is already significant. One 2025 analysis of 70,000 reviews for the ICLR conference found that approximately 21% were fully generated by an LLM, and another study estimated that up to 16.9% of reviews at top AI conferences showed signs of substantial modification by a large language model.
- Conferences like NeurIPS are grappling with a massive increase in submissions, which more than doubled from 9,467 in 2020 to 21,575 in 2025, an increase of about 128%. This "submission tsunami" has strained the traditional peer review system, leading to reviewer fatigue and creating an environment where LLM assistance becomes more tempting.
- The biases mentioned in the study are a documented concern. Research has shown that LLM-assisted reviews can exhibit affiliation bias that favors authors from highly ranked institutions and may also introduce subtle gender preferences. Other analyses suggest that LLM-assisted reviews tend to give higher ratings to lower-quality papers, a form of leniency that could distort review outcomes.
- While LLMs can improve the language and structure of reviews, they often lack the deep technical understanding needed to replace human experts. Studies have found that LLM-generated feedback can be rated as helpful yet still be superficial, fail to provide constructive criticism, and miss critical methodological flaws.
- The problem of AI-generated content extends to submissions themselves, creating new challenges for reviewers. One investigation in early 2026 found more than 100 hallucinated (fabricated) citations, created by LLMs, in papers that had already been accepted to NeurIPS 2025.
- In response to these challenges, major conferences are updating their policies, though enforcement remains difficult. NeurIPS, for instance, has considered new policies for responsible reviewing while also dealing with issues such as "collusion rings" and fake reviewer accounts, which predate the widespread use of LLMs.
- Within FAANG and other tech companies, LLMs are being integrated into internal review and development processes, in line with the broader trend toward AI-assisted evaluation. Use cases include AI-assisted DevOps, automated contract analysis, summarizing customer feedback for product development, and internal knowledge-management systems for engineering teams. One engineer at a FAANG company noted that LLM integration in their IDE is a slight net positive for autocompleting repetitive code, but that it can also hallucinate plausible yet incorrect code, undermining some of the productivity gains.
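The NeurIPS submission growth can be checked directly from the two counts quoted for 2020 and 2025. The 2025 total is about 2.28 times the 2020 total; that ratio is easy to misread as a percentage increase, but the actual increase works out to roughly 128%:

$$
\frac{21{,}575}{9{,}467} \approx 2.28,
\qquad
\frac{21{,}575 - 9{,}467}{9{,}467} = \frac{12{,}108}{9{,}467} \approx 1.28 \;\;(\approx 128\%\ \text{increase})
$$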