Google Research Finds Duplicating Prompts Boosts Accuracy

A new paper from Google Research reports that simply repeating a prompt within the same input can raise LLM accuracy by as much as 76 percentage points (from 21.33% to 97.33% on one task). The technique reportedly does not increase the number of generated tokens, adds no latency in most cases, and needs no fine-tuning, although the input itself becomes longer. The findings offer new insight into how models process and weigh information within their context window.
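
The core idea can be sketched in a few lines. This is a minimal illustration that assumes "duplicating" means concatenating the prompt with itself before sending it to a model; the helper name `repeat_prompt`, the `times` parameter, and the blank-line separator are hypothetical choices, not details taken from the paper.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return `prompt` repeated `times` times, joined by `separator`.

    Hypothetical helper: the name, the default of two copies, and the
    separator are assumptions made for illustration only.
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    return separator.join([prompt] * times)


if __name__ == "__main__":
    query = "Here is a list of names: Ana, Ben, Cy. Which name comes second?"
    # The model would receive the same question twice in one input.
    print(repeat_prompt(query))
```

The returned string is what would be passed to whichever model client an application already uses. The input grows with each repetition; only the length of the requested output is unaffected.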

- The paper, titled "Prompt Repetition Improves Non-Reasoning LLMs," was authored by Google researchers Yaniv Leviathan, Matan Kalman, and Yossi Matias. Yossi Matias is the head of Google Research and has a history of leading initiatives that bridge the gap between research and practical applications, including founding Google's "Google for Startups Accelerator."
- The technique gives the model a "second pass" over the information. This helps because most LLMs are causal: each token can attend only to the tokens that precede it, so the early parts of a prompt are processed without knowledge of what follows. By the time the model reads the repeated copy, it has already processed the entire prompt once, which allows better connections between different parts of the input.
- The largest improvements were observed in tasks that are sensitive to where information sits within the prompt. For example, in a test where the model had to find a specific name in a long list (the NameIndex task), the accuracy of the Gemini 2.0 Flash-Lite model jumped from 21.33% to 97.33% with prompt duplication.
- The technique was tested and found effective across models from Google (Gemini), OpenAI (GPT), Anthropic (Claude), and DeepSeek, with one minor caveat. For very long prompts, Anthropic's Claude models showed some increase in latency, likely because the doubled input takes longer to process initially.
- For a startup, this method is a cost-effective way to boost accuracy without expensive model fine-tuning or more complex prompt engineering strategies such as chain-of-thought, which increase token count and latency. It can be implemented as a simple default setting in many existing LLM applications.
- This discovery is part of a broader landscape of prompt engineering techniques that startups can use to improve AI-driven products. Other methods include providing specific examples (few-shot prompting), assigning a role to the AI, and breaking down complex tasks into a sequence of simpler prompts (prompt chaining).
- The research focused on "non-reasoning" tasks. When chain-of-thought or other step-by-step reasoning instructions are already part of the prompt, the benefit of simple duplication is less pronounced, as the reasoning process itself often involves a form of internal re-evaluation of the prompt.
- Author Yaniv Leviathan is a Google Fellow whose work focuses on making LLMs more efficient. He also invented "speculative decoding," a now widely used technique for speeding up LLM generation, and led the Google Duplex project, an AI system for human-like voice conversations.
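
The "second pass" explanation in the list above can be illustrated with a toy model of causal visibility. This is a simplified sketch, not the paper's analysis: it assumes a plain causal mask in which position i can see positions 0 through i, and the function names are hypothetical.

```python
def visible_positions(seq_len: int) -> list[list[int]]:
    """For each position i, list the positions it may attend to (0..i)."""
    return [list(range(i + 1)) for i in range(seq_len)]


def second_copy_sees_full_prompt(prompt_tokens: list[str]) -> bool:
    """Check that every token of the repeated copy can see the whole first copy.

    Hypothetical helper for illustration; assumes a plain causal mask.
    """
    n = len(prompt_tokens)
    repeated = prompt_tokens + prompt_tokens  # prompt followed by its copy
    visible = visible_positions(len(repeated))
    first_copy = set(range(n))
    return all(first_copy.issubset(visible[i]) for i in range(n, 2 * n))


if __name__ == "__main__":
    tokens = ["list", "of", "names", "question"]
    # In a single copy, the first token sees only itself.
    print(visible_positions(len(tokens))[0])
    # In the repeated prompt, the second copy sees all of the first copy.
    print(second_copy_sees_full_prompt(tokens))
```

In a single copy, early tokens never see the question that arrives later. In the repeated input, every token of the second copy has the complete first copy in view, which is the intuition behind the reported gains on position-sensitive tasks.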
