Google Finds Prompt Duplication Boosts Accuracy

Researchers at Google found that simply duplicating a prompt within the same context window can boost LLM accuracy by as much as 76 percentage points on one task. Although the input gets longer, the technique reportedly adds no generated tokens or latency and requires no fine-tuning. The finding also sheds light on how causal attention in transformer models processes input sequences.
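As a rough illustration of the idea, the sketch below builds a repeated prompt in Python. The function name `repeat_prompt`, the blank-line separator, and the sample question are assumptions made for this example; the summary does not specify the exact template the researchers used.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return `prompt` stated `times` times, joined by `separator`.

    Hypothetical helper: the separator and default count are illustrative
    choices, not the paper's exact template.
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    return separator.join([prompt] * times)


# Illustrative query; any single-turn prompt could be used here.
query = "Which planet is closest to the sun? (A) Venus (B) Mercury (C) Mars"

# The doubled text is what would be sent to the model as input.
doubled = repeat_prompt(query)
print(doubled)
```

The output of `repeat_prompt` would be passed to whichever model API is in use in place of the original prompt; nothing else about the request changes.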

- The finding comes from a Google Research paper titled "Prompt Repetition Improves Non-Reasoning LLMs" by Yaniv Leviathan, Matan Kalman, and Yossi Matias.
- The technique was tested on seven models, including Gemini 2.0 Flash, GPT-4o, Claude 3.7 Sonnet, and DeepSeek V2, across benchmarks such as ARC, OpenBookQA, and MMLU-Pro.
- The method is most effective on non-reasoning tasks. When chain-of-thought prompting is enabled, the gains are minimal because the model's own reasoning already involves restating or reprocessing the query.
- The extra work happens during the parallelizable pre-fill stage, where the model processes the input. Because the second copy follows the first, every token in the repeated prompt can attend to every token of the original, which mitigates the ordering effect inherent in causal language models.
- On a custom "NameIndex" task that asks the model to identify the 25th name in a list of 50, accuracy for one model rose from 21.33% to 97.33% after prompt repetition was applied.
- With reasoning turned off, prompt repetition won in 47 of 70 benchmark-model combinations, with zero significant losses.
- The researchers note that stating the prompt twice is a reliable default, though stating it three times can yield better results on certain long-context tasks at the cost of more input tokens.
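The ordering effect described above can be made concrete with a toy model of a causal attention mask. This is a simplified sketch, not the paper's analysis: it assumes one token per position, ignores any separator tokens between copies, and uses hypothetical function names. It counts how many input positions can "see" every position of the original prompt.

```python
def visible_prompt_positions(prompt_len: int, copies: int) -> list[set[int]]:
    """For each input position, list which original-prompt positions it can attend to.

    Under a causal mask, position i attends only to positions 0..i.
    Positions in later copies map back to the original via modulo.
    """
    total = prompt_len * copies
    return [{j % prompt_len for j in range(i + 1)} for i in range(total)]


def fraction_with_full_view(prompt_len: int, copies: int) -> float:
    """Fraction of input positions that can attend to the entire prompt."""
    views = visible_prompt_positions(prompt_len, copies)
    full = sum(1 for view in views if len(view) == prompt_len)
    return full / len(views)


# With a single copy, only the last token sees the whole prompt.
print(fraction_with_full_view(prompt_len=4, copies=1))  # 0.25

# With two copies, every token of the second copy sees the whole prompt.
print(fraction_with_full_view(prompt_len=4, copies=2))  # 0.625
```

In this toy setting, the early tokens of a single prompt never see what comes after them, while every token in the second copy has the full first copy in view.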
