Penn State: rude prompts help
A Penn State study compared prompt tones and found that blunt, rude prompts produced higher accuracy on GPT-4o multiple-choice tasks than polite phrasing. The reported figures were 84.8% accuracy for very rude prompts versus 80.8% for very polite ones on the evaluated questions (x.com).
A Penn State study found that blunt prompts got more answers right than polite ones when researchers tested OpenAI’s GPT-4o on school-style questions. (psu.edu) (arxiv.org)
The researchers built a set of 50 multiple-choice questions in mathematics, science, and history, then rewrote each question in five tones: very polite, polite, neutral, rude, and very rude. That produced 250 prompts covering the same underlying tasks. (psu.edu) (arxiv.org)
On those tests, accuracy ranged from 80.8% for very polite prompts to 84.8% for very rude prompts, according to the paper posted on arXiv on October 6, 2025. The authors, Om Dobariya and Akhil Kumar, said they used paired sample t-tests to compare the results. (arxiv.org)
Large language models are prediction systems that generate text by predicting the next word from patterns in training data, so small wording changes can alter what the model treats as the main instruction. OpenAI’s prompt guide says clear, specific requests improve results and advises users to identify the task, context, tone, and style. (openai.com)
The Penn State team said its result cuts against earlier work that linked rude phrasing with worse outcomes. In the abstract, the authors wrote that newer large language models may respond differently to tone than older systems did. (arxiv.org)
The project started with an undergraduate research question at Penn State’s Smeal College of Business. Dobariya, a fourth-year student in business analytics and information systems, worked with Kumar, a professor of supply chain and information systems, after receiving a $5,000 Rodney A. Erickson Discovery Grant and support from Penn State’s Student Engagement Network Grant. (psu.edu)
Penn State’s write-up included one example of the tone shift the team tested: a very polite version asked, “Can you kindly consider the following problem and provide your answer?” A very rude version began, “Hey, gofer, figure this out.” (psu.edu)
The paper is short, covers one model, and tests 50 base questions, so it does not show that rudeness is a universal prompt rule across all chatbots or tasks. It does show that, for this GPT-4o setup, direct and abrasive wording changed measurable accuracy. (arxiv.org)
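For readers unfamiliar with the paired sample t-test the authors mention, the sketch below shows how the test statistic is computed from matched scores under two tone conditions. It is an illustration only, not the authors' code: the function name `paired_t_statistic` is made up for this example, the accuracy values are invented placeholders rather than data from the paper, and the assumption that the pairs are per-run accuracies is ours.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired-sample t-test.

    a and b are equal-length sequences of matched measurements,
    for example accuracy under two prompt tones on the same runs.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    if len(a) < 2:
        raise ValueError("need at least two pairs")

    # The test works on the per-pair differences, not the raw scores.
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")

    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical placeholder accuracies, NOT figures from the study.
    very_rude = [0.86, 0.84, 0.85, 0.83, 0.86]
    very_polite = [0.80, 0.82, 0.81, 0.80, 0.81]
    t, df = paired_t_statistic(very_rude, very_polite)
    print(f"t = {t:.2f}, df = {df}")
```

Turning the statistic into a p-value requires the t distribution with the returned degrees of freedom; in practice a library routine such as `scipy.stats.ttest_rel` performs both steps.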