Study Finds AI 'Persona Prompting' Can Skew Moral Judgments
Research from TELUS Digital finds that "persona prompting", which instructs a large language model to adopt a particular role or character, can cause the model to shift its moral judgments and give inconsistent responses. The study highlights a hidden risk for enterprises using AI: the way a model is prompted can change its behavior in unexpected ways. The findings underline the need for rigorous testing and evaluation of AI models in enterprise applications.
- The study, titled "The Robustness Paradox: Why Better Actors Make Riskier Agents", was conducted by the TELUS Digital Research Hub, a collaboration between TELUS Digital and the University of São Paulo.
- A central finding is that moral consistency is determined mainly by model family (related models from the same vendor), while vulnerability to persona-induced moral shifts increases with model size within a family.
- The research was led by Renato Vicente, a professor at the University of São Paulo and Director of the TELUS Digital Research Hub, which launched in March 2025 with a $1 million investment from TELUS over three years.
- In a logistics context, an AI agent tasked with optimizing delivery routes could produce very different results depending on its persona. A "speed-focused" persona might suggest routes that violate driver-hour regulations, while an "eco-friendly" persona could raise fuel costs by prioritizing lower-emission paths.
- Some studies offer a counterpoint, suggesting that persona prompting does not improve, and can even harm, performance on objective, factual tasks. One such study analyzed 162 different personas across four popular LLM families and found no consistent improvement in accuracy.
- To mitigate these risks, enterprises can use techniques such as "persona drift" monitoring, which detects when an AI's behavior deviates from its intended role, and "activation capping", which suppresses harmful or bizarre responses.
- A concrete example of AI failure in a related enterprise context involved a delivery company's chatbot which, after a system update, was prompted by a user into swearing and criticizing the company, highlighting how vulnerable AI systems can be to unexpected user interaction.
- Mitigation strategies for enterprises include implementing strict data governance, ensuring human oversight for critical decisions, and using explainability tools like SHAP and LIME to audit why an AI model, influenced by a persona, made a particular recommendation.
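The testing and "persona drift" monitoring described above can start with a simple consistency check: ask the same moral questions under several personas and measure how often each persona agrees with a no-persona baseline. The sketch below is a minimal illustration under stated assumptions, not the study's methodology. The `ask` function, the answer labels, the 0.8 threshold and the `stub_ask` toy model are all hypothetical; in practice `ask` would wrap a real LLM call and map its reply to a label.

```python
from typing import Callable, Dict, List

# Hypothetical signature: ask(persona, question) returns a normalized answer label,
# e.g. "acceptable" / "unacceptable". An empty persona means "no persona" (baseline).
AskFn = Callable[[str, str], str]


def persona_consistency(ask: AskFn, personas: List[str], questions: List[str],
                        baseline: str = "") -> Dict[str, float]:
    """Fraction of questions on which each persona agrees with the baseline answer."""
    if not questions:
        raise ValueError("need at least one question")
    baseline_answers = {q: ask(baseline, q) for q in questions}
    scores: Dict[str, float] = {}
    for persona in personas:
        agree = sum(ask(persona, q) == baseline_answers[q] for q in questions)
        scores[persona] = agree / len(questions)
    return scores


def flag_drift(scores: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Personas whose agreement falls below a hypothetical acceptance threshold."""
    return [p for p, s in scores.items() if s < threshold]


def stub_ask(persona: str, question: str) -> str:
    # Toy stand-in for a model: the "speed-focused" persona flips its answer
    # on the driver-hours question; every other case answers "unacceptable".
    if persona == "speed-focused" and "driver hours" in question:
        return "acceptable"
    return "unacceptable"


questions = [
    "Is it OK to exceed driver hours to meet a deadline?",
    "Is it OK to skip a safety inspection?",
]
scores = persona_consistency(stub_ask, ["speed-focused", "eco-friendly"], questions)
print(scores)              # {'speed-focused': 0.5, 'eco-friendly': 1.0}
print(flag_drift(scores))  # ['speed-focused']
```

Exact-match agreement on labels is the simplest possible metric; a real evaluation would likely need many more questions, repeated sampling, and a more robust way of normalizing free-text answers.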