LLMs Alter Answers Based on Perceived User Identity
A new MIT study finds that large language models can significantly change their responses based on the perceived identity of the person asking the question. This variability raises concerns about reliability and about the potential for inequitable outcomes in educational settings. The findings highlight the need for robust auditing and tight controls on prompting to mitigate unintended biases in AI-driven feedback and content.
- The MIT study evaluated GPT-4, Claude 3 Opus, and Llama 3-8B using the TruthfulQA and SciQ benchmarks to test for truthfulness and scientific accuracy.
- Researchers created short user biographies that were prepended to questions, varying traits like education level, English proficiency, and country of origin (United States, Iran, or China).
- All three models demonstrated decreased accuracy when prompts included a biography suggesting the user had a lower level of education.
- The largest drop in accuracy occurred when the user was portrayed as both a non-native English speaker and less educated, indicating that these biases can compound.
- In one case, Claude 3 Opus refused to answer a question about nuclear bombs for a user described as less educated and from Russia, but answered correctly for a highly educated user.
- For users perceived as less educated, one model's tone was characterized by reviewers as patronizing or dismissive.
- A separate MIT study found that stylistic and grammatical changes, such as typos or extra white space, could increase the likelihood of an LLM recommending self-management for a medical condition by 7-9%, even when the clinical information was identical.
- This "brittleness" extends beyond user personas, as other MIT research shows LLMs can be tricked into generating harmful content by phrasing a question with a grammatical structure the model associates with "safe" topics.
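
The persona setup described above (a short biography prepended to each benchmark question, with accuracy compared across personas) can be sketched roughly as follows. This is a minimal illustration, not the researchers' actual code: the biography wording, the trait labels, the `ask` callable standing in for a model API, and the exact-match scoring are all assumptions made for the example.

```python
from itertools import product
from typing import Callable, Optional

# Hypothetical trait values; the study's real biographies are not reproduced here.
EDUCATION = ["highly educated", "less educated"]
ENGLISH = ["native English speaker", "non-native English speaker"]
COUNTRY = ["the United States", "Iran", "China"]

Persona = Optional[tuple[str, str, str]]  # None is the no-biography control


def build_bio(education: str, english: str, country: str) -> str:
    """Compose a one-sentence user biography (illustrative wording only)."""
    return f"I am a {education} {english} from {country}."


def build_prompts(question: str) -> dict[Persona, str]:
    """Return one prompt per persona, plus a control prompt with no biography."""
    prompts: dict[Persona, str] = {None: question}
    for persona in product(EDUCATION, ENGLISH, COUNTRY):
        prompts[persona] = f"{build_bio(*persona)}\n\n{question}"
    return prompts


def accuracy_by_persona(
    items: list[tuple[str, str]],
    ask: Callable[[str], str],
) -> dict[Persona, float]:
    """Score each persona by exact-match accuracy over (question, answer) pairs.

    `ask` is a placeholder for whatever sends a prompt to a model and returns
    its answer. Exact match is a simplification of benchmark scoring.
    """
    correct: dict[Persona, int] = {}
    for question, gold in items:
        for persona, prompt in build_prompts(question).items():
            hit = ask(prompt).strip().lower() == gold.strip().lower()
            correct[persona] = correct.get(persona, 0) + int(hit)
    return {persona: n / len(items) for persona, n in correct.items()}
```

Comparing each persona's score against the `None` control gives a simple audit signal: a consistent gap for, say, the "less educated" personas would be the kind of disparity the study reports. Keeping the question text identical across personas is the key design choice, since it isolates the biography as the only variable.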