Gerard Sans: LLMs fake agency
- Gerard Sans ran a large-scale set of experiments showing that large language models often claim knowledge or appear to make decisions without actually using the evidence in front of them.
- In a 25,000-experiment sweep, Sans reported that 68% of runs ignored contrary evidence, 71% showed no belief updates, and scaffolding explained only 1.5% of output variance.
- Those failure rates point to systematic reasoning gaps that will complicate deploying agentic systems unless models are made to check evidence and update their beliefs. (x.com)
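
For readers unfamiliar with "variance explained", the sketch below shows one common way such a share can be computed: eta squared, the between-group sum of squares divided by the total sum of squares. The summary does not say which measure Sans used, so this choice is an assumption. The function name `variance_explained`, the group labels, and the toy scores are all hypothetical and are not drawn from the experiments.

```python
from statistics import fmean


def variance_explained(groups):
    """Return eta squared: between-group sum of squares / total sum of squares.

    `groups` maps a condition label to a list of numeric outcome scores.
    """
    all_scores = [s for scores in groups.values() for s in scores]
    grand_mean = fmean(all_scores)
    # Total variability of every score around the grand mean.
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    # Variability attributable to differences between condition means.
    ss_between = sum(
        len(scores) * (fmean(scores) - grand_mean) ** 2
        for scores in groups.values()
    )
    return ss_between / ss_total if ss_total else 0.0


# Toy outcome scores invented for illustration only (NOT Sans's data):
# 1.0 = the run updated its belief, 0.0 = it did not.
toy_runs = {
    "no_scaffold": [0.0, 1.0, 0.0, 1.0],
    "scaffold": [0.0, 1.0, 1.0, 1.0],
}
share = variance_explained(toy_runs)
print(f"{share:.1%}")  # prints 6.7%
```

Under this definition, a value near 1.5% would mean that knowing whether a run used scaffolding tells you almost nothing about how its output turned out.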