Copilot and Gemini invent evidence

- Windows News reported on May 25, 2026 that a May 2026 experiment found Microsoft Copilot, Google Gemini and similar assistants could invent evidence.
- The experiment’s clearest finding was that identical datasets, relabeled by country, produced fabricated cultural analysis and citations under default “auto” settings.
- The report tied the warning to work, research and production use.

Windows News reported on May 25 that a May 2026 experiment found Microsoft Copilot, Google Gemini and similar AI assistants could generate fabricated analysis and cite evidence that did not exist when left on default “auto” settings. The report said the models were given identical datasets with only country labels changed. It said the systems then described the data as if it reflected meaningful cultural differences and attached citations or supporting claims that were not real.

The publication framed the result as a workflow problem rather than a product-specific glitch. Windows News said the failure appeared when assistants were asked to perform analysis automatically instead of being constrained by tools or checked by a person. It advised users not to rely on “auto” analysis alone for work tasks, research or production workflows. (windowsforum.com)

### What did the experiment actually test?

The May 2026 experiment, as described by Windows News, used identical datasets that were relabeled by country and then submitted to mainstream assistants including Microsoft Copilot and Google Gemini. The report said the systems produced different narratives for the same underlying data once the labels changed, treating the country names as a cue for cultural interpretation. (windowsforum.com)

Those responses did not stop at speculative language. Windows News said the assistants cited fictitious evidence and presented unsupported analysis in a form that could look finished enough to pass into workplace documents or research drafts.

### Why were Copilot and Gemini singled out?

Microsoft Copilot and Google Gemini were named because they are among the most widely used assistants embedded in workplace software. (windowsforum.com)

Microsoft markets Copilot for work against Gemini Enterprise on its Microsoft 365 site, where it says Copilot is built for workplace tasks and grounded in company data. That positioning makes the Windows News test notable because it focused on routine knowledge-work behavior: asking an assistant to analyze information and explain what it found. (windowsforum.com)

The report did not say the issue was unique to one company. It said “other mainstream AI assistants” showed similar behavior under default settings. (microsoft.com)

### What is the practical risk for people using these tools at work?

Windows News said the main risk is that fabricated support can arrive in a polished format that resembles real analysis. In a work setting, that can mean a false citation, invented comparison or unsupported trend line moving from an AI answer into a memo, presentation or research note. (windowsforum.com)

The report’s recommendation was procedural. It said users should prefer tool-based verification, human review and cross-checking instead of trusting an assistant’s automatic analysis on its own. That guidance was aimed at research and production workflows where a bad citation or invented claim can be reused downstream. (windowsforum.com)

### What did the report say users should do instead?

Tool-based checks were the central recommendation in the Windows News article. The report said users should verify claims against source documents, inspect whether cited evidence actually exists and keep a human reviewer in the loop before analysis is used for work output. (windowsforum.com)

Cross-checking was the second step. Windows News said the safer approach is to compare model output with external tools and original materials rather than accept a synthesized answer as self-validating.

### Where does this leave Copilot and Gemini users now?

May 25 is the publication date attached to the Windows News report, and the article remains the cited source for the experiment described in this account. (windowsforum.com)

Microsoft and Google continue to position Copilot and Gemini as work assistants, but the report’s immediate next step for users was narrower: verify outputs, review citations and treat “auto” analysis as a draft that still needs checking before it enters research or production. (microsoft.com)
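For readers who want a concrete picture of what a tool-based check could look like, the following is a minimal Python sketch, not a method taken from the Windows News report. It assumes the assistant's answer marks quoted evidence with straight double quotes and that the original source texts are available as plain strings. The function names, the sample source and the sample answer are all hypothetical and made up for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_unsupported_quotes(answer: str, sources: dict[str, str]) -> list[str]:
    """Return quoted passages in `answer` that appear in none of the source texts.

    Assumption: quoted evidence is wrapped in straight double quotes and is
    at least 10 characters long, which skips short quoted words.
    """
    quotes = re.findall(r'"([^"]{10,})"', answer)
    normalized_sources = [normalize(body) for body in sources.values()]
    return [
        quote
        for quote in quotes
        if not any(normalize(quote) in body for body in normalized_sources)
    ]


# Hypothetical sample data, invented for this sketch.
sources = {
    "survey.txt": "Respondents reported an average satisfaction score of 7.2 out of 10.",
}
answer = (
    'The survey notes "an average satisfaction score of 7.2 out of 10" and adds that '
    '"collectivist norms explain the higher scores in this country".'
)

unsupported = find_unsupported_quotes(answer, sources)
print(unsupported)
```

In this sample, the first quoted passage is found in the source text and the second is flagged for review. A check like this only catches verbatim quotes that have no match. It does not catch paraphrased claims or invented references, so it supplements the human review the report recommends rather than replacing it.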
