Google’s AI Overviews Are Flawed
Google’s AI “Overviews” can sound convincing but are wrong roughly one time in ten, which at Google’s scale can add up to millions of wrong answers. An outside analysis also found that the cited sources frequently don’t support the summaries, which shows how brittle retrieval, provenance and confidence signalling remain as product-design problems. (popsci.com) (pcmag.com)
Google’s new search trick is simple: instead of ten blue links, it often gives you one polished paragraph that sounds as if it has already done the reading for you. Google calls these paragraphs AI Overviews and says they appear when its systems judge that generative artificial intelligence will be especially helpful. (google.com)
That shortcut is both the whole promise and the whole risk. Google’s own help page says AI Overviews “can and will make mistakes,” and its product document says the feature is supposed to appear only when Google has “high confidence” in the answer. (google.com) (googleusercontent.com)
A New York Times analysis, based on tests run with the artificial intelligence startup Oumi, found that Google’s summaries were accurate about 91% of the time in February 2026. That means roughly 1 in 10 answers still contained at least one incorrect claim. (nytimes.com) (pcmag.com)
At Google’s scale, “1 in 10” stops sounding small. Google said in March 2025 that Search handles more than 5 trillion searches a year, so even if Overviews appear on only a fraction of those searches, a modest error rate can add up to millions of bad answers in very little time. (searchengineland.com) (popsci.com)
The deeper problem goes beyond wrong answers. The Times analysis found that in about half of the cases where a summary was factually correct, at least one cited link did not actually support the claim it was attached to. (pcmag.com) (nytimes.com)
That breaks the oldest safety valve in search. If the paragraph is the waiter’s recommendation, the links are supposed to be the kitchen receipt, and a receipt that doesn’t match the meal makes it harder for a reader to check anything quickly. (googleusercontent.com) (pcmag.com)
Oumi’s tests also suggested that the sourcing problem got worse even as headline accuracy improved. PCMag reported that Oumi measured source-link errors at 37% with Gemini 2 in October 2025 and at more than 56% with Gemini 3 in February 2026. (pcmag.com)
Some of those citations pointed to sites such as Facebook and Reddit, which helps explain why a neat-looking answer can still rest on shaky ground. The New York Times also described how a British Broadcasting Corporation (BBC) journalist created a misleading page that Google’s system repeated within 24 hours. (pcmag.com) (nytimes.com)
Google says the outside test is flawed. The company told reporters that the benchmark itself included bad information, that the queries did not reflect normal search behavior, and that results can vary from user to user. (pcmag.com)
Even if Google is right about the benchmark, the product is already changing how people use the web. Pew Research Center found that 58% of the U.S. adults in its March 2025 panel study saw at least one AI summary, and that users were less likely to click links when a summary appeared. (pewresearch.org)
That means the summary is not just an extra box on the page. It is increasingly the answer people leave with, which makes every unsupported citation and every confident mistake matter more than the raw percentage first suggests. (pewresearch.org) (google.com)
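To see why the raw percentage understates the stakes, the scale argument can be worked through as a rough back-of-envelope sketch. It takes Google’s stated figure of more than 5 trillion searches a year and the roughly 91% accuracy from the Times/Oumi test. The share of searches that actually show an AI Overview is a hypothetical input (`overview_share`), because none of the sources cited here gives that number; the function name and the sample shares are illustrative assumptions, not Google data.

```python
# Back-of-envelope estimate of AI Overviews containing at least one wrong claim.
# ASSUMPTION: overview_share is a hypothetical fraction of searches that display
# an AI Overview; the sources cited in this article do not report that figure.

SEARCHES_PER_YEAR = 5e12  # "more than 5 trillion" a year (Google, March 2025), so a floor
ACCURACY = 0.91           # NYT/Oumi test, February 2026


def wrong_answers_per_day(overview_share: float,
                          searches_per_year: float = SEARCHES_PER_YEAR,
                          accuracy: float = ACCURACY) -> float:
    """Expected Overviews per day with at least one incorrect claim (hypothetical model)."""
    if not 0.0 <= overview_share <= 1.0:
        raise ValueError("overview_share must be between 0 and 1")
    searches_per_day = searches_per_year / 365
    # Only searches that show an Overview can produce a wrong Overview.
    return searches_per_day * overview_share * (1 - accuracy)


if __name__ == "__main__":
    # Illustrative shares only; the real share is unknown here.
    for share in (0.01, 0.05, 0.20):
        print(f"{share:>4.0%} of searches -> {wrong_answers_per_day(share):,.0f} wrong answers/day")
```

Even with a hypothetical share of just 1%, the sketch gives roughly 12 million flawed Overviews a day. The share is left as an input rather than guessed, because the real total depends entirely on how often Google chooses to show an Overview.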