DeepMind's 'Aletheia' claims

A social post described Google DeepMind’s Aletheia (built on ‘Deep Think’) as autonomously solving four Erdős conjectures and reporting an 84.6% score on ARC‑AGI‑2. (x.com) The same post said Aletheia was discussed at STOC 2026 and that 88% of attendees wanted access to the system. (x.com)

Google DeepMind has publicly described Aletheia as a math research agent, but the strongest verified claims stop short of the social post’s broader framing. In papers and company posts published in February 2026, DeepMind said Aletheia found autonomous solutions to four open questions in Bloom’s Erdős Conjectures database. (arxiv.org) (deepmind.google)

That matters because “open questions” in a curated database and “Erdős conjectures” in the broad sense are not identical claims. The arXiv paper says the system was evaluated on 700 problems marked open in Bloom’s database and that four were solved autonomously, with human experts grading results. (arxiv.org 1) (arxiv.org 2)

Aletheia is not a standalone model in the usual chatbot sense. DeepMind describes it as an agent built on Gemini Deep Think that generates candidate proofs, checks them with a natural-language verifier, revises them, and uses search and browsing to track prior literature. (deepmind.google) (arxiv.org)

The benchmark claim in the social post lines up with Google’s own product announcement, not with the ARC Prize leaderboard page itself. Google said on February 12, 2026 that Gemini 3 Deep Think achieved 84.6% on ARC-AGI-2 and said the score was verified by the ARC Prize Foundation. (blog.google)

ARC-AGI-2 is a puzzle benchmark built to test whether systems can infer new rules from a few examples instead of recalling memorized answers. The ARC Prize Foundation says the test was designed so each task was solved by at least two humans in under two attempts, making it a proxy for flexible reasoning rather than domain knowledge. (arcprize.org 1) (arcprize.org 2)

DeepMind’s math claims also go beyond those four database results. The company and its paper say Aletheia contributed to multiple publication-grade math papers, including one paper generated without human intervention in the calculations and another framed as human-AI collaboration. (arxiv.org) (deepmind.google)

What is not verified in public sources is the conference anecdote attached to the post. STOC 2026 is scheduled for June 22-26, 2026 in Salt Lake City, Utah, but I did not find an official STOC program page, proceedings entry, or ACM source confirming that Aletheia was discussed there or that 88% of attendees said they wanted access. (acm-stoc.org 1) (acm-stoc.org 2)

There is an official Google research post about STOC 2026, but it concerns a separate experiment that gave authors automated pre-submission feedback with a specialized Gemini tool. That post does not mention Aletheia or an attendee poll about access. (research.google)

The upshot is narrower than the viral version and still significant on its own terms. Public documents support that DeepMind built a Gemini Deep Think-based agent, tested it on 700 open problems, reported four autonomous solutions there, and separately claimed an 84.6% ARC-AGI-2 score; they do not, on the sources available, support the STOC poll claim. (arxiv.org) (blog.google)
