AlphaGeometry 2 scores 83–88% on IMO geometry

- Google DeepMind’s AlphaGeometry 2 paper showed the system now solves 84% of IMO geometry problems from 2000–2024, surpassing average gold-medalist performance.
- The key jump came from expanding formal geometry coverage from 66% to 88%, plus stronger search using Gemini and shared information across search trees.
- It matters because formal math AI is shifting from flashy guesses to verified proofs, with AlphaProof and AG2 already reaching 2024 IMO silver level.

Math Olympiad geometry is one of those benchmarks that sounds niche until you realize what it tests. Not memorization. Not pattern matching. Actual chained reasoning, with constructions, invariants, and proofs that have to hold all the way through. That is why AlphaGeometry 2 matters. Google DeepMind’s new paper says the system now solves 84% of International Mathematical Olympiad geometry problems from 2000 through 2024, enough to beat the average human gold medalist on that slice of the contest. (arxiv.org)

### What exactly is AlphaGeometry 2?

It is a hybrid math system: part language model, part symbolic theorem prover. The language model proposes useful constructions, like adding a point or drawing a line. The symbolic engine then does the hard formal work of checking what follows logically. That basic recipe was already there in AlphaGeometry 1, but AlphaGeometry 2 is much broader and stronger. (deepmind.google)

### Why is geometry such a big deal?

Because Olympiad geometry is hostile to bluffing. A pure language model can sound convincing, but geometry problems usually need one clever move and then a long chain of exact deductions. If any step is wrong, the whole proof collapses. So this is a good stress test for whether an AI can really reason, not just imitate reasoning. That is also why formal systems like Lean matter so much in the parallel AlphaProof work. (nature.com)

### What changed from the first version?

The biggest upgrade is coverage. AlphaGeometry 2 extends the formal language so it can represent more of the weird stuff that appears in real IMO problems: movements of objects, linear equations over angles, ratios, and distances, and non-constructive setups. That pushed language coverage of IMO 2000–2024 geometry problems from 66% to 88%. In plain English: the system can now express far more problems in its formal language before solving starts. (arxiv.org)

### Where does the 83–88% claim come from?

People are mixing two related numbers. The paper says 88% is the coverage rate of the formal language over IMO geometry problems from 2000–2024. The actual solving rate is 84% on all geometry problems over those 25 years. So “83–88%” is directionally pointing at the result, but the cleaner takeaway is 84% solved and 88% covered. (arxiv.org)

### How did it get better?

First, the neural side got better: the paper says AlphaGeometry 2 uses the Gemini architecture for stronger language modeling. Second, the search got smarter through a knowledge-sharing mechanism between search trees, which basically means different proof attempts can reuse useful discoveries instead of starting cold every time. Add improvements to the symbolic engine and the jump from 54% to 84% starts to make sense. (arxiv.org)

### How does this connect to AlphaProof?

AlphaGeometry 2 is the geometry specialist. AlphaProof is the broader formal prover for non-geometry Olympiad problems. In the 2024 IMO setup, AlphaProof solved three of the five non-geometry problems, including the hardest one, and together with AlphaGeometry 2 reached silver-medal level overall. The catch is that this was not contest-speed in the human sense: some problems took days, and the problems had to be translated into formal language first. (nature.com)

### So is this “AI can do olympiad math now”?

Not quite. It means AI can now do a meaningful chunk of olympiad math in a formal, verifiable way when the domain is constrained enough and compute is generous enough. That is a real breakthrough, basically the opposite of chatbot-style handwaving, but it is not yet a general mathematician in a box. (nature.com)

The best results here did not come from scaling a plain text model and hoping for the best. They came from mixing neural intuition with symbolic verification. For hard math, it turns out that hybrid recipe is starting to look less like a workaround and more like the path. (arxiv.org)
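To make the "neural intuition plus symbolic verification" recipe concrete, here is a toy sketch in Python. It is not DeepMind's code. The rules are hand-written strings rather than a real geometry engine, a fixed list of suggestions stands in for the language model, and every name (`deductive_closure`, `solve`, `RULES`) is hypothetical. It only shows the shape of the loop: deduce everything you can, and if the goal is still missing, try a suggested construction and deduce again.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# A rule says: if all premises are known, the conclusion is known.
# These string "facts" are illustrative assumptions, not a real formal language.
Rule = Tuple[FrozenSet[str], str]

RULES: List[Rule] = [
    (frozenset({"AB = AC", "M is the midpoint of BC"}),
     "triangle ABM is congruent to triangle ACM"),
    (frozenset({"triangle ABM is congruent to triangle ACM"}),
     "angle ABC = angle ACB"),
]


def deductive_closure(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Symbolic-engine stand-in: forward-chain until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


def solve(premises: Set[str], goal: str, rules: List[Rule],
          proposals: List[str]) -> Optional[List[str]]:
    """Return the constructions needed to reach the goal, or None if stuck."""
    known = deductive_closure(premises, rules)
    if goal in known:
        return []  # pure deduction was enough
    # Language-model stand-in: try each suggested construction in order.
    # Assumption: every suggestion is a legal construction (e.g. a midpoint).
    for construction in proposals:
        if goal in deductive_closure(known | {construction}, rules):
            return [construction]
    return None


suggestions = ["D is the foot of the altitude from B", "M is the midpoint of BC"]
result = solve({"AB = AC"}, "angle ABC = angle ACB", RULES, suggestions)
print(result)
```

In this toy, deduction alone gets nowhere from `AB = AC`. The first suggestion does not help, and the midpoint construction unlocks the chain to the goal. The real system searches over many constructions at once and checks each step formally, but the division of labor is the same: one side guesses, the other side verifies.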

Shared from Scout - Be the smartest in the room.