GPT‑5.4’s surprising proof

A recent social post claims GPT‑5.4 Pro produced a three‑page proof resolving a roughly 60‑year‑old Erdős math problem, putting advanced model reasoning back in the headlines (x.com). The same stream also points to other recent releases, including OpenAI’s GPT‑5.4‑Cyber for defensive cybersecurity and Anthropic’s Claude Opus 4.7, suggesting multiple vendors are pushing specialized high‑reasoning builds in parallel (x.com).

A social-media claim that GPT‑5.4 Pro produced a proof for Erdős Problem #1196 is now being checked in public by mathematicians, not just reposted by AI fans. The problem page and forum thread show active discussion of a proposed solution posted this week. (erdosproblems.com 1) (erdosproblems.com 2)

The model at the center of the claim is real. OpenAI introduced GPT‑5.4 on March 5, 2026, said GPT‑5.4 Pro is its higher-performance version for complex tasks, and gave both models a roughly 1.05 million-token context window in the API. (openai.com) (developers.openai.com)

The math problem is also real and older than the post. Erdős Problem #1196 asks about “primitive sets,” meaning sets of whole numbers where no number divides another, and the site lists it as a conjecture of Paul Erdős, András Sárközy, and Endre Szemerédi from 1968. (erdosproblems.com) (forbes.com)

In plain terms, the question studies how large a weighted sum over those non-dividing numbers can be when every number in the set is at least x. Jared Duker Lichtman’s 2023 paper settled the original Erdős primitive set conjecture, and Problem #1196 asks for a sharper asymptotic bound in the large-number regime. (cambridge.org) (erdosproblems.com)

What changed this week is that the forum discussion moved from treating the problem as open to examining a candidate proof built around a Markov chain, a random process that steps from one number to the next according to fixed rules. A research note summarizing the thread says participants, including Terence Tao, Jared Duker Lichtman, Will Sawin, and Kevin Barreto, reformulated the proof into a cleaner argument after the initial post. (erdosproblems.com) (ulam.ai)

The strongest public caution is that verification is still the story. The Erdős Problems site says its status reflects the site owner’s current belief and tells readers to do their own literature search, while coverage in The Decoder said formal verification was still underway as of April 15. (erdosproblems.com) (the-decoder.com)

There is already a formalization effort. A GitHub repository created this week says it is formalizing a solution to Erdős Problem #1196 in Lean, a proof assistant that checks each logical step the way a compiler checks code, and states the quantitative bound as \(1 + O(1/\log x)\). (github.com) (erdosproblems.com)

The claim landed as model makers were already rolling out more specialized reasoning systems. OpenAI said on April 14 that it was expanding Trusted Access for Cyber and introducing GPT‑5.4‑Cyber, a variant fine-tuned for defensive cybersecurity use by vetted users. (openai.com 1) (openai.com 2)

Anthropic made a parallel move two days later. On April 16, 2026, Anthropic announced Claude Opus 4.7, said it was generally available, and described gains on advanced software engineering and long-running tasks over Opus 4.6. (anthropic.com 1) (anthropic.com 2)

The open question is no longer whether companies are selling “reasoning” as a product category. The open question is whether this specific three-page proof survives expert checking and joins the math literature as a result people cite, not just a post people share. (erdosproblems.com) (the-decoder.com)
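
For readers who want to see the “primitive set” idea in concrete terms, here is a minimal Python sketch of the definition and of a weighted sum over a set whose members are all at least x. The weight 1/(a log a) is an assumption borrowed from the primitive-set literature, since the article only describes the sum in plain terms. The function names is_primitive, weighted_sum, and primes_in are hypothetical helpers for illustration, and nothing here reproduces or checks the claimed proof.

```python
import math
from itertools import combinations


def is_primitive(numbers):
    """Return True if no element of `numbers` divides a different element."""
    values = sorted(set(numbers))
    # After sorting, a < b in every pair, so only "a divides b" is possible.
    return not any(b % a == 0 for a, b in combinations(values, 2))


def weighted_sum(numbers):
    """Sum 1 / (a * log a) over the set.

    Assumption: this weight comes from the primitive-set literature,
    not from the article. Every element must be at least 2.
    """
    return sum(1.0 / (a * math.log(a)) for a in set(numbers))


def primes_in(lo, hi):
    """Primes p with lo <= p < hi, found by trial division (small ranges only)."""
    return [
        n
        for n in range(max(lo, 2), hi)
        if all(n % d for d in range(2, math.isqrt(n) + 1))
    ]


if __name__ == "__main__":
    x = 100
    # Distinct primes never divide each other, so they form a primitive set.
    candidate = primes_in(x, 10_000)
    print(is_primitive(candidate))    # True
    print(is_primitive([6, 10, 15]))  # True: none of these divides another
    print(is_primitive([3, 5, 15]))   # False: 3 divides 15
    print(weighted_sum(candidate))    # weighted sum for this one example set
```

A finite example like this only shows what the objects look like. The conjecture concerns every primitive set with members at least x, and a bound of the form \(1 + O(1/\log x)\) as x grows, which no numerical experiment can establish.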

Shared from Scout - Be the smartest in the room.