Big model research wins
This week, advanced models posted striking research results: OpenAI’s GPT‑5.4 Pro produced a concise, three‑page proof for a 60‑year‑old Erdős problem, and a model named GPT‑Rosalind reached human‑expert level on RNA prediction tasks (x.com) (x.com). Anthropic also pushed updates, releasing Claude Opus 4.7 with major reasoning‑benchmark gains and rolling out new auto‑routines for Claude Code, according to social posts tracking the releases (x.com) (x.com).
A new wave of AI research results landed this week, with OpenAI and Anthropic both claiming stronger performance on math, biology, and coding tasks. (openai.com 1) (openai.com 2) (anthropic.com)
One result came from pure math. On April 13, a discussion thread on erdosproblems.com said GPT‑5.4 Pro produced a solution to Erdős problem #1196, and a follow-up post said the model reached the claimed proof in about 80 minutes and that the argument was short. (erdosproblems.com 1) (erdosproblems.com 2)
That problem sits in number theory, the branch of math that studies whole numbers and their patterns. The thread says the model proved that for any primitive set of integers, a certain tail sum is at most 1 plus an error term that shrinks as x grows, tightening earlier partial results. (erdosproblems.com)
A separate OpenAI release targeted biology rather than math. On April 16, the company introduced GPT‑Rosalind, a model built for life-sciences work such as literature review, sequence-to-function interpretation, experimental planning, and data analysis. (openai.com)
RNA prediction is a pattern-matching problem in biology: researchers try to infer what a sequence will do, or design a new one that does a desired job. OpenAI said GPT‑Rosalind, tested with Dyno Therapeutics on unpublished RNA sequences, ranked above the 95th percentile of human experts on prediction and around the 84th percentile on sequence generation in best-of-ten submissions. (openai.com) (publicnow.com)
OpenAI said GPT‑Rosalind is part of a specialized life-sciences series rather than a general chatbot. The company said the model is available as a research preview in ChatGPT, Codex, and the application programming interface for qualified customers in its trusted access program, alongside a Codex plugin that connects to more than 50 scientific tools and data sources. (openai.com)
Anthropic’s update was aimed at software work. On April 16, it announced Claude Opus 4.7, saying the model improves on Opus 4.6 in advanced software engineering, especially on hard long-running tasks, and is now generally available across Claude products, its application programming interface, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. (anthropic.com)
Anthropic also used the release to describe new safeguards and access controls. It said Opus 4.7 includes automatic blocking for prohibited or high-risk cybersecurity requests, while legitimate security users can apply to a new Cyber Verification Program. (anthropic.com)
The company also pushed more autonomy into Claude Code. Anthropic’s documentation says its new “routines” feature lets users save a prompt, repositories, and connectors, then run that setup automatically on a schedule, from an application programming interface call, or from GitHub events on Anthropic-managed cloud infrastructure. (code.claude.com)
Taken together, the week’s releases show model makers spreading in two directions at once: toward narrower scientific systems like GPT‑Rosalind and toward more autonomous coding systems like Claude Code routines, while still using headline math results to test how far general reasoning models can go. (openai.com) (code.claude.com) (erdosproblems.com)