AI Surpasses Key Benchmarks

Artificial intelligence capabilities continue to accelerate, with systems now surpassing previous limits on demanding benchmarks. The 2025 Stanford AI Index highlights new milestones on tests such as MMMU, GPQA, and SWE-bench, fueling discussion about the pace of AI progress and its societal implications.

The leap is notable partly because of how demanding the new tests are. MMMU (Massive Multi-discipline Multimodal Understanding) is not a set of simple text-based questions; it comprises about 11,500 college-level problems that require reasoning across formats such as charts, diagrams, and chemical structures. Models need deep, specialized knowledge just to understand the questions.

On the GPQA (Graduate-Level Google-Proof Q&A) benchmark, top models now significantly outperform human experts. The test consists of 448 challenging multiple-choice questions in biology, physics, and chemistry, written to be "Google-proof": hard for skilled non-experts to answer even with the help of a search engine. As of January 2026, models like Google's Gemini 3 Pro have achieved scores as high as 92%, while PhD-level experts average around 65%.

Progress in coding has been particularly dramatic. On SWE-bench, which tasks AI with resolving real-world software engineering issues from GitHub, performance has skyrocketed. In just one year, AI systems went from solving only 4.4% of issues to over 71%. By early 2026, models like Anthropic's Claude Sonnet 5 could autonomously fix over 82% of the problems on the SWE-bench Verified test.

This rapid advancement is driven by intense competition among a handful of key players. Companies like OpenAI (GPT series), Google (Gemini family), and Anthropic (Claude models) are consistently leapfrogging one another. At the same time, a new wave of powerful open-source models from companies like Zhipu AI in China and from research institutions globally is closing the performance gap with proprietary systems.

The acceleration is not just about passing tests; it is changing AI's role. In 2023, AI was largely seen as an assistant. By 2025, its capabilities had shifted towards autonomous task completion. This move from a helpful tool to a potential labor substitute marks a significant economic and societal shift.

Looking ahead, the focus is shifting from simply building the largest models to creating integrated systems of AI "agents" that can collaborate and handle complex, multi-step workflows with minimal human input. This transition from isolated tools to interconnected, autonomous systems is expected to define the next wave of AI-driven productivity and innovation.
