AI Systems Show Gains on Key Benchmarks

Stanford's 2025 AI Index report finds that artificial intelligence systems continue to make significant strides on demanding benchmarks such as MMMU, GPQA, and SWE-bench. The results underscore ongoing improvements in AI performance and the technology's growing strategic importance across industries.

All three benchmarks are new and designed to push the limits of advanced AI systems. MMMU evaluates multimodal models on college-level questions across six disciplines, requiring deep subject knowledge to interpret text alongside varied images such as charts, diagrams, and even sheet music. GPQA is a set of graduate-level, "Google-proof" questions in biology, physics, and chemistry. The questions demand deep reasoning and cannot be answered with a simple web search; even human experts score only around 65%. SWE-bench tests an AI's ability to solve real-world software engineering problems drawn from GitHub repositories, which involves navigating large codebases, understanding complex code interactions, and generating patches that fix bugs or add features.

Performance gains on these benchmarks in just one year have been substantial: scores rose by 18.8, 48.9, and 67.3 percentage points on MMMU, GPQA, and SWE-bench, respectively. The rapid improvement highlights the accelerating capabilities of frontier AI models.

The technological leap is mirrored by a massive increase in corporate AI investment, which hit $252.3 billion in 2024. In the U.S. alone, private AI investment reached $109.1 billion, significantly outpacing other countries.

The surge in investment coincides with a sharp rise in adoption. In 2024, 78% of organizations reported using AI, up from 55% the previous year. This points to a strategic shift from experimenting with AI to integrating it as essential business infrastructure.
