Stanford Index shows rapid AI advances

The 2025 Stanford AI Index highlights an acceleration in AI capabilities, with systems now posting sharply higher scores than their predecessors on tests of reasoning and multimodal processing. Newer benchmarks such as MMMU, GPQA, and SWE-bench are being used to measure these advanced systems. The report suggests AI is moving from narrow proficiency toward more generalizable intelligence.

A surge in corporate adoption is fueling record AI investment: 78% of organizations reported using AI in 2024, up from 55% the previous year. The boom is led by the U.S., which saw $109.1 billion in private AI investment in 2024, dwarfing China's $9.3 billion and the U.K.'s $4.5 billion.

The development of cutting-edge AI is increasingly concentrated in the private sector, with nearly 90% of notable new models in 2024 originating from industry, up from 60% in 2023. Academia still leads in highly cited AI research, but the computational resources required for training are becoming a barrier: the computing power needed for top models now doubles approximately every five months.

The competitive landscape for top-tier AI is intensifying. The performance difference between the top-ranked and the tenth-ranked AI model shrank from 11.9% to 5.4% in just a year, which suggests that high-performing models are now coming from a growing number of developers.

The new benchmarks are designed to test more sophisticated AI capabilities. For instance, GPQA uses "Google-proof" graduate-level questions, while SWE-bench evaluates the ability to solve real-world software engineering problems from GitHub. The dramatic performance gains on these tests in a single year highlight the rapid advancement in AI's reasoning abilities.

The cost of AI inference has plummeted, making advanced AI more accessible. For a model performing at the level of GPT-3.5, the cost dropped more than 280-fold between late 2022 and late 2024. The reduction is driven by more efficient smaller models and by hardware improvements.

The gap between proprietary "closed-weight" models and their "open-weight" counterparts is rapidly closing. On some benchmarks, the performance difference narrowed from 8% to just 1.7% over the course of a year, democratizing access to powerful AI technology.
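
To put the five-month doubling time and the 280-fold cost drop on a common footing, the short Python sketch below converts each into a rate. It is only back-of-the-envelope arithmetic on the figures quoted in this briefing. The function names are hypothetical, and the 24-month span for "late 2022 to late 2024" is an assumption, not a number taken from the report.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Growth multiple over 12 months for a quantity that doubles every `doubling_months`."""
    return 2 ** (12 / doubling_months)


def implied_halving_months(fold_drop: float, span_months: float) -> float:
    """Average halving time for a quantity that fell `fold_drop`-fold over `span_months`,
    assuming a steady exponential decline (a simplification)."""
    return span_months / math.log2(fold_drop)


# Training compute for top models: doubles roughly every five months (figure from the briefing).
compute_growth = annual_growth_factor(5)

# Inference cost at GPT-3.5 level: more than 280-fold drop.
# ASSUMPTION: "late 2022 to late 2024" is treated as 24 months.
cost_halving = implied_halving_months(280, 24)

print(f"Compute grows about {compute_growth:.1f}x per year")
print(f"Inference cost halves about every {cost_halving:.1f} months")
```

Under these assumptions, training compute for frontier models grows a little over fivefold per year, while the cost of GPT-3.5-level inference halves roughly every three months, which is one way to see why capable models are becoming cheaper to use even as the most advanced ones become more expensive to train.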
