Stanford Index shows rapid AI advances
The 2025 Stanford AI Index highlights an acceleration in AI capabilities, with systems posting sharply higher scores than their predecessors on benchmarks for reasoning and multimodal processing. Newer tests such as MMMU, GPQA, and SWE-bench are being used to measure these advanced systems. The report suggests AI is moving from narrow proficiency toward more generalizable intelligence.
A surge in corporate adoption is fueling record AI investment: 78% of organizations reported using AI in 2024, up from 55% the previous year. The boom is led by the U.S., which saw $109.1 billion in private AI investment in 2024, dwarfing China's $9.3 billion and the U.K.'s $4.5 billion.
The development of cutting-edge AI is increasingly concentrated in the private sector, with nearly 90% of notable new models in 2024 originating from industry, up from 60% in 2023. Academia still leads in highly cited AI research, but the immense computational resources required for training are becoming a barrier: the computing power needed for top models now doubles approximately every five months.
The competitive landscape for top-tier AI is intensifying. The performance difference between the first-ranked and tenth-ranked AI models shrank from 11.9% to 5.4% in just a year, which suggests that high-performing AI is within reach of a growing number of developers.
The new benchmarks are designed to test more sophisticated capabilities. GPQA, for instance, uses "Google-proof" graduate-level questions, while SWE-bench evaluates the ability to solve real-world software engineering problems drawn from GitHub. The dramatic gains on these tests within a single year highlight how quickly AI's reasoning abilities are advancing.
The cost of AI inference has plummeted, making advanced AI more accessible. For a model performing at the level of GPT-3.5, the cost dropped more than 280-fold between late 2022 and late 2024, a reduction driven by more efficient smaller models and by hardware improvements.
The gap between proprietary "closed-weight" models and their "open-weight" counterparts is also closing rapidly. On some benchmarks, the performance difference narrowed from 8% to just 1.7% over the course of a year, broadening access to powerful AI technology.
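
To put two of the report's figures on a common footing, the short Python sketch below converts the five-month doubling time for training compute into a yearly growth factor, and the 280-fold drop in inference cost into an approximate halving time. The function names are illustrative only, both calculations assume steady exponential change, and the 24-month span is an assumption standing in for "late 2022 to late 2024"; none of these come from the report itself.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Factor by which a quantity grows in 12 months, given its doubling time."""
    return 2 ** (12 / doubling_months)


def halving_time_months(fold_reduction: float, span_months: float) -> float:
    """Months for a cost to halve, assuming a steady exponential decline."""
    return span_months * math.log(2) / math.log(fold_reduction)


# Figures cited in the article; the 24-month span is an assumption.
compute_per_year = annual_growth_factor(5)
cost_halving = halving_time_months(280, 24)

print(f"Training compute grows about {compute_per_year:.1f}x per year")
print(f"Inference cost halves about every {cost_halving:.1f} months")
```

Under these assumptions, a five-month doubling time works out to roughly a 5.3-fold increase in compute per year, and a 280-fold cost reduction over two years corresponds to the cost halving about every three months.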