AI capabilities show rapid improvement

The 2025 Stanford AI Index reports continued, rapid improvement in artificial intelligence capabilities. On newly introduced benchmarks, AI systems have posted significant gains in reasoning, general knowledge, and software engineering. The report also highlights the proliferation of open-source AI tools, which are democratizing access while introducing new security and ethical challenges.

The improvement is not merely incremental; on new, more complex benchmarks, AI systems have made dramatic leaps. On SWE-bench, which tests the ability to resolve real-world software engineering issues, the success rate of AI systems jumped from just 4.4% in 2023 to 71.7% in 2024. Similarly, on GPQA, a benchmark built from graduate-level questions that are difficult to find online, scores rose by 48.9 percentage points within a year.

This progress is overwhelmingly driven by industry labs, which produced nearly 90% of notable AI models in 2024, up from 60% the previous year. While academic institutions remain the primary source of highly cited AI research, the immense computational power required to train frontier models has concentrated development within large tech companies. Competition among the leading models is nonetheless tightening: the performance gap between the top-ranked and 10th-ranked models narrowed from 11.9% to 5.4% in just one year.

The economic implications of these advances are beginning to take shape, with corporate AI investment reaching $252.3 billion in 2024. One forecast suggests AI could boost global economic output by up to 15 percentage points over the next decade. Still, the immediate impact on GDP growth in 2025 was "basically zero," according to Goldman Sachs' chief economist, who noted that much of the investment is spent overseas on manufacturing and infrastructure.

The democratization of AI through open-source tools has been a double-edged sword, introducing significant security vulnerabilities. In 2025, researchers discovered malware hidden in AI models on the popular open-source repository Hugging Face. Critical vulnerabilities that could allow remote code execution have also been found in major open-source AI frameworks from companies such as Meta and NVIDIA. These incidents highlight the growing risk of AI supply chain poisoning.

Ironically, the very benchmarks used to measure this progress are facing increased scrutiny. Critics argue that many benchmarks suffer from data contamination, in which test questions end up in a model's training data, so that high scores may reflect memorization rather than capability, the equivalent of "teaching to the test." This has fostered a culture of "SOTA-chasing" (state-of-the-art chasing), in which high scores are valued over genuine, replicable insight into a model's capabilities. As existing benchmarks become saturated, researchers are developing more challenging evaluations. On new tests such as FrontierMath and BigCodeBench, the best AI systems currently solve only a small fraction of the problems, indicating significant room for further progress.

The cost of training and operating these models is also changing. While the computational power needed for training doubles every five months, the cost of actually using a model (inference) has plummeted. For a system performing at the level of GPT-3.5, inference costs dropped by a factor of more than 280 between late 2022 and late 2024, making advanced AI more accessible than ever.

Shared from Scout - Be the smartest in the room.