Software Gains Drive AI Efficiency More Than Compute

Research from Epoch AI suggests that software progress contributes roughly a 10x improvement in AI efficiency per year. That outpaces the gains from hardware and compute alone and helps explain how models keep getting more capable without corresponding leaps in processing power. These software-driven gains come from algorithmic improvements, data curation, and post-training techniques.

Epoch AI's research quantifies this software-driven efficiency: the amount of compute required to reach a given performance level in language models has halved roughly every 8 months. That pace of algorithmic progress is well ahead of the 2-year doubling time associated with Moore's Law for hardware. The same analysis, however, suggests that 60% to 95% of the performance gains in large models still come from scaling compute and training data, with new algorithms responsible for the remaining 5% to 40%. In other words, algorithmic breakthroughs make the massive investment in compute more effective rather than replacing the need for it.

Inference, where models are actually used, is where many software gains are realized. Techniques such as quantization, pruning, and knowledge distillation are critical for cost optimization, and combining them has been reported to cut inference costs by as much as 5x while retaining over 98% of the original model's accuracy.

This efficiency race is also playing out in silicon. NVIDIA reports a 45,000x improvement in energy efficiency for AI inference over the last eight years, yet hyperscalers are pushing further with custom chips. Google's purpose-built TPUs can be 15 to 30 times more energy-efficient than GPUs for specific AI tasks, a key factor in the "build vs. buy" calculus for datacenters. Google's upcoming seventh-generation "Ironwood" TPU, projected to more than double the performance-per-watt of its predecessor, exemplifies the strategic case for building custom ASICs. For companies like Google, Amazon, and Microsoft, the massive upfront cost of designing custom chips is justified by long-term operational savings, including inference costs reduced by 40% to 60%, and by performance tailored to their highest-volume workloads.
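
The briefing names quantization and pruning without detailing them. As a minimal sketch, assuming plain Python lists stand in for a single weight tensor, the code below applies magnitude pruning (zeroing the smallest-magnitude weights) and then symmetric per-tensor int8 quantization (storing each weight as an integer in [-127, 127] plus one shared float scale). The function names are hypothetical, this is not the pipeline behind the cost figures cited above, and production frameworks use more elaborate schemes.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to zero out
    if k == 0:
        return list(weights)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


# Usage with a made-up weight vector (illustrative values only).
weights = [0.42, -1.27, 0.03, 0.88, -0.05, 0.61, -0.9, 0.002]
pruned = magnitude_prune(weights, sparsity=0.25)  # zeroes the 2 smallest weights
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(pruned, restored))
print(q, scale, max_err)
```

Stored as int8, each quantized weight takes 1 byte instead of the 4 bytes of a float32, and the rounding error per weight is bounded by half the scale. The pruned zeros only save memory or compute when the runtime uses a sparse storage format or sparsity-aware kernels.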

These efficiency gains are crucial given the staggering cost of training frontier models: estimates place the compute cost alone at $192 million for Google's Gemini Ultra and over $100 million for GPT-4. Optimizing both training and inference is essential to the economic viability of deploying these models at scale. The rapid pace of combined software and hardware improvement means the performance of a state-of-the-art model can become accessible on high-end consumer hardware in as little as 8 months. This fast diffusion of capability continually fuels the open-source community and gives startups new opportunities to build on what was recently considered the frontier.
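
To make the compounding concrete, here is a minimal sketch that converts the doubling and halving times cited in this briefing into annual multipliers. It assumes, as a simplification not stated in the source, that hardware follows the 2-year Moore's Law cadence and that software and hardware gains multiply independently; the function names are illustrative.

```python
def annual_factor(months_per_factor: float, factor: float = 2.0) -> float:
    """Convert 'improves by `factor` every N months' into a per-year multiplier."""
    return factor ** (12.0 / months_per_factor)


def compute_needed(initial_compute: float, months: float,
                   halving_months: float = 8.0) -> float:
    """Compute needed to reach a fixed performance level after `months`,
    assuming it halves every `halving_months` (the 8-month estimate)."""
    return initial_compute * 0.5 ** (months / halving_months)


algo_per_year = annual_factor(8.0)     # algorithmic progress: 2x every 8 months
moore_per_year = annual_factor(24.0)   # Moore's Law: 2x every 24 months

# Assumption: software and hardware gains multiply independently.
combined_8_months = 2.0 ** (8.0 / 8.0) * 2.0 ** (8.0 / 24.0)

print(f"algorithmic: {algo_per_year:.2f}x per year")
print(f"Moore's Law: {moore_per_year:.2f}x per year")
print(f"combined over 8 months: {combined_8_months:.2f}x")
print(f"compute after 24 months: {compute_needed(1.0, 24.0):.3f} of the original")
```

Halving every 8 months works out to roughly 2.8x per year, versus roughly 1.4x per year for a 2-year doubling, and after 24 months the compute needed for the same performance falls to one eighth of its starting value.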

Shared from Scout.