Enterprise GPUs at 5% utilization

- Industry signals show average enterprise GPU utilization sits near 5% even as public reports describe shortages, pointing to heavy overprovisioning. (x.com)
- That 5% figure is the headline gap: firms bought capacity under FOMO but are running far less of it in steady workloads than expected. (x.com)
- Low utilization could accelerate cost‑optimization moves, secondary capacity markets, and tighter cloud‑infrastructure competition for AI compute. (x.com)

GPUs are the hottest asset in enterprise tech, but a lot of them are being used like insurance policies, not engines. That’s the real story behind the new 5% utilization figure making the rounds. Cast AI’s 2026 Kubernetes optimization report says average GPU utilization across 23,000 enterprise clusters is just 5% (with CPU at 8% and memory at 20%), even while companies keep talking about shortages and racing to lock in more capacity.

Why does that sound so backwards? Because “shortage” and “low utilization” can both be true at once. The shortage is about access to the right GPUs, in the right place, at the right time, with enough certainty to support a launch or internal AI mandate. Utilization is about what happens after the hardware is reserved. Enterprises are buying for peak demand, future projects, and executive fear of being caught short, then running much smaller steady-state workloads most of the time.

So what is this 5% number actually measuring? It is not “all GPUs everywhere.” It comes from non-optimized Kubernetes clusters in Cast AI’s dataset, which means you should read it as a strong signal, not a universal law of physics. But the sample is big enough to matter. Twenty-three thousand clusters is not a rounding error, and the same report says overprovisioning got worse year over year, with CPU overprovisioning rising to 69%.

Why are companies so bad at this? Basically, GPUs are expensive to share and politically hard to give up. A team that finally secured H100s or H200s does not want to release them and risk waiting months to get them back. Internal platform teams also tend to allocate whole GPUs or whole nodes because that is simpler, safer, and easier for chargebacks. The result is stranded capacity: hardware sitting idle because the org chart, the scheduler, and the budget process all say “hold it.” NVIDIA’s own Run:ai pitch leans hard on this exact problem: dynamic scheduling, GPU fractions, and automatic prioritization to cut idle time.

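
To make the gap between reserved and used capacity concrete, here is a minimal Python sketch of how a fleet-wide utilization figure and the resulting stranded capacity could be computed. It is not Cast AI's methodology, which the report summary above does not spell out. The `GpuSample` record, the team names, and all hour counts are hypothetical, chosen so the toy fleet lands on the 5% figure.

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    """Hypothetical record: one GPU's reserved and busy hours in a 30-day window."""
    team: str
    reserved_hours: float
    busy_hours: float


def fleet_utilization(samples):
    """Busy hours divided by reserved hours across the whole fleet."""
    reserved = sum(s.reserved_hours for s in samples)
    busy = sum(s.busy_hours for s in samples)
    return busy / reserved if reserved else 0.0


def stranded_hours(samples):
    """Reserved hours in which the GPU did no work."""
    return sum(s.reserved_hours - s.busy_hours for s in samples)


# Illustrative numbers only: four whole GPUs held for 720 hours each.
samples = [
    GpuSample("search", 720.0, 36.0),
    GpuSample("search", 720.0, 0.0),
    GpuSample("ml-platform", 720.0, 108.0),
    GpuSample("ml-platform", 720.0, 0.0),
]

print(f"fleet utilization: {fleet_utilization(samples):.1%}")
print(f"stranded GPU-hours: {stranded_hours(samples):.0f}")
```

In this toy fleet, two of the four GPUs never run anything, yet each one is still "taken" from the scheduler's point of view. That is the pattern whole-GPU allocation tends to produce.
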
Why does this matter now? Because GPU prices are not behaving like normal cloud prices. AWS raised EC2 Capacity Block pricing for some H200-backed instances by about 15% in January 2026, and its pricing page says these reservation prices are updated based on supply and demand. So enterprises are overbuying one of the few compute resources that is getting more expensive, not cheaper. That is a nasty combo.

Does that mean the “GPU shortage” is fake? Not really. The shortage is increasingly a market design problem, not just a silicon problem. There can be idle GPUs inside one company, long waits at a hyperscaler, and high prices for premium reserved capacity all at the same time. The missing piece is liquidity: the ability to move compute to where demand actually is, without legal, security, and operational friction. That is why this kind of data points toward more internal marketplaces, tighter FinOps controls, and more pressure on clouds and orchestration vendors to make GPU sharing less painful.

The bottom line is simple. Enterprises did not just buy GPUs for current workloads; they bought optionality. But optionality is expensive when the meter is running. If the 5% figure is even directionally right, the next phase of the AI infrastructure boom will be less about buying more chips and more about forcing the chips already bought to do actual work.
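
To put rough numbers on why optionality is expensive when the meter is running, the sketch below divides a reserved hourly price by utilization to get the cost of each GPU-hour that does real work. The $10 base price is a made-up placeholder, not an AWS rate. The 15% increase and the 5% utilization figure come from the text above, and the 25% and 50% comparison points are assumptions for illustration.

```python
def cost_per_useful_hour(hourly_price, utilization):
    """Effective price of one busy GPU-hour when idle hours are billed too."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_price / utilization


BASE_PRICE = 10.0                # hypothetical $/GPU-hour, not a real quote
RAISED_PRICE = BASE_PRICE * 1.15  # the roughly 15% increase cited above

for utilization in (0.05, 0.25, 0.50):
    before = cost_per_useful_hour(BASE_PRICE, utilization)
    after = cost_per_useful_hour(RAISED_PRICE, utilization)
    print(f"{utilization:.0%} utilized: ${before:,.2f} -> ${after:,.2f} per useful GPU-hour")
```

Under these assumptions, a GPU reserved at $10 an hour and used 5% of the time costs about $200 per useful hour, and the 15% increase pushes that to about $230. Raising utilization moves the effective cost far more than the list price does, which is why sharing and scheduling are where the pressure lands.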

Shared from Scout - Be the smartest in the room.