Hoarding keeps GPU utilization at 5%

- Cast AI said on April 21 its 2026 Kubernetes report found average GPU utilization at 5% across tens of thousands of enterprise clusters.
- The sharpest detail is the mismatch itself: companies are provisioning about 20× more GPU capacity than they actively consume.
- AWS’s January 2026 H200-related price hike made that waste sting harder, and turned bad scheduling into a budget problem.

GPU waste sounds like a boring ops metric. It isn’t. It’s a sign that companies rushed to lock down scarce AI hardware, then built systems that can’t give it back when demand softens. That is the real story behind Cast AI’s new Kubernetes report, released April 21, which pegs average GPU utilization at just 5% across the clusters it analyzed. (cast.ai)

### What is actually sitting idle?

Mostly very expensive accelerator capacity inside Kubernetes clusters. Cast AI’s report says average CPU utilization was 8%, memory was 20%, and GPU utilization was 5% across the environments it studied. In plain English, teams are reserving far more infrastructure than their workloads use, and the gap is worst for GPUs, where idle time costs real money fast. (cast.ai)

### Why are GPUs worse than CPUs?

Because engineers treat GPU scarcity like a survival problem. If a team finally gets access to H100s or H200s, nobody wants to release them and discover they can’t get them back next month. So requests get padded, reservations get locked in, and “just in case” becomes the default operat[…] fear of missing out keeps capacity parked even when jobs are not running. (venturebeat.com)

### Why doesn’t autoscaling fix this?

Because autoscalers react to the requests you declare, not the demand you imagined later. If a container asks for too much GPU, CPU, or memory, the scheduler and autoscaler treat that request as real. Then the whole cluster expands […] become policy. (cast.ai)

### Why does Kubernetes make this easy to hide?

Kubernetes is great at abstracting infrastructure, but abstraction can hide waste. A team sees a healthy service and assumes the resource settings are fine. Meanwhile, the cluster may be carrying huge headroom that nobody revisits. SDxCentral’s write-up highlights that CPU […] compounding rather than self-correcting. (sdxcentral.com)

### Why did this story land now?

Because the economics got worse in January. AWS’s EC2 Capacity Blocks pricing page now says reservation prices are updated with supply and demand, and multiple January reports pinned the latest change at roughly a 15% increase for top-end ML capacity, including […] a procurement failure. (aws.amazon.com)

### Is this just a cloud bill problem?

No, it reaches strategy. If companies are using only 5% of reserved GPU capacity, then some AI infrastructure demand is not productive demand. It is defensive inventory. That matters for cloud budgets, but also for how investors and suppliers read the market. A shortage can be real and still be exa[…]t part is an inference, but it fits the pricing and utilization data. (cast.ai)

### What would fix it?

Not one cleanup project. Continuous rightsizing, better scheduling, and more willingness to share or time-slice GPUs across jobs. Cast AI also points to spot usage and node lifecycle management as weak spots. The broader point is simple: if resource settings are static while workloads change every week, utilization will stay terrible no matter how smart the hardware is. (cast.ai)

### Bottom line?

The 5% number matters because it punctures a popular assumption: that every bought GPU is a busy GPU. Turns out a lot of AI capacity is being stockpiled, not consumed. And once scarce hardware gets hoarded inside rigid Kubernetes policies, the shortage starts reproducing itself.
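
For readers who want to see the arithmetic, here is a minimal Python sketch of how a 5% utilization figure translates into roughly 20× overprovisioning, and what a simple rightsizing pass might look like. It is an illustration only: the workload names, GPU counts, the 20% headroom, and the `rightsized_request` rule are all assumptions made up for this example, not figures or methods from Cast AI's report.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    name: str              # hypothetical workload name
    requested_gpus: int    # GPUs declared in the resource request
    avg_gpus_used: float   # average GPUs actually busy (made-up number)
    peak_gpus_used: float  # observed peak GPUs busy (made-up number)


def utilization(workloads):
    """Share of requested GPU capacity that is busy on average."""
    requested = sum(w.requested_gpus for w in workloads)
    used = sum(w.avg_gpus_used for w in workloads)
    return used / requested


def overprovision_factor(workloads):
    """How many GPUs are requested for each GPU busy on average."""
    requested = sum(w.requested_gpus for w in workloads)
    used = sum(w.avg_gpus_used for w in workloads)
    return requested / used


def rightsized_request(w, headroom=0.2):
    """Assumed rule: observed peak plus headroom, rounded up to whole GPUs,
    at least 1 and never more than the current request."""
    target = math.ceil(w.peak_gpus_used * (1 + headroom))
    return max(1, min(w.requested_gpus, target))


# Illustrative fleet; every number here is invented for the example.
fleet = [
    Workload("training-batch", requested_gpus=24, avg_gpus_used=1.2, peak_gpus_used=6),
    Workload("inference-api", requested_gpus=16, avg_gpus_used=0.8, peak_gpus_used=3),
]

if __name__ == "__main__":
    print(f"utilization: {utilization(fleet):.0%}")
    print(f"overprovision factor: {overprovision_factor(fleet):.0f}x")
    for w in fleet:
        print(f"{w.name}: {w.requested_gpus} -> {rightsized_request(w)} GPUs")
```

With these invented numbers the fleet requests 40 GPUs while keeping 2 busy on average, which is 5% utilization and a 20× gap. Sizing to peak plus headroom would cut the requests to 12 GPUs. The rule only helps if it is rerun as workloads change, which is the "continuous" part of continuous rightsizing.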
