Cast AI finds 95% GPU idle
- Cast AI said on April 21 that its 2026 Kubernetes report found average GPU utilization at just 5% across tens of thousands of production clusters.
- The sharpest detail is the imbalance itself: 95% of provisioned GPU capacity sat idle, while CPU utilization averaged 8% and memory 20%.
- That lands as Meta lifts 2026 capex to $125 billion-$145 billion and component prices push AI infrastructure costs higher.

GPU scarcity is supposed to be the story of the AI buildout. But the awkward twist is that a lot of paid-for GPU time is not doing useful work. Cast AI’s new Kubernetes optimization report puts a number on that mismatch: average GPU utilization was 5% across the clusters it analyzed, meaning most of the expensive accelerator capacity was sitting idle. That matters now because the industry is still pouring money into more chips and more data centers, even as the capacity already rented often goes underused. (cast.ai)

### What did Cast AI actually measure?

Cast AI looked at real-world Kubernetes workloads and infrastructure usage across tens of thousands of clusters, then focused on environments that were not already optimized by its software. The headline number was GPU utilization at 5%. CPU utilization came in at 8%, and memory at 20%. So this is […] much more expensive once accelerators enter the picture. (cast.ai)

### Why is 5% utilization such a big deal?

A mostly idle CPU is wasteful. A mostly idle GPU is brutal. GPU instances cost dollars per hour, not pennies, and teams often provision them as insurance against spikes, training runs, or future AI projects that have not fully materialized yet. In practice, that means companies can end up paying […] “reserved.” The bill looks very real. (cast.ai)

### Why does Kubernetes make this worse?

Kubernetes is good at packing workloads efficiently in theory. But the catch is that teams have to tell it what resources an application might need, and those requests are often inflated. Once those requests are set too high, autoscalers and schedulers behave as if the demand is real. Extra node […] not fix much if traffic, models, and batch jobs keep changing week to week. (cast.ai)

### Is this just companies panic-buying AI capacity?

Basically, yes, at least in part. Enterprises do not want to be the team that cannot get GPUs when a model launch or internal AI push suddenly needs them. So they overprovision early and tolerate low utilization later. That is rational from one angle. But at scale it turns the cloud […] cannot keep up with that churn, so the waste compounds instead of correcting itself. (cast.ai)

### Why does this matter right now?

Because the spending curve is still heading up. Meta said this week that it now expects 2026 capital expenditures of $125 billion to $145 billion, higher than its prior range, with higher component pricing and data center costs helping drive the increase. In other words, one of the biggest buyers in […] is often poorly utilized. That is not a contradiction exactly, but it is a warning that buying more hardware alone will not solve the bottleneck. (investor.atmeta.com)

### Does this mean there is no real GPU shortage?

Not quite. A chip can be scarce in the market and still be used badly after it arrives. Those are different problems. Shortages hit when companies all want guaranteed access at the same time. Low utilization shows up because many […] individual fleets still look sleepy. That is the weird part. (cast.ai)

### What is the practical takeaway?

The real constraint may be less “not enough GPUs exist” and more “not enough organizations know how to keep GPUs busy.” Better scheduling, shorter reservations, dynamic placement, and continuous rightsizing will not eliminate demand for new chips. But they do change the economics fast. If Cast AI’s […] just about owning accelerators; it is about operating them like scarce assets instead of expensive décor. (cast.ai)
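
To make the economics concrete, here is a minimal Python sketch of the arithmetic behind the utilization figures. The 5%, 8%, and 20% averages come from the report as quoted above; the $2.00 hourly GPU price, the 720-hour month, and the function names are illustrative assumptions, not figures or methods from Cast AI. It also treats average utilization as the share of paid-for time doing useful work, which is a simplification.

```python
def cost_multiplier(utilization: float) -> float:
    """How many paid hours it takes to get one fully used hour."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / utilization


def idle_spend(hourly_price: float, hours: float, utilization: float) -> float:
    """Portion of the bill that paid for capacity sitting idle."""
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be in [0, 1]")
    return hourly_price * hours * (1.0 - utilization)


# Average utilization figures quoted from the report.
REPORTED_UTILIZATION = {"gpu": 0.05, "cpu": 0.08, "memory": 0.20}

# Hypothetical inputs, chosen only to illustrate the calculation.
ASSUMED_GPU_PRICE_PER_HOUR = 2.00
ASSUMED_HOURS_PER_MONTH = 720

if __name__ == "__main__":
    for resource, utilization in REPORTED_UTILIZATION.items():
        print(f"{resource}: {cost_multiplier(utilization):.1f}x paid per used hour")

    wasted = idle_spend(
        ASSUMED_GPU_PRICE_PER_HOUR,
        ASSUMED_HOURS_PER_MONTH,
        REPORTED_UTILIZATION["gpu"],
    )
    print(f"idle spend for one GPU over a month: ${wasted:,.2f}")
```

Under these assumptions, the multiplier is what shifts when rightsizing works: doubling utilization from 5% to 10% halves the effective price of each useful GPU-hour without buying any new hardware.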