95% of GPUs idle
- A cloud report found many organizations are hoarding GPU capacity they barely use, creating large idle pools.
- The estimate: roughly 95% of GPU capacity across thousands of companies sat idle, according to the report summary.
- That gap shifts attention from buying chips to improving allocation, scheduling and autoscaling with Kubernetes and cloud tooling. (businessinsider.com) (x.com/i/status/2046574003758702868)
A graphics processing unit, or GPU, is the chip that trains and runs many artificial intelligence models, and new data suggests most of that expensive capacity is sitting unused. Cast AI said average GPU utilization across the Kubernetes clusters it analyzed was 5%, leaving roughly 95% of provisioned capacity idle. (cast.ai)
Cast AI released the figure on April 21, 2026 in its State of Kubernetes Optimization Report, which it said drew on data from 23,000 clusters across thousands of companies. Business Insider reported the company found most organizations were provisioning about 20 times more GPU capacity than they were actively using at any given moment. (cast.ai) (businessinsider.com)
Kubernetes is the software many companies use to place applications on servers, add machines when demand rises, and shut them down when demand falls. The Kubernetes project describes it as an open-source system for automating deployment, scaling, and management of containerized applications. (kubernetes.io)
That matters because a GPU does not cost what an ordinary central processing unit, or CPU, costs on a cloud bill. Business Insider said Cast AI’s report estimated GPUs can cost up to 50 times as much as comparable CPU-based machines, while Cast AI reported average CPU utilization of 8% and memory utilization of 20% in the same dataset. (businessinsider.com) (cast.ai)
The report lands as companies are still scrambling to lock in artificial intelligence computing capacity, especially for Nvidia systems. Business Insider said Cast AI chief executive Laurent Gil blamed “fear of missing out” for long-term GPU commitments that outstrip real demand. (businessinsider.com)
The immediate fix is less about buying more chips and more about treating GPUs like shared infrastructure instead of reserved parking spots.
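The relationship between the utilization figures and the "20 times" claim is simple arithmetic, sketched below in Python. The function names are illustrative and are not drawn from Cast AI's report or tooling; the only inputs are the average utilization percentages quoted above.

```python
def overprovision_factor(utilization: float) -> float:
    """Provisioned capacity divided by capacity in active use."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be a fraction in (0, 1]")
    return 1 / utilization


def idle_share(utilization: float) -> float:
    """Fraction of provisioned capacity that is not in use."""
    return 1 - utilization


# Average utilization figures quoted from the report.
reported = {"GPU": 0.05, "CPU": 0.08, "memory": 0.20}

for resource, utilization in reported.items():
    print(
        f"{resource}: {idle_share(utilization):.0%} idle, "
        f"{overprovision_factor(utilization):.1f}x provisioned vs. used"
    )
```

For the GPU figure this gives a factor of 20, which matches the ratio Business Insider reported. The CPU and memory ratios are derived here from the quoted percentages and are not figures stated in the report.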
Kubernetes documentation says autoscaling lets workloads and nodes expand or shrink with demand, and its Dynamic Resource Allocation feature lets workloads in a cluster request and share attached devices such as hardware accelerators. (kubernetes.io 1) (kubernetes.io 2)
GPU sharing is one way to do that. Nvidia’s Multi-Instance GPU system, known as MIG, splits supported GPUs into multiple isolated instances with dedicated compute and memory resources so several jobs can run on one physical chip. (docs.nvidia.com)
Cast AI’s own documentation pitches the same approach in cost terms: track provisioned GPUs, requested virtual GPUs, and actual usage, then look for memory waste and cost waste at the workload level. The company says its platform lets teams compare provisioned, requested, and actual usage patterns for CPU, GPU, and memory across clusters. (docs.cast.ai) (cast.ai)
Cast AI also has a commercial interest in this diagnosis, because it sells automation software for Kubernetes optimization. But the company’s number still captures a shift in the artificial intelligence infrastructure debate: for many firms, the bottleneck is no longer just getting GPUs; it is keeping the ones they already bought busy. (cast.ai 1) (cast.ai 2)
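The provisioned, requested, and actual comparison described above can be illustrated with a minimal Python sketch. It is not Cast AI's method or API: the class, its field names, the sample node pool, and the hourly price are all hypothetical, chosen only to show how the three numbers separate two kinds of waste.

```python
from dataclasses import dataclass


@dataclass
class GpuUsage:
    """Hypothetical snapshot of one node pool (all values are assumptions)."""
    name: str
    provisioned: float   # GPUs being paid for
    requested: float     # GPUs that workloads asked the scheduler for
    used: float          # average number of GPUs actually busy
    hourly_price: float  # assumed price per GPU-hour

    def unrequested(self) -> float:
        """GPUs paid for that no workload has claimed."""
        return self.provisioned - self.requested

    def requested_but_idle(self) -> float:
        """GPUs claimed by workloads but not doing work."""
        return self.requested - self.used

    def utilization(self) -> float:
        """Share of provisioned GPUs that are busy."""
        return self.used / self.provisioned

    def idle_cost_per_hour(self) -> float:
        """Spend on capacity that is not in active use."""
        return (self.provisioned - self.used) * self.hourly_price


# Made-up example values, not data from the report.
pool = GpuUsage("training-pool", provisioned=8, requested=6, used=0.5, hourly_price=2.0)

print(f"{pool.name}: utilization {pool.utilization():.1%}")
print(f"  unrequested GPUs:        {pool.unrequested()}")
print(f"  requested but idle GPUs: {pool.requested_but_idle()}")
print(f"  idle cost per hour:      {pool.idle_cost_per_hour():.2f}")
```

Separating the two gaps matters because they call for different fixes: unrequested capacity points to node autoscaling or smaller commitments, while requested-but-idle capacity points to right-sizing requests or sharing a GPU between jobs.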