- Most AI GPUs run at shockingly low utilization across production systems
- Businesses are paying for twenty times more GPU capacity than needed
- Overprovisioning is worsening sharply year over year rather than improving
Companies across the tech industry are racing to buy massive amounts of AI infrastructure, but most of it sits nearly idle.
A report by Cast AI, based on tens of thousands of Kubernetes clusters across AWS, Azure and GCP, found that the average GPU utilization is only 5%.
Many teams deploy sophisticated AI tools to manage their applications, yet those same tools are not used to optimize the underlying infrastructure.
The numbers are getting worse, not better
Organizations are paying for about 20 times more GPU capacity than their workloads are actually using at any given time.
The numbers come from direct measurements of production clusters and millions of computing resources before any optimization was applied.
“This is the third year we’ve published this report. The numbers are worse,” said Laurent Gil, co-founder and president of Cast AI. “CPU utilization dropped to 8%, down from 10%. Memory dropped from 23% to 20%.”
The report also measured something called overprovisioning, which is the gap between what workloads actually need and what teams allocate to them.
CPU overprovisioning increased from 40% to 69% year over year, while memory overprovisioning is now at 79%.
Read as the share of requested capacity that goes unused, those figures mean organizations are requesting roughly three times the CPU and nearly five times the memory their workloads actually consume.
In short, organizations are paying for infrastructure that their workloads don’t even require, and the trend is accelerating rather than improving.
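The relationship between a reported overprovisioning percentage and the effective allocation multiplier can be sketched as follows. This assumes "overprovisioning" means the share of requested capacity that goes unused; the report's exact formula isn't stated, so treat the reading as an assumption:

```python
# Convert a reported overprovisioning percentage into an allocation
# multiplier, assuming overprovisioning = the fraction of requested
# capacity that goes unused (an assumption; the report's formula
# isn't given).

def allocation_multiplier(overprovisioning: float) -> float:
    """If a fraction X of requested capacity is unused, teams are
    requesting 1 / (1 - X) times what workloads actually consume."""
    if not 0 <= overprovisioning < 1:
        raise ValueError("expected a fraction in [0, 1)")
    return 1 / (1 - overprovisioning)

print(f"CPU    (69% overprovisioned): {allocation_multiplier(0.69):.1f}x requested vs. used")
print(f"Memory (79% overprovisioned): {allocation_multiplier(0.79):.1f}x requested vs. used")
```

Under this reading, 69% CPU overprovisioning works out to about 3.2x and 79% memory overprovisioning to about 4.8x.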
The situation becomes even more expensive when comparing CPU and GPU costs directly. A CPU core sitting idle costs only cents an hour, but a GPU sitting idle costs dollars an hour.
For the first time since EC2 launched in 2006, GPU prices are rising instead of falling.
In January 2026, AWS raised H200 Capacity Block prices by 15%, citing supply and demand, breaking a two-decade precedent.
“At 5% utilization, the math doesn’t work,” the report states. The hoarding instinct makes sense because delivery times are long, but that same hoarding feeds the scarcity loop that drives prices even higher.
Not every cluster performs this badly: one organization hit 49% utilization on H200s and 30% on H100s, well above the 5% average.
The difference comes down to automation rather than luck or better hardware. The tools to address this already exist, including automated rightsizing of workload requests, GPU sharing or time-slicing, and Spot instance management.
Most teams never get there: overprovisioning feels safer than running out of capacity, but that sense of security comes at a high price.
The teams that closed the gap stopped treating resource efficiency as a one-time manual task and started treating it as an automated, continuous process.
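Treating efficiency as a continuous process means periodically recomputing requests from observed usage instead of setting them once by hand. A minimal sketch of one such rightsizing step, where the percentile and headroom values are illustrative assumptions rather than anything prescribed by the report:

```python
import math

def rightsize(usage_samples: list[float], percentile: float = 0.95,
              headroom: float = 1.2) -> float:
    """Recommend a resource request from observed usage: a high
    percentile of the samples plus a safety margin, rather than a
    one-time manual guess. Percentile/headroom values are illustrative."""
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_samples)
    index = max(0, math.ceil(percentile * len(ordered)) - 1)
    return ordered[index] * headroom

# e.g. CPU millicore usage observed over a day, against a 2000m request
samples = [120, 150, 180, 140, 300, 160, 170, 155, 145, 200]
print(f"recommended request: {rightsize(samples):.0f}m (vs. 2000m reserved)")
```

Run continuously, a loop like this keeps requests tracking real demand, which is the habit change the high-utilization teams made.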
But Cast AI data reveals that most companies seem willing to keep paying hefty fees rather than change their habits.



