AI spend blowouts common
- Audits and social posts report frequent AI overspend, with 60–90% of budgeted dollars sometimes wasted. (x.com)
- One example cited a $5M/year budget translating into roughly $3–4.5M lost to uncontrolled consumption. (x.com)
- Practitioners recommend per-agent token budgets and per-run caps to prevent unexpected cost blowouts. (x.com)

Companies are finding that artificial intelligence projects can blow through budgets fast, with costs driven not by licenses alone but by every token an app sends and receives. (x.com)

A token is a small chunk of text, often a word or part of a word, and most large-language-model vendors bill by counting the input and output tokens on every call. OpenAI’s current API pricing, for example, lists GPT-5.4 at $2.50 per 1 million input tokens and $15 per 1 million output tokens, while Anthropic lists Claude Sonnet 4.6 at $3 and $15 for the same units, and Google lists Gemini paid-tier rates starting at $2 and $12. (openai.com) (anthropic.com) (ai.google.dev)

That pricing looks manageable in a pilot, but costs compound when a chatbot turns into an “agent” that makes repeated calls, pulls in long context windows, uses search tools, and spawns sub-agents. Deloitte wrote on January 19, 2026, that organizations now need to manage AI as an economic system with “unpredictable, token-based costs.” (deloitte.com)

Practitioners posting recent audits say the waste can be severe when teams do not meter usage tightly. One widely shared example described a $5 million annual AI budget that translated into roughly $3 million to $4.5 million, or 60% to 90% of the budget, lost to uncontrolled consumption, and another post recommended per-agent token budgets and per-run caps to stop surprise bills. (x.com 1) (x.com 2)

The cost problem has become visible enough that cloud and software vendors are now pitching “AI FinOps,” applying to AI the finance-and-operations discipline that tracks technology spend. Computer Weekly reported on April 16, 2026, that FinOps teams are stepping in to optimize token usage and tie AI spending to measurable returns. (computerweekly.com)

Microsoft is also pushing more granular metering for agent systems. In an April 2026 post, the company showed how Microsoft Foundry, Azure API Management, and Application Insights can be combined to track per-agent, per-model token usage and costs in near real time.
(techcommunity.microsoft.com)

The mechanics behind the overruns are simple: longer prompts cost more, bigger outputs cost more, and repeated retries or loops multiply both. OpenAI’s model documentation also says GPT-5.4 prompts above 272,000 input tokens are priced at higher rates for the full session, so oversized context can raise bills even faster. (developers.openai.com)

Vendors now offer some relief valves, but they require active setup rather than hope. OpenAI advertises lower-priced cached input and batch processing; Anthropic says prompt caching can cut some costs by up to 90% and batch processing by 50%; and Google’s billing docs tell developers to watch usage dashboards and tier caps. (openai.com) (anthropic.com) (ai.google.dev)

The thread running through all of this is that cheaper model prices have not eliminated expensive deployments. As companies move from small pilots to always-on agents, the bill is increasingly determined by how often systems call models, how much text they carry, and whether anyone put a hard stop on the meter. (deloitte.com) (x.com)
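
The arithmetic behind those three levers (call count, text size, and a hard stop) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor’s billing logic: the prices are the per-million-token list rates quoted in this article, the model keys and the `Meter`, `BudgetExceeded`, `call_cost`, and `run_agent` names are hypothetical, token counts are assumed to be known or estimated before each call, and long-context surcharges, cached input, and batch discounts are not modeled.

```python
from dataclasses import dataclass, field

# Per-million-token list prices (input, output) in USD, as quoted in the
# article. Keys are illustrative labels, not official API model IDs.
# Long-context surcharges, cached input, and batch discounts are NOT modeled.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gemini-paid-tier-min": (2.00, 12.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: tokens times the per-million rate, divided by 1M."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


class BudgetExceeded(RuntimeError):
    """Hypothetical error raised when a call would push a run or agent past its cap."""


@dataclass
class Meter:
    """Hypothetical meter: one hard cap per run plus a budget per agent."""

    run_cap: float                # dollars allowed for the whole run
    agent_caps: dict[str, float]  # dollars allowed per agent name
    run_spend: float = 0.0
    agent_spend: dict[str, float] = field(default_factory=dict)

    def charge(self, agent: str, model: str, input_tokens: int, output_tokens: int) -> float:
        """Check a call's (estimated) cost against both caps before recording it."""
        cost = call_cost(model, input_tokens, output_tokens)
        spent = self.agent_spend.get(agent, 0.0)
        if self.run_spend + cost > self.run_cap:
            raise BudgetExceeded(f"run cap of ${self.run_cap:.2f} reached")
        if spent + cost > self.agent_caps[agent]:
            raise BudgetExceeded(f"agent {agent!r} cap of ${self.agent_caps[agent]:.2f} reached")
        self.run_spend += cost
        self.agent_spend[agent] = spent + cost
        return cost


def run_agent(meter: Meter, agent: str, model: str, prompt_tokens: int,
              reply_tokens: int, max_steps: int) -> int:
    """Simulate an agent loop that re-sends its growing transcript on every step.

    Returns how many calls completed before max_steps or a cap stopped the loop.
    """
    context = prompt_tokens
    for step in range(max_steps):
        try:
            meter.charge(agent, model, context, reply_tokens)
        except BudgetExceeded:
            return step  # hard stop: the call that would overspend never runs
        context += reply_tokens  # each reply is appended to the next prompt
    return max_steps


if __name__ == "__main__":
    meter = Meter(run_cap=1.00, agent_caps={"researcher": 2.00})
    calls = run_agent(meter, "researcher", "gpt-5.4", 20_000, 1_000, max_steps=50)
    print(f"stopped after {calls} calls; spent ${meter.run_spend:.3f}")
```

Under these assumptions, a 20,000-token prompt with a 1,000-token reply costs $0.065 at the quoted GPT-5.4 rates. Fifty loop steps whose transcript grows by 1,000 tokens each time come to about $6.31, nearly twice the $3.25 that fifty fixed-size calls would cost, while the $1.00 per-run cap in the demo stops the same loop after 12 calls. Checking the cap before recording a charge means the overspending call is refused rather than billed; a production meter would also need to reconcile its estimates against the usage counts each API actually returns.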