Tom's Hardware: AI models getting pricier
- Nvidia AI chief scientist Bryan Catanzaro said inference compute now costs his team more than the employees using it, sharpening enterprise worries about runaway AI spend. - OpenAI’s flagship API prices run from $5 input and $30 output per million tokens for GPT-5.5, while Anthropic’s Opus 4.7 costs $5 and $25. - The pressure lands as measured gains stay uneven — 14% to 15% in one support study, but strongest for novices.
AI spending has a weird problem right now. The models keep getting better, but the bill for using them at scale is still large enough that some companies are asking a blunt question — is the machine actually cheaper than the person? That tension snapped into focus this week after Tom’s Hardware highlighted comments from Nvidia’s Bryan Catanzaro saying inference compute now costs more than the employees using it, at least for his team. The point wasn’t that AI is failing. It was that sloppy AI is expensive. (entrepreneur.com) ### Why is inference the budget killer? Training gets the headlines, but inference is the meter that keeps running. Every user query, every agent step, every retrieval pass, and every long answer burns tokens and compute. If a product has real traffic, that cost shows up every day, not once. That is why enterprise teams (entrepreneur.com)e. (tomshardware.com) ### Are frontier models actually that expensive? Basically, yes — especially on output-heavy workloads. OpenAI’s current pricing page lists GPT-5.5 at $5 per million input tokens and $30 per million output tokens, with long-context rates higher. Anthropic lists Claude Opus 4.7 at $5 per million input tokens and $25 per million output tokens, with extra pricing la(tomshardware.com) in a demo. They stop looking manageable when an agent fans out across tools, generates long chains of reasoning, and serves thousands of users. (developers.openai.com) ### So why not just use cheaper models? That is exactly where the market is heading. The practical move is staged model selection — use a small model for classification, routing, extraction, and easy drafts, then escalate only the hard cases to an expensive model. Prompt caching helps too, because repeated context can be billed at sharply lower rates on both OpenAI and (developers.openai.com)to know which tasks truly need a frontier model and which ones just need a competent one. (developers.openai.com) ### But aren’t productivity gains supposed to pay for this? Sometimes they do. But the results are not uniform enough to wave away the cost question. In the well-known customer-support study from NBER and later the *Quarterly Journal of Economics*, access to a generative AI assistant raised productivity by about 14% to 15% on average, with much bigger gains for novice a(developers.openai.com)is useful, but it is not a universal “replace labor” number. It depends on the task, the worker, and the workflow around the model. (nber.org) ### What about full automation? Turns out the economics are even less dramatic there. A 2024 MIT-led paper on computer-vision tasks estimated that only about 23% of wages tied to those tasks were economically attractive to automate at current costs. Different domain, yes, but the lesson travels: technical feasibility is not the same thing as cost-effective deployment. A model can work and still not make financial sense. (ide.mit.edu) ### What does “token waste” actually mean? It usually means companies are paying for words and steps that do not improve the answer. Huge prompts full of irrelevant context. Repeated retrieval of the same documents. Expensive models answering easy questions. Agents taking six tool calls to do a two-call job. Think of it like leaving every light on in an office that now runs 24/7 — each bulb is small, but the monthly bill gets ugly fast. (tomshardware.com) ### Who benefits from this squeeze? The boring-sounding tooling layer. Prompt observability, token budgeting, routing systems, caching infrastructure, and evaluation harnesses all become easier to justify when finance teams want proof that AI spend maps to business value. That creates demand for vendors and consultants who can trim usage without gutting capabilit(tomshardware.com) ### What is the bottom line? The AI story is shifting from raw model power to operating efficiency. Better models still matter, but enterprise buyers increasingly care about cost per useful outcome. In that world, talent over tokens is not an anti-AI argument — it is the playbook for making AI survive contact with a real budget. (tomshardware.com)