Token cost Jevons risk
A social post warned that a roughly 99% drop in token costs could unleash explosive demand and shift bottlenecks from token price to data, tooling and payment systems. (x.com)
A token is the small unit of text, roughly a piece of a word, that language model providers bill for, and the price of those units has fallen fast across mainstream application programming interfaces. OpenAI now lists GPT-5.4 nano at $0.20 per 1 million input tokens, while Google lists Gemini 3.1 Flash-Lite at $0.25 per 1 million text, image, or video input tokens on its paid tier. (openai.com) (ai.google.dev) Those prices sit far below the multi-dollar rates that frontier model providers still charge for larger systems. OpenAI lists GPT-5.4 at $2.50 per 1 million input tokens and Google lists Gemini 3.1 Pro at $2.00 per 1 million input tokens for prompts up to 200,000 tokens, with both companies also advertising lower-cost batch options. (openai.com) (ai.google.dev)
Economists use “Jevons paradox” for cases where efficiency makes something cheaper and people use more of it, not less. The rebound effect is the same basic idea in plainer terms: a lower unit cost can expand total consumption if demand from new uses grows faster than the per-unit savings. (wikipedia.org 1) (wikipedia.org 2)
That is the risk behind ultra-cheap tokens. If generating text, code, images, or tool calls gets cheap enough to feel nearly free, developers can move from occasional prompts to always-on agents, larger context windows, repeated retries, and background automation that runs all day. (openai.com) (ai.google.dev)
In that world, the bottleneck shifts away from the model line item and toward the systems around it. OpenAI’s pricing page separately charges $10 per 1,000 web-search calls and bills container usage for code execution, while Google charges for search grounding after a free allotment, showing that tool use can become a visible part of the bill even when tokens are cheap. (openai.com) (ai.google.dev)
Data is another constraint because cheap inference does not create fresh, clean inputs on its own.
Google’s pricing page breaks out context caching as a separate paid feature, which reflects a broader pattern in production systems: storing, retrieving, and organizing the right context can matter as much as generating the next answer. (ai.google.dev) (openai.com)
Power and data-center capacity are already under pressure before any new wave of demand from near-free inference. The International Energy Agency said on April 10, 2025 that electricity demand from data centers worldwide is set to more than double by 2030 to about 945 terawatt-hours, with electricity demand from artificial-intelligence-optimized data centers projected to more than quadruple. (iea.org)
Payments can also become a chokepoint when usage spreads globally into tiny, frequent transactions. Stripe says it supports businesses in a defined list of countries rather than everywhere, and its payments documentation says merchants can charge in more than 135 currencies, which helps global sales but does not remove country onboarding, settlement, tax, and minimum-charge constraints. (stripe.com) (docs.stripe.com)
The counterargument is that lower token prices do not automatically mean runaway usage. Providers still impose rate limits, premium models remain much more expensive than small ones, and many applications are constrained by accuracy, regulation, customer acquisition, or the cost of human review rather than by raw token spend alone. (openai.com) (ai.google.dev)
But the basic pattern is visible in the pricing tables already. As token costs fall toward fractions of a dollar per million, the scarce pieces of the stack look less like words on a bill and more like power, data pipelines, tool orchestration, and the rails that collect money from users around the world. (openai.com) (iea.org)
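The arithmetic behind the Jevons argument and the tool-cost point can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a forecast: the two prices are the list prices quoted above, while the function names and the 5,000-token request size are hypothetical, and only input tokens are counted because output prices are not given here.

```python
# Prices are the list prices quoted in the article; everything else
# (function names, request size) is a hypothetical illustration.
NANO_INPUT_PRICE_PER_M = 0.20    # USD per 1M input tokens (GPT-5.4 nano, as quoted)
WEB_SEARCH_PRICE_PER_1K = 10.00  # USD per 1,000 web-search calls (as quoted)


def breakeven_usage_multiplier(price_drop: float) -> float:
    """Usage growth factor at which total spend equals the pre-drop spend.

    Total spend is price * volume, so after a fractional price drop d
    the volume must grow by 1 / (1 - d) to keep spend unchanged.
    Any growth beyond that means the total bill rises (the Jevons case).
    """
    if not 0.0 <= price_drop < 1.0:
        raise ValueError("price_drop must be in [0, 1)")
    return 1.0 / (1.0 - price_drop)


def input_token_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens at a per-million price."""
    return tokens * price_per_million / 1_000_000


def search_call_cost(calls: int, price_per_thousand: float) -> float:
    """Cost in USD of `calls` tool calls at a per-thousand price."""
    return calls * price_per_thousand / 1_000


if __name__ == "__main__":
    multiplier = breakeven_usage_multiplier(0.99)
    print(f"Break-even usage growth after a 99% price drop: {multiplier:.0f}x")

    request_tokens = 5_000  # hypothetical request size
    token_cost = input_token_cost(request_tokens, NANO_INPUT_PRICE_PER_M)
    search_cost = search_call_cost(1, WEB_SEARCH_PRICE_PER_1K)
    print(f"Input tokens for one request: ${token_cost:.4f}")
    print(f"One web-search call:          ${search_cost:.4f}")
```

Under these assumptions, a 99% price drop leaves total spending flat only if usage grows about 100-fold, and a single web-search call costs about ten times as much as the input tokens of the hypothetical request, which is the sense in which tooling rather than tokens can come to dominate the bill.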