AI pricing feels rigged
Conversation on X this week flagged a growing ‘model‑selection tax’: very cheap models ($0.001) can handle roughly 80% of tasks, but vendors still push expensive frontier models (≈$0.15), creating a real cost gap for teams. (x.com)
A lot of teams now have the same complaint: the cheapest artificial intelligence model can answer the boring 80% of work, but the default button still points them at the premium one. OpenAI’s current flagship GPT-5.4 is priced at $2.50 per million input tokens and $15 per million output tokens, while GPT-5.4 nano is $0.20 and $1.25. (openai.com)

That gap is not small. On output tokens alone, GPT-5.4 costs 12 times as much as GPT-5.4 nano, and OpenAI describes the flagship as a model that “spend[s] more time thinking” for complex multi-step problems rather than simple high-volume jobs. (openai.com)

Anthropic shows the same ladder. Claude Haiku 4.5 starts at $1 per million input tokens and $5 per million output tokens, while Claude Sonnet 4.6 starts at $3 and $15, and Anthropic says Sonnet 4.6 is now the default model for Free and Pro users in Claude.ai. (anthropic.com, anthropic.com)

Google’s menu is similar. Gemini 3.1 Flash-Lite Preview is listed at $0.25 per million text, image, and video input tokens and $1.50 per million output tokens, while Gemini 3.1 Pro Preview is $2 input and $12 output for prompts up to 200,000 tokens. (cloud.google.com, ai.google.dev)

That is where the “rigged” feeling comes from. If a customer support bot, document classifier, or invoice parser works fine on the cheap tier, every accidental call to the premium tier is like taking a taxi for a one-block trip. (anthropic.com, cloud.google.com)

The pricing page itself nudges behavior. OpenAI puts “Our frontier models” first and describes them as ideal for professional work, while Anthropic says Sonnet 4.6 is the default in its main product, so the expensive model is often the one people touch before they test whether a cheaper one is enough. (openai.com, anthropic.com)

Vendors do have a real argument here. Anthropic says Sonnet 4.6 can handle coding, agent planning, and long-context work that used to require its Opus line, and OpenAI says GPT-5.4 is built for complex multi-step problems, so the premium tier is not fake capability. (anthropic.com, openai.com)

But the bill is driven by task routing, not by marketing copy. OpenAI’s own docs call GPT-4.1 nano the “fastest, most cost-efficient” version of GPT-4.1 at $0.10 input and $0.40 output per million tokens, and GPT-4.1 mini is $0.40 and $1.60, which means there are already multiple lower-cost steps before a team ever reaches a frontier model. (developers.openai.com, developers.openai.com)

The hidden tax shows up when companies skip that routing step. A product manager picks the safest-looking top model, the engineering team ships it everywhere, and the finance team later discovers that millions of routine prompts were billed at premium rates. (openai.com, anthropic.com)

Google is already selling one answer to that problem. Vertex AI now offers a Model Optimizer meta-endpoint that lets enterprise customers send Gemini requests without specifying Flash, Pro, or a version, which is basically an automated traffic cop for model choice. (cloud.google.com)

The fight in 2026 is no longer just who has the smartest model. It is who can make the cheap model handle the largest share of real work before the expensive model gets called in. (openai.com, anthropic.com, cloud.google.com)
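
For teams that want to put a number on that routing gap, here is a rough back-of-the-envelope sketch in Python. It uses only the GPT-5.4 and GPT-5.4 nano list prices quoted above. The workload (one million requests a month, 1,000 input tokens and 300 output tokens each) is a hypothetical chosen for illustration, and the dictionary keys are display labels, not API model identifiers. It also ignores caching, batch discounts, and any quality difference between the two models.

```python
# USD per million tokens (input, output), as quoted in the article.
# The keys are display labels only, not real API model identifiers.
PRICES = {
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 nano": (0.20, 1.25),
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_bill(requests, cheap_share, input_tokens, output_tokens,
                 cheap="GPT-5.4 nano", premium="GPT-5.4"):
    """Blended monthly cost when `cheap_share` of requests go to the cheap model."""
    cheap_requests = requests * cheap_share
    premium_requests = requests - cheap_requests
    return (cheap_requests * request_cost(cheap, input_tokens, output_tokens)
            + premium_requests * request_cost(premium, input_tokens, output_tokens))


# Hypothetical workload: 1M requests a month, 1,000 input / 300 output tokens each.
for share in (0.0, 0.8, 1.0):
    bill = monthly_bill(1_000_000, share, 1_000, 300)
    print(f"{share:.0%} routed to nano: ${bill:,.0f} per month")
```

Under those assumptions, sending everything to the flagship comes to about $7,000 a month, routing 80% of requests to nano brings it to about $1,860, and an all-nano workload is about $575. The exact figures move with token counts, but the shape of the gap is the point.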