Treat AI pricing as architecture

OpenAI’s pricing is now segmented across model families and use cases, so choosing which model to call is becoming an architectural decision rather than a default. The pricing roundup shows separate rates for GPT‑5.4, GPT‑5.2 and GPT‑5.1 plus cached‑input and Batch API discounts, while OpenAI also lists distinct Codex rate tiers for coding workflows (benchlm.ai, help.openai.com). A recent advisory also warns that inference cost can destroy agent project economics unless teams design routing, caching and batching into their backends (techaheadcorp.com).

Calling an artificial intelligence model is starting to look less like picking a default and more like drawing a systems diagram. OpenAI now posts separate prices for GPT-5.4, GPT-5.2, GPT-5.1, cached prompts, batch jobs, search calls, and coding-specific Codex tiers. (openai.com)

On OpenAI’s current pricing page, GPT-5.4 costs $2.50 per 1 million input tokens and $15.00 per 1 million output tokens, while GPT-5.2 is listed at $1.75 and $14.00, and GPT-5.1 at $1.25 and $10.00. Cached input is cheaper on each tier: $0.25 for GPT-5.4, $0.175 for GPT-5.2, and $0.125 for GPT-5.1 per 1 million tokens. (openai.com, developers.openai.com, developers.openai.com)

The same page advertises a 50 percent discount on inputs and outputs through the Batch Application Programming Interface, which runs jobs asynchronously over 24 hours. OpenAI also charges separately for tools around the model, including $10 per 1,000 web search calls and container sessions starting at $0.03 for 1 gigabyte over 20 minutes. (openai.com, developers.openai.com)

For teams building agents, those line items change how backends are designed. A request router can send hard tasks to GPT-5.4, routine steps to GPT-5.1 or a mini model, and repeated context through the cache instead of paying full price every time. (openai.com, developers.openai.com, developers.openai.com)

OpenAI’s own model pages now describe that split in roles. GPT-5.4 is labeled “our most capable model for professional work,” GPT-5.2 is a “previous frontier model for professional work,” and GPT-5.1 is “the best model for coding and agentic tasks.” (developers.openai.com, developers.openai.com, developers.openai.com)

The coding stack has its own price sheet. In a Help Center update posted April 2, 2026, OpenAI said Codex pricing for Plus, Pro, ChatGPT Business, and new ChatGPT Enterprise plans moved from per-message billing to token-based billing aligned with the Application Programming Interface. (help.openai.com) That Codex card lists 62.50 credits per 1 million input tokens for GPT-5.4, 43.75 credits for GPT-5.2 and GPT-5.2-Codex, 31.25 credits for GPT-5.1-Codex-Max, and 6.25 credits for GPT-5.1-Codex-mini. Cached input is billed separately at lower rates, and OpenAI says Fast mode consumes twice as many credits. (help.openai.com)

OpenAI also gives developers more ways to trade speed for cost. The pricing documentation shows standard, batch, flex, and priority processing options, and it adds a 10 percent uplift for regional processing on GPT-5.4-class models. (developers.openai.com)

The result is that “use the best model” is no longer a complete implementation plan. On OpenAI’s current rate cards, the bill depends on which model answers, how much context is reused, whether work can wait for batch processing, and whether coding runs inside Codex or the core Application Programming Interface. (openai.com, help.openai.com, developers.openai.com)
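
To make the arithmetic concrete, here is a minimal Python sketch of how a backend might estimate per-request cost and choose a tier, using only the per-million-token prices quoted above. The dictionary keys, the function names `estimate_cost` and `pick_model`, and the `hard` flag are hypothetical and for illustration only; they are not OpenAI API identifiers. Applying the 50 percent batch discount to cached input as well as fresh input and output is an assumption, since the figures above only say the discount covers inputs and outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """USD per 1 million tokens, as quoted in the article."""
    input: float   # uncached input tokens
    cached: float  # cached input tokens
    output: float  # output tokens


# Keys are illustrative labels, not confirmed API model IDs.
RATES = {
    "gpt-5.4": Rate(input=2.50, cached=0.25, output=15.00),
    "gpt-5.2": Rate(input=1.75, cached=0.175, output=14.00),
    "gpt-5.1": Rate(input=1.25, cached=0.125, output=10.00),
}

# Assumption: the batch discount is applied to the whole token bill.
BATCH_DISCOUNT = 0.5


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0, batch: bool = False) -> float:
    """Estimate the token cost in USD for one request (tool fees excluded)."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    rate = RATES[model]
    fresh_tokens = input_tokens - cached_tokens
    cost = (fresh_tokens * rate.input
            + cached_tokens * rate.cached
            + output_tokens * rate.output) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost


def pick_model(hard: bool) -> str:
    """Toy router: hard tasks go to the top tier, routine steps to the cheapest."""
    return "gpt-5.4" if hard else "gpt-5.1"
```

Under these assumptions, a request with 1 million input tokens and 100,000 output tokens works out to $4.00 on GPT-5.4 at full price, $2.20 if 800,000 of those input tokens are served from cache, and about $1.13 if the same job is routed to GPT-5.1 and sent through batch processing. A real router would also have to account for tool fees, processing tiers, and quality requirements, none of which this sketch models.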
