API pricing snapshot
A BenchLM pricing summary aggregated OpenAI API rates for GPT‑5.4, GPT‑5.2 and GPT‑5.1, including the cached‑input rates and batch discounts that shape token economics for product builders (benchlm.ai). For startups building on these models, that level of cost detail feeds directly into design decisions such as caching strategy, batching and offline preprocessing (benchlm.ai).
OpenAI’s API price sheet now puts hard numbers on how much startups pay to run its newer GPT‑5 family, and on how much they can shave off with caching and batch jobs. (openai.com)

On OpenAI’s pricing page, GPT‑5 input tokens are listed at $1.25 per 1 million, cached input at $0.125 per 1 million, and output at $10 per 1 million. GPT‑5 mini is listed at $0.25 input, $0.025 cached input, and $2 output per 1 million tokens. (openai.com)

GPT‑5 nano is cheaper still at $0.05 per 1 million input tokens, $0.005 for cached input, and $0.40 for output, according to OpenAI. The same page says the Batch API cuts both input and output costs by 50 percent when developers can wait up to 24 hours for results. (openai.com)

Tokens are the small chunks of text that models read and write, so price changes at that level flow straight into product budgets. Cached input means a developer reuses text the model has already processed, like a long system prompt or repeated document context, and pays a lower rate the next time. (openai.com)

That pricing structure pushes builders toward a specific playbook: keep repeated prompts stable, reuse context where possible, and move non-urgent work into overnight or delayed batch runs. BenchLM’s April 2026 pricing roundup framed those discounts as central to “token economics” for teams choosing model sizes and product features. (benchlm.ai)

The same roundup organized rates across the GPT‑5.4, GPT‑5.2, and GPT‑5.1 lines, alongside cached-input and batch discounts, to show how small differences in per-token pricing can compound at scale. For a product sending millions of tokens a day, the gap between standard and cached input can be the difference between an always-on feature and an optional one. (benchlm.ai)

OpenAI’s page also separates flagship models from models aimed at lower-cost use cases, including GPT‑5 mini and GPT‑5 nano, giving developers a menu rather than a single default.
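The per-token rates quoted above can be turned into a rough budget estimate. The following is a minimal sketch, not an official calculator: the model keys, the `estimate_cost` function and the `cached_fraction` parameter are illustrative names, the rates are the ones cited in this article and may differ from the live pricing page, and the sketch assumes the 50 percent batch discount applies to uncached input and to output only, since the article does not say how batch and cached-input discounts combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1 million tokens, as quoted in the article."""
    fresh_input: float
    cached_input: float
    output: float


# Figures cited in the article; verify against the live pricing page before use.
RATES = {
    "gpt-5": Rates(1.25, 0.125, 10.00),
    "gpt-5-mini": Rates(0.25, 0.025, 2.00),
    "gpt-5-nano": Rates(0.05, 0.005, 0.40),
}

BATCH_MULTIPLIER = 0.5  # "cuts both input and output costs by 50 percent"


def estimate_cost(model, input_tokens, output_tokens, cached_fraction=0.0, batch=False):
    """Return an estimated cost in USD for a given token volume.

    cached_fraction is the share of input tokens billed at the cached rate.
    Assumption: the batch discount is not stacked on top of the cached rate.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    rates = RATES[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    multiplier = BATCH_MULTIPLIER if batch else 1.0
    total_per_million = (
        fresh * rates.fresh_input * multiplier
        + cached * rates.cached_input
        + output_tokens * rates.output * multiplier
    )
    return total_per_million / 1_000_000


# Example: 10 million input tokens and 1 million output tokens on GPT-5.
print(f"{estimate_cost('gpt-5', 10_000_000, 1_000_000):.2f}")                       # 22.50
print(f"{estimate_cost('gpt-5', 10_000_000, 1_000_000, cached_fraction=0.8):.2f}")  # 13.50
print(f"{estimate_cost('gpt-5', 10_000_000, 1_000_000, batch=True):.2f}")           # 11.25
```

In this example, caching 80 percent of the input trims the bill from $22.50 to $13.50, while moving the whole job to batch halves it to $11.25. Which saving is larger depends on how the workload splits between input and output tokens.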
That menu matters for products that split jobs by difficulty, using a cheaper model for classification or cleanup and a stronger one for final answers. (openai.com)

The pricing page adds another variable: web search, file search, and computer-use tools carry their own charges or token rules on top of base model costs. A team that adds retrieval, browsing, or automation can lower model spend with caching and batching and still see its total bill rise once tool usage is included. (openai.com)

The result is that model choice is no longer just about quality scores or benchmark rankings. On OpenAI’s own numbers, the cheapest path often depends on how much of a workload is repeated, how fast the answer is needed, and whether the job can be broken into deferred batch tasks. (openai.com)
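The idea of splitting jobs by difficulty can be sketched the same way. This is an illustrative model only: `tiered_cost`, `hard_share` and `tool_cost_usd` are hypothetical names, the traffic mix is invented, and the tool charge is a placeholder because the article gives no figures for tool pricing. The input and output rates are the GPT‑5 and GPT‑5 nano figures cited above.

```python
# USD per 1 million tokens as (input, output), as quoted in the article.
PRICES = {
    "gpt-5": (1.25, 10.00),
    "gpt-5-nano": (0.05, 0.40),
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one uncached, non-batch request."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def tiered_cost(requests, hard_share, input_tokens, output_tokens, tool_cost_usd=0.0):
    """Total cost when hard requests go to GPT-5 and the rest to GPT-5 nano.

    hard_share is the fraction of requests routed to the stronger model.
    tool_cost_usd is a placeholder for tool charges billed on top of tokens.
    """
    hard = requests * hard_share
    easy = requests - hard
    return (
        hard * request_cost("gpt-5", input_tokens, output_tokens)
        + easy * request_cost("gpt-5-nano", input_tokens, output_tokens)
        + tool_cost_usd
    )


# Example: 100,000 requests of 2,000 input and 500 output tokens each.
print(f"{tiered_cost(100_000, 1.0, 2000, 500):.2f}")  # 750.00, everything on GPT-5
print(f"{tiered_cost(100_000, 0.2, 2000, 500):.2f}")  # 174.00, 20% routed to GPT-5
# 50.0 below is a made-up tool charge, not an OpenAI price.
print(f"{tiered_cost(100_000, 0.2, 2000, 500, tool_cost_usd=50.0):.2f}")  # 224.00
```

Under these assumed numbers, routing only a fifth of traffic to the flagship model cuts token spend by roughly three quarters, and any tool charges are then added to that lower base.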