OpenAI shrinks GPT‑5.4
OpenAI released ‘mini’ and ‘nano’ variants of GPT‑5.4 aimed at lower latency and cost for real‑time apps. The mini is rolling out to ChatGPT Free and Go users, bringing advanced reasoning and multimodal understanding to a wider audience. The smaller models are explicitly positioned for developers and businesses that need speed over full flagship scale. (digitaltrends.com)
OpenAI published the GPT‑5.4 mini and nano announcement on March 17, 2026. (openai.com) OpenAI’s own benchmark table shows GPT‑5.4 scoring 57.7% on SWE‑Bench Pro, versus 54.4% for GPT‑5.4 mini and 52.4% for GPT‑5.4 nano; the earlier GPT‑5 mini scored 45.7% on the same test. (openai.com) The company reports that GPT‑5.4 mini runs more than twice as fast as GPT‑5 mini while “approaching” the flagship on several evaluations, a claim that independent outlets summarized as a major speed/performance tradeoff for real‑time workflows. (openai.com)
API pricing lists GPT‑5.4 mini at $0.75 per 1M input tokens and $4.50 per 1M output tokens, and GPT‑5.4 nano at $0.20 per 1M input tokens and $1.25 per 1M output tokens. (platform.openai.com) OpenAI says GPT‑5.4 mini uses roughly 30% of a GPT‑5.4 session’s quota when run as a Codex subagent, and positions this as a way to cut the cost of routine coding tasks to about one‑third. (openai.com)
GPT‑5.4 mini supports multimodal inputs, tool calls and function calling, and is offered across the API and Codex platform. GPT‑5.4 nano is being pushed as an API‑only, ultra‑low‑latency model for classification, extraction, ranking and lightweight subagent work. (openai.com) OpenAI documents a 400,000‑token context window for the mini variant and highlights use cases such as fast coding loops, parallel subagent execution, and real‑time multimodal applications that prioritize latency over the largest possible reasoning context. (eweek.com)
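To put the per‑token prices in concrete terms, here is a minimal Python sketch of the arithmetic. It uses only the four list prices quoted above. The dictionary keys are illustrative labels, not confirmed API model identifiers, and the workload figures are hypothetical; the sketch also ignores any caching or batch discounts, which the item does not cover.

```python
# Per-1M-token list prices in USD, as quoted in the item above.
# The keys are illustrative labels, NOT confirmed API model identifiers.
PRICES_PER_MILLION = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, each with
    # 2,000 input tokens and 300 output tokens.
    requests = 10_000
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, requests * 2_000, requests * 300)
        print(f"{model}: ${cost:.2f}")
```

Under that assumed workload, the mini comes to about $28.50 and the nano to about $7.75, so the nano's advantage is largest on input‑heavy jobs such as classification and extraction.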