OpenAI ships GPT‑5.4 Mini/Nano
OpenAI released GPT‑5.4 mini and nano as lighter, lower‑latency fallbacks to GPT‑5.4 “Thinking” and made them available to ChatGPT Plus/Pro and enterprise customers for high‑availability deployments. The release comes with a prompting playbook, aimed at designers, for reducing generic outputs and improving frontend UX in production apps. (help.openai.com)
GPT‑5.4 mini is listed at $0.75 per 1M input tokens and $4.50 per 1M output tokens, with cached input billed at $0.075 per 1M. It exposes a 400,000‑token context window and up to 128,000 output tokens. (developers.openai.com)
GPT‑5.4 nano is priced at $0.20 per 1M input tokens and $1.25 per 1M output tokens, with cached input at $0.02 per 1M. It shares the 400,000‑token context window and targets ultra‑high‑throughput, short‑turn tasks. (developers.openai.com)
OpenAI’s release notes state that GPT‑5.4 mini is rolling into ChatGPT via the “Thinking” menu for Free and Go users and will act as an automatic rate‑limit fallback for paid tiers, with enterprises able to opt into auto‑routing. GPT‑5 Thinking mini will be retired from the model picker in 30 days. (help.openai.com)
The model docs spell out the capability differences: GPT‑5.4 mini supports native computer‑use tools, web and file search, function calling and skills, whereas GPT‑5.4 nano is tuned for classification, extraction and ranking and explicitly does not support the computer‑use tool or tool search. (developers.openai.com)
Developers can trade reasoning depth against latency with the reasoning.effort parameter and a verbosity setting in the Responses API, and OpenAI’s reasoning guide includes examples of passing reasoning.effort for that purpose. (developers.openai.com; developers.openai.com/guides/reasoning)
OpenAI’s prompting playbook for frontend designers recommends concrete inputs: define a design system (colors, typography), supply mood boards or visual references, and structure page generation as a narrative. It also lists migration patterns such as completeness checks, verification loops, and tool persistence to reduce generic outputs. (the‑decoder.com; developers.openai.com/guides/prompt‑guidance)
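To make the per‑token prices quoted above easier to compare, here is a minimal Python sketch that turns them into a per‑request cost estimate. The dollar figures come from the text; everything else is an assumption for illustration: the model keys `gpt-5.4-mini` and `gpt-5.4-nano` are hypothetical labels rather than confirmed API identifiers, the `Pricing` and `estimate_cost` names are invented here, and the sketch assumes cached tokens are a subset of the input tokens, billed at the cached rate instead of the regular input rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens, as quoted in the text above."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


# Keys are hypothetical labels for illustration, not confirmed API model IDs.
PRICES = {
    "gpt-5.4-mini": Pricing(input_per_m=0.75, cached_input_per_m=0.075, output_per_m=4.50),
    "gpt-5.4-nano": Pricing(input_per_m=0.20, cached_input_per_m=0.02, output_per_m=1.25),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens counts the part of input_tokens served from
    cache, billed at the cached rate instead of the regular input rate.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    uncached_tokens = input_tokens - cached_tokens
    total_per_million = (
        uncached_tokens * price.input_per_m
        + cached_tokens * price.cached_input_per_m
        + output_tokens * price.output_per_m
    )
    return total_per_million / 1_000_000


if __name__ == "__main__":
    # Same hypothetical workload on both models: 10k input tokens
    # (8k of them cached) and 500 output tokens.
    for name in PRICES:
        cost = estimate_cost(name, input_tokens=10_000, output_tokens=500,
                             cached_tokens=8_000)
        print(f"{name}: ${cost:.6f} per request")
```

Multiplying the per‑request figure by expected daily volume gives a rough budget comparison between the two models; actual bills depend on how OpenAI counts cached and reasoning tokens, which this sketch does not model.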