OpenAI launches GPT‑Realtime‑2 voice

- OpenAI launched three new API voice models on May 7: GPT‑Realtime‑2, GPT‑Realtime‑Translate, and GPT‑Realtime‑Whisper for live reasoning, translation, and transcription.
- The big upgrade is context and control: GPT‑Realtime‑2 jumps to a 128,000-token window and adds configurable reasoning effort for harder voice tasks.
- This pushes OpenAI’s voice stack from “talk back quickly” toward “listen, think, and act” for production apps.

OpenAI just pushed its voice stack forward in a concrete way. On May 7, it released three new API models: GPT‑Realtime‑2 for live voice agents, GPT‑Realtime‑Translate for live interpretation, and GPT‑Realtime‑Whisper for streaming speech-to-text. The point is not just nicer voices. Voice apps are starting to move from simple turn-taking into actual work: keeping context, handling harder requests, and staying useful while people keep talking. (openai.com)

### What actually launched?

OpenAI split the release into three jobs instead of one catch-all model. GPT‑Realtime‑2 is the general voice agent, the one that listens, reasons, speaks back, and can use tools during a live session. GPT‑Realtime‑Translate is a dedicated interpreter for continuous speech translation. GPT‑Realtime‑Whisper is the transcription […] (openai.com) […] “transcriber” are different products, and OpenAI is now treating them that way in the API. (openai.com)

### Why is GPT‑Realtime‑2 the real news?

Because this is the first voice model that OpenAI describes as having GPT‑5‑class reasoning. In plain English, that means the model is supposed to do more than react quickly. It should handle messier requests, follow instructions more reliably, and keep a conversation moving when the user changes course midstream. Op[…] (openai.com) […] thinking when the task is harder. (openai.com)

### What changed under the hood?

The biggest spec jump is context. GPT‑Realtime‑2 has a 128,000-token context window and up to 32,000 max output tokens. The older general-availability `gpt-realtime` model had a 32,000-token context window and 4,096 max output tokens. So this is not a small tune-up. It is a much larger memory budget for ongoing calls, han[…] (openai.com) […]ned. (developers.openai.com)

### Why split translation into its own model?

Because live translation is a different interaction pattern. In a normal voice-agent session, the model acts like an assistant: it manages a conversation, can call tools, and waits for response creation steps. In a translation session, the model acts like an interpreter and starts translating directly from the incoming […] (developers.openai.com) […]uce 13 output languages while keeping pace with the speaker. That makes it more like a live comms layer than a chatbot. (openai.com)

### Where does Whisper fit now?

Whisper becomes the dedicated realtime transcription path. That sounds boring, but it is useful product plumbing. A lot of apps do not want a speaking assistant at all. They want captions, notes, searchable transcripts, analytics, or compliance logs. OpenAI’s docs position GPT‑Realtime‑Whisper for streaming transcript delt[…] (openai.com) […]nd transcription sessions based on the job. Basically, the product line is getting cleaner. (openai.com)

### What does this mean for developers?

It means OpenAI is making voice feel less like a demo and more like infrastructure. GPT‑Realtime‑2 runs over the same realtime stack, including WebRTC, WebSocket, and SIP-style patterns, but now with a bigger context window and reasoning controls. Pricing also signals the positioning: text input stays at $4 per 1M […] (openai.com) […]16, which fits a model marketed as more capable rather than merely faster. (developers.openai.com)

### So what’s the bottom line?

This launch is OpenAI saying voice apps should do more than sound natural. They should remember, interpret, transcribe, and take action in real time. The gap has been that low-latency voice often felt shallow. GPT‑Realtime‑2 is OpenAI’s attempt to make live speech systems deeper without giving up the pace that makes voice useful in the first place. (openai.com)
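
For a rough sense of scale on the spec jump described under “What changed under the hood?”, the short Python sketch below compares the two sets of limits quoted in this article. It is back-of-the-envelope arithmetic only: the `ModelLimits` class and `growth` helper are hypothetical names made up for illustration, not part of any OpenAI SDK, and the figures are simply the ones cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    """Token limits as quoted in the article (illustrative container, not an SDK type)."""
    name: str
    context_tokens: int
    max_output_tokens: int


# Figures copied from the article text; not fetched from any API.
OLD = ModelLimits("gpt-realtime", context_tokens=32_000, max_output_tokens=4_096)
NEW = ModelLimits("GPT-Realtime-2", context_tokens=128_000, max_output_tokens=32_000)


def growth(old: ModelLimits, new: ModelLimits) -> dict[str, float]:
    """Return how many times larger each limit is in `new` than in `old`."""
    return {
        "context": new.context_tokens / old.context_tokens,
        "output": new.max_output_tokens / old.max_output_tokens,
    }


ratios = growth(OLD, NEW)
print(f"Context window: {ratios['context']:.2f}x larger")
print(f"Max output:     {ratios['output']:.2f}x larger")
```

On these numbers the context window is 4x larger and the output ceiling is a little under 8x larger, which is the arithmetic behind calling this more than a small tune-up.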

Shared from Scout - Be the smartest in the room.