OpenAI debuts GPT‑Realtime‑2 voice model
- OpenAI launched three new API voice models on May 7: GPT‑Realtime‑2, GPT‑Realtime‑Translate, and GPT‑Realtime‑Whisper, aimed at live speech apps.
- The headline model, GPT‑Realtime‑2, adds GPT‑5‑class reasoning, configurable reasoning effort, stronger tool use, and pricing from $4 per 1M text input tokens.
- This pushes OpenAI past basic voice chat toward agents that can listen, think, call tools, and respond fast enough for meetings and support.

Voice AI has been good at sounding natural for a while. The harder part has been thinking clearly while someone is still talking, and then actually doing something useful, like calling a calendar, CRM, or search tool without the conversation falling apart.

That is the gap OpenAI is trying to close with GPT‑Realtime‑2, a new speech-to-speech API model released on May 7 alongside a live translation model and a streaming transcription model. The pitch is simple: less “voice wrapper around a text bot,” more real-time assistant that can keep up.

### What launched, exactly?

OpenAI released three models together: GPT‑Realtime‑2 for live voice conversations, GPT‑Realtime‑Translate for speech translation, and GPT‑Realtime‑Whisper for low-latency transcription. All three sit inside the API rather than ChatGPT’s consumer app, so this is mainly for developers building call centers, meeting tools, tutors, kiosks, and voice agents inside other products. (openai.com)

### What is GPT‑Realtime‑2 supposed to do?

It is the “talk and think at the same time” model of the bunch. OpenAI describes it as its first voice model with GPT‑5‑class reasoning, built for harder requests, longer context, and more natural turn-taking. The docs also say it can think before it speaks, follow instructions more reliably, and call tools with greater precision than earlier realtime models. (openai.com)

### Why is tool use the big deal?

Because voice assistants usually break at the exact moment they need to leave the chat and do work. It is easy to answer a spoken question. It is much harder to hear “move my 3 p.m., text Alex, and summarize the last note,” then hit the right tools in the right order while keeping the conversation fluid. OpenAI’s realtime stack now explicitly supports tool calls and MCP-style server integrations, which is what makes this feel more like an agent layer than just speech synthesis. (openai.com)

### How does it stay fast enough?

The answer is a tradeoff: configurable reasoning. GPT‑Realtime‑2 lets developers set reasoning effort, with the docs warning that higher effort can raise latency and token usage. That matters because voice is unforgiving: a delay that feels acceptable in text feels awkward in speech. So the model is trying to balance two things that usually fight each other: better reasoning and low-latency replies. (developers.openai.com)

### Is this replacing the older realtime models?

Not exactly. OpenAI still lists other realtime options, including gpt‑realtime, gpt‑realtime‑1.5, and gpt‑realtime‑mini. The new model looks like the premium reasoning choice, while the older ones still cover faster or cheaper use cases. In other words, OpenAI is turning voice into a proper model family, not a single demo feature. (developers.openai.com)

### What about cost?

The published pricing for GPT‑Realtime‑2 starts at $4 per 1 million text input tokens, $24 for text output, $32 for audio input, and $64 for audio output, with discounted cached input pricing. That tells you who this is for. Not hobby chatbot toys, but production systems where better calls, fewer handoffs, or faster workflows can justify the bill. (developers.openai.com)

### Why does this matter beyond OpenAI?

Because the voice race is shifting from “can it talk?” to “can it reason, act, and stay smooth while doing both?” The old pipeline was clunky: transcribe speech, run a text model, then generate speech back out. OpenAI’s realtime push is about collapsing that into one live loop, which is how you get assistants that feel less like IVR menus and more like a competent operator. (developers.openai.com)

### Bottom line?

GPT‑Realtime‑2 is not just a better voice. It is OpenAI betting that the winning assistant will be the one that can listen, reason, use tools, and answer before the pause gets weird. (openai.com 1) (openai.com 2)
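
### What would one call cost, roughly?

For a feel of what the published per-token prices mean in practice, here is a minimal back-of-the-envelope sketch in Python. The four rates are the ones quoted in the pricing section; everything else is an assumption made for illustration: the `estimate_cost` helper, the category names, and above all the token counts for the example call, which are invented and not measured from a real session. The sketch also ignores the cached-input discount, since no figure for it is given here.

```python
# Published GPT-Realtime-2 prices quoted in the article, in USD per 1M tokens.
PRICE_PER_MILLION = {
    "text_input": 4.00,
    "text_output": 24.00,
    "audio_input": 32.00,
    "audio_output": 64.00,
}


def estimate_cost(usage: dict[str, int]) -> float:
    """Return an estimated USD cost for token counts keyed by category.

    Hypothetical helper for illustration; it is not part of any OpenAI SDK
    and does not account for cached-input discounts.
    """
    total = 0.0
    for category, tokens in usage.items():
        if category not in PRICE_PER_MILLION:
            raise KeyError(f"unknown token category: {category}")
        total += tokens / 1_000_000 * PRICE_PER_MILLION[category]
    return total


# ASSUMPTION: made-up token counts for a single short support call.
example_call = {
    "text_input": 2_000,
    "text_output": 500,
    "audio_input": 6_000,
    "audio_output": 4_000,
}

print(f"${estimate_cost(example_call):.4f}")  # prints $0.4680
```

With these invented numbers, audio accounts for roughly $0.45 of the $0.47 total, which is why audio volume, not text, is likely to dominate the bill for voice-heavy workloads. Real costs depend on actual token usage and the chosen reasoning effort.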