OpenAI launches GPT‑Realtime‑2 API
- OpenAI launched three new realtime audio models on May 7: GPT‑Realtime‑2, GPT‑Realtime‑Translate, and GPT‑Realtime‑Whisper, all built for live voice applications. - The headline detail is translation scale — speech from 70+ input languages into 13 output languages — plus configurable reasoning inside speech-to-speech sessions. - It matters because OpenAI also retired the old Realtime beta today, pushing voice developers onto a more capable and more opinionated stack.
Voice AI is getting split into clearer pieces. One model talks, one translates, one transcribes — and OpenAI now wants developers to build around that stack instead of the older catch-all realtime setup. That matters because voice agents have had the same problem for a while: they could sound fast or sound smart, but doing both at once was harder than the demos made it look. On May 7, OpenAI tried to close that gap with GPT‑Realtime‑2, plus two companion models for live translation and streaming transcription. ### What actually launched? OpenAI shipped three models into the API: GPT‑Realtime‑2 for speech-to-speech agents, GPT‑Realtime‑Translate for live speech translation, and GPT‑Realtime‑Whisper for low-latency speech-to-text. They sit on the general-availability Realtime API, which already supports WebRTC, WebSocket, and SIP connections for live audio apps. to be? Basically, it is the “think harder while still talking” version. OpenAI describes it as its first voice model with GPT‑5‑class reasoning, aimed at harder requests, better context handling, and more natural back-and-forth. The notable product choice is configurable reasoning effort — developers can dial reasoning up or down depending on whether they care more about latency or depth. ### Why does configurable reasoning matter? Because voice is brutal about delay. In text chat, a couple extra seconds can feel acceptable. In a spoken conversation, that same pause feels like the system got lost. OpenAI is basically admitting the tradeoff out loud: more reasoning can improve answers, but it can also raise latency and token use. So GPT‑Realtime‑2 is not just a smarter voice model — it is a knob for choosing where on the speed-versus-brains curve your app should sit. ### What’s new in translation? The translation model is more specific than the early chatter suggested. GPT‑Realtime‑Translate handles speech from 70+ input languages into 13 output languages while keeping pace with the speaker. That makes it less like a general voice bot that happens to know languages and more like dedicated live interpretation infrastructure for meetings, support calls, travel tools, and multilingual assistants. ### And what about Whisper here? Turns out GPT‑Realtime‑Whisper is not the same thing as the voice model hearing audio natively. OpenAI’s docs make that distinction pretty clearly: realtime models consume audio directly, while transcription runs as a separate ASR layer and should be treated as guidance rather than a perfect record of what the model “heard.” The new Whisper model is for developers who need fast transcript deltas, captions, logs, or downstream text workflows. ### Why launch all three together? Because production voice apps usually need all three jobs at once. A customer-support agent may need live conversation, a transcript for records, and translation for cross-language calls. A single flashy speech model does not solve that whole stack. OpenAI is packaging the pieces more explicitly now — which makes the platform easier to reason about, but also nudges developers deeper into OpenAI’s own realtime architecture. ### What changed for developers today besides the models? A quiet but important platform shift happened at the same time: the old Realtime beta was deprecated and removed on May 7, 2026. So this is not just a launch. It is also a migration moment. If you were still sitting on the older interface, today was the day OpenAI made the new stack the default path forward. voice from a demo feature into a product surface with specialized parts. The big idea is simple — smarter live agents, dedicated live translation, and cleaner transcription in one API family. The catch is that voice still runs on latency budgets, so the real test is not whether GPT‑Realtime‑2 can reason better. It is whether it can do that without making conversations feel slow.