OpenAI adds WebSockets to Responses
- OpenAI added WebSocket mode to the Responses API on April 30, letting developers keep one connection open for long-running agent loops. - The key claim is speed: for workflows with 20 or more tool calls, OpenAI says end-to-end execution can run roughly 40% faster. - That matters because agent apps increasingly bottleneck on transport overhead, not just model time, so infrastructure choices now shape product latency.
WebSockets sound like plumbing — and basically they are. But this is the kind of plumbing change that can make an agent feel dramatically faster without changing the model at all. OpenAI just added a WebSocket mode to its Responses API, which is the main interface developers use for tool-calling, stateful, multi-step AI workflows. The point is simple: stop reopening the pipe every turn, send only the new bits, and cut the dead time between model and tools. (developers.openai.com) ### What actually changed? Before this, most Responses API traffic happened over ordinary HTTP requests. That works fine for one-shot prompts and even for many streaming use cases. But agent loops are different — the model calls a tool, waits, gets results back, thinks again, calls another tool, and repeats. OpenAI’s new WebSocket mode keeps a persis(developers.openai.com)g new input items plus a `previous_response_id` instead of rebuilding the whole turn each time. (developers.openai.com) ### Why does that help so much? Because a lot of agent latency is not “the model thinking.” It’s transport overhead — setting up requests, resending context, and waiting through repeated client-server handshakes. WebSocket mode cuts that continuation overhead by keeping the session alive and sending incremental inputs. OpenAI says that in rollouts wi(developers.openai.com) big number, but notice the scope — it is about long, tool-heavy chains, not every prompt. (developers.openai.com) ### Is this the same as the Realtime API? No — and that distinction matters. OpenAI already had WebSockets in its Realtime API for audio and live conversational systems. This new piece brings WebSocket transport to the Responses API, which is the more general workhorse for text, images, tool use, and conversation state. So this is less about voice ch(developers.openai.com) bounce between model reasoning and external tools over and over. (developers.openai.com) ### What does the developer flow look like? The client opens a WebSocket connection, sends a `response.create` event, and then listens for server events as the response unfolds. OpenAI says the payload mostly mirrors the normal Responses create body, except transport-specific fields like `stream` and `background` are not used here. There is also (developers.openai.com)t generated turn can start faster. That is a pretty direct hint at the intended use case — systems that know another tool-heavy turn is coming. (developers.openai.com) ### What changes for infrastructure teams? A persistent socket shifts where you put your reliability logic. With plain HTTP, retries are naturally request-shaped. With WebSockets, teams have to think more about connection lifecycle, reconnection, event ordering, and idempotency around tool calls. Streaming also gets reframed: HTTP SSE is still the do(developers.openai.com)cy path for incremental, chained workflows. That means API gateways and agent runtimes may end up using both, depending on whether the app is mostly “generate once” or “loop many times.” (developers.openai.com) ### Does this make agents cheaper too? Indirectly, yes. OpenAI’s docs pitch lower latency, not a new price cut. But if an agent spends less wall-clock time stalled between tools, infrastructure gets used more efficiently and timeouts become less painful. The savings show up in smoother production behavior — fewer long hangs, less orchestration w(developers.openai.com)ve from single prompts to long-running agents. (developers.openai.com) ### What is the real takeaway? The interesting part is not “OpenAI supports WebSockets” by itself. It is that agent performance is now bottlenecked enough by networking overhead that transport choices are becoming a product feature. Faster models still matter. Better prompting still matters. But once your app does dozens of model-tool roundtrips, the wire protocol starts to matter too. OpenAI just made that explicit. (developers.openai.com)