OpenAI Releases Low-Latency WebSocket API

OpenAI has launched a WebSocket mode for its Responses API, enabling low-latency, streaming interactions with its models over a persistent connection. It is aimed at tool-heavy agents and other applications that need real-time, responsive communication, and it gives developers a more efficient alternative to traditional polling-based methods for streaming responses.

- The WebSocket mode is part of OpenAI's broader Responses API, which is designed as a more capable and flexible successor to the older Completions API. The Responses API is also intended to eventually replace the legacy Assistants API, and a migration guide for developers is expected soon.
- The main advantage of WebSocket mode is lower per-turn overhead in long-running, multi-step agentic workflows. For complex tasks involving 20 or more tool calls, OpenAI has observed up to 40% faster end-to-end execution than with traditional polling-based methods.
- Under the hood, the connection is a persistent, stateful channel that accepts incremental inputs. Instead of resending the entire conversation history each turn, developers send only the new input along with a `previous_response_id`; the service uses that ID to retrieve the prior state from an in-memory cache, which speeds up processing.
- Each connection has a hard limit of 60 minutes, after which the client must reconnect. A single connection processes messages sequentially and does not support multiplexing, so parallel runs require multiple connections.
- The shift toward real-time, stateful interaction is part of a broader trend in the LLM space. Google's Gemini API, for instance, offers a "Live API" for real-time streaming sessions in agentic workflows, and Anthropic provides fine-grained streaming of tool-use parameters to reduce latency, although Anthropic delivers it over Server-Sent Events (SSE).
- Developers can also use a "warm-up" feature: sending a `response.create` event with `generate: false` prepares the request state, including tools and instructions, so that the next generated turn starts faster.
- While the `stream=true` parameter in OpenAI's REST API uses Server-Sent Events (SSE) for one-way streaming, WebSocket mode provides full-duplex (bidirectional) communication, which better suits the back-and-forth of tool-heavy agents.
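The incremental-input and warm-up points above can be sketched as simple message builders. This is a minimal Python sketch, not OpenAI's client: only `response.create`, `previous_response_id`, and `generate: false` come from the brief. The class name `ResponsesSocketSession`, the top-level placement of `model`, `input`, `instructions`, and `tools` in the event, and the example item shapes are assumptions. The socket transport is left out so the message logic stays self-contained and testable.

```python
import json
from typing import Any, Optional

Item = dict[str, Any]


class ResponsesSocketSession:
    """Builds JSON messages for one WebSocket connection (transport not shown).

    Assumption: request fields sit at the top level of the `response.create`
    event, mirroring a REST request body. Check the official schema.
    """

    def __init__(self, model: str) -> None:
        self.model = model
        self.previous_response_id: Optional[str] = None

    def warmup_event(self, instructions: str, tools: list[Item]) -> str:
        # `generate: False` asks the service to prepare state (tools,
        # instructions) without producing output, per the warm-up feature.
        return json.dumps({
            "type": "response.create",
            "model": self.model,
            "instructions": instructions,
            "tools": tools,
            "generate": False,
        })

    def turn_event(self, new_items: list[Item]) -> str:
        # Send only the new items; earlier context is referenced by ID
        # instead of resending the whole conversation history.
        event: Item = {
            "type": "response.create",
            "model": self.model,
            "input": new_items,
        }
        if self.previous_response_id is not None:
            event["previous_response_id"] = self.previous_response_id
        return json.dumps(event)

    def record_response(self, response_id: str) -> None:
        # Store the ID of each completed response to chain the next turn.
        self.previous_response_id = response_id
```

In use, a client would send `warmup_event(...)` once after connecting, then alternate `turn_event(...)` and `record_response(...)` as each tool result comes back. Whether the generated turn must repeat the tools and instructions sent during warm-up is not stated in the brief, so treat that detail as something to confirm in OpenAI's documentation.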
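The 60-minute limit and the lack of multiplexing translate into two client-side rules: check a connection's age before each run, and give each parallel run its own connection. The sketch below models that with a stand-in `Connection` class instead of a real socket. The one-minute safety margin, the names `Connection` and `run_parallel`, and the queue-based worker pool are illustrative assumptions, not part of OpenAI's API.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable

MAX_CONNECTION_SECONDS = 60 * 60  # hard 60-minute limit per connection
SAFETY_MARGIN_SECONDS = 60        # hypothetical: reconnect a minute early

Clock = Callable[[], float]


class Connection:
    """Stand-in for one WebSocket; it carries one run at a time (no multiplexing)."""

    def __init__(self, clock: Clock) -> None:
        self._clock = clock
        self.opened_at = clock()
        self.reconnects = 0

    def needs_reconnect(self) -> bool:
        age = self._clock() - self.opened_at
        return age >= MAX_CONNECTION_SECONDS - SAFETY_MARGIN_SECONDS

    def reconnect(self) -> None:
        # A real client would close and reopen the socket here (not shown).
        self.opened_at = self._clock()
        self.reconnects += 1


Job = Callable[[Connection], Awaitable[Any]]


async def run_parallel(jobs: dict[str, Job], num_connections: int,
                       clock: Clock = time.monotonic) -> dict[str, Any]:
    """Spread independent runs over several connections, one run per connection at a time."""
    queue: asyncio.Queue = asyncio.Queue()
    for item in jobs.items():
        queue.put_nowait(item)
    results: dict[str, Any] = {}

    async def worker(conn: Connection) -> None:
        # No await between empty() and get_nowait(), so no other worker
        # can drain the queue in between.
        while not queue.empty():
            name, job = queue.get_nowait()
            if conn.needs_reconnect():  # check the age limit before each run
                conn.reconnect()
            results[name] = await job(conn)  # runs are sequential per connection

    connections = [Connection(clock) for _ in range(num_connections)]
    await asyncio.gather(*(worker(c) for c in connections))
    return results
```

Passing a fake clock makes the reconnect logic testable without waiting an hour. The brief does not say whether state cached for `previous_response_id` survives a reconnect, so a real client should not assume it does.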
