xAI Grok-voice outperforms Gemini

- xAI said April 23 it released grok-voice-think-fast-1.0, a voice-agent model that topped Sierra’s τ-voice benchmark ahead of Google Gemini and OpenAI GPT Realtime.
- xAI reported 67.3% overall pass@1, versus 43.8% for Gemini 3.1 Flash Live and 35.3% for GPT Realtime 1.5.
- The launch targets phone agents and contact centers, with xAI pricing realtime voice at $0.05 per minute. (docs.x.ai)

Voice agents are artificial intelligence systems that listen, speak, and use software tools during a live call. On April 23, xAI said its new grok-voice-think-fast-1.0 now leads Sierra’s τ-voice benchmark. (arxiv.org) (x.ai)

The model is xAI’s new flagship for real-time voice conversations and is available through the company’s Voice Agent application programming interface. xAI said it is built for customer support, sales, and other multi-step workflows that need precise data entry and tool use. (x.ai) (docs.x.ai)

On xAI’s published results, grok-voice-think-fast-1.0 scored 67.3% on τ-voice, ahead of Gemini 3.1 Flash Live at 43.8%, Grok Voice Fast 1.0 at 38.3%, and GPT Realtime 1.5 at 35.3%. xAI also posted category scores of 62.3% in retail, 66% in airline, and 73.7% in telecom. (x.ai)

τ-voice is a benchmark for full-duplex agents, meaning systems that can listen and speak at the same time instead of waiting for strict turn-taking. The paper says it measures task completion and interaction quality across 278 tasks under noise, accents, interruptions, and policy constraints. (arxiv.org)

The benchmark matters because voice systems still lag text agents by a wide margin on real work. The τ-voice paper says voice agents reached 31% to 51% under clean conditions and 26% to 38% under realistic conditions, keeping only 30% to 45% of text capability. (arxiv.org)

xAI said it built the model with Starlink and has already used the system in production phone workflows. The company said the model was tested on telephony audio, background noise, heavy accents, and frequent interruptions. (x.ai)

For developers, the product is sold through xAI’s realtime voice stack over WebSocket, with support for tool calling during a live conversation. xAI’s docs list realtime voice pricing at $0.05 per minute, or $3 per hour. (docs.x.ai 1) (docs.x.ai 2)

Sierra, which maintains τ-Bench, says its leaderboard is designed to show prompts, trajectories, and experiment details so outside researchers can inspect results. xAI’s headline numbers are company-reported, but the benchmark itself is public and community-submittable. (sierra.ai) (github.com)

The immediate contest is shifting from chatbots that answer questions to phone agents that can finish tasks while people interrupt, correct, and change their minds mid-call. xAI is betting that benchmark lead will help it win those deployments. (arxiv.org) (x.ai)
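For teams sizing a deployment, the listed rate of $0.05 per minute can be turned into a rough budget estimate. The sketch below is illustrative only: the function name and the sample workload are hypothetical, and it assumes billing is strictly proportional to call minutes, which the article does not confirm (rounding rules, minimums, and any tool-call charges are not covered).

```python
from decimal import Decimal

# Rate reported in the article: xAI's docs list realtime voice at $0.05 per minute.
PRICE_PER_MINUTE_USD = Decimal("0.05")


def estimate_voice_cost(minutes_per_call, calls):
    """Estimate cost in USD for `calls` calls lasting `minutes_per_call` minutes each.

    Hypothetical helper: assumes cost scales linearly with total minutes.
    """
    if minutes_per_call < 0 or calls < 0:
        raise ValueError("minutes_per_call and calls must be non-negative")
    total_minutes = Decimal(str(minutes_per_call)) * Decimal(calls)
    return PRICE_PER_MINUTE_USD * total_minutes


# One 60-minute call reproduces the article's per-hour figure.
print(f"One hour of voice: ${estimate_voice_cost(60, 1):.2f}")

# Hypothetical contact-center workload: 1,000 calls averaging 4.5 minutes.
print(f"1,000 calls x 4.5 min: ${estimate_voice_cost(4.5, 1000):.2f}")
```

Decimal arithmetic is used here so that currency amounts are not distorted by binary floating-point rounding.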
