xAI launches Grok voice APIs

- xAI launched standalone Grok speech‑to‑text and text‑to‑speech APIs aimed at enterprise voice developers.
- The new APIs let developers integrate dedicated voice transcription and synthesis separate from chat models.
- The launch is an example of the model market fragmenting into workflow‑specific services for voice and other tasks (marktechpost.com).

xAI said on April 17 it launched standalone Grok speech-to-text and text-to-speech application programming interfaces, expanding its voice tools beyond its live voice agent product. (x.ai)

Speech-to-text turns spoken audio into written text, and text-to-speech turns written text into synthetic audio. xAI’s new endpoints are separate services at `/v1/stt` and `/v1/tts`, while its existing real-time voice agent runs at `/v1/realtime`. (docs.x.ai)

The speech-to-text service supports batch uploads and live WebSocket streaming, plus word-level timestamps, multichannel transcription, and formatting that converts spoken numbers and currencies into written form. xAI’s docs list 25 supported languages for transcription formatting and a 500 megabyte file limit. (docs.x.ai)

The text-to-speech service accepts up to 15,000 characters per request and offers five built-in voices: Eve, Ara, Rex, Sal, and Leo. xAI said developers can return audio in formats ranging from standard MP3 to telephony-oriented μ-law. (docs.x.ai)

xAI priced speech-to-text at $0.10 per hour for batch jobs and $0.20 per hour for streaming, according to its April 17 launch post. The company said the models run on the same internal stack used for Grok Voice, Tesla vehicles, and Starlink customer support. (x.ai)

The launch extends a voice push xAI started on December 17, 2025, when it released the Grok Voice Agent application programming interface for full speech-to-speech conversations. In that earlier release, xAI said standalone transcription and synthesis tools would follow “in the next few weeks.” (x.ai)

Selling speech recognition and speech generation as separate building blocks gives developers a narrower option than a full conversational agent. That fits common enterprise uses like call transcription, accessibility tools, podcasts, and phone systems that need only one part of the voice stack. (x.ai)

xAI is also pitching the package on enterprise controls rather than consumer chat features. Its voice documentation lists SOC 2 Type II auditing, Health Insurance Portability and Accountability Act eligibility, General Data Protection Regulation compliance, data residency options, and single sign-on with role-based access controls. (docs.x.ai)

The result is a broader Grok lineup: one service for live voice agents, one for transcription, and one for speech generation. For buyers comparing vendors in April 2026, xAI is no longer selling voice only as a chatbot feature. (docs.x.ai)
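
For developers sizing up the limits and prices reported above, a rough pre-flight sketch in Python might look like the following. It only does local arithmetic and checks and makes no API calls. The numbers (15,000 characters per text-to-speech request, a 500 megabyte transcription file limit, $0.10 and $0.20 per hour) come from the reporting above; the function names, the reading of "megabyte" as one million bytes, and the assumption that characters are counted the way Python's `len()` counts them are illustrative guesses, not anything xAI has confirmed.

```python
# Figures reported in the article; everything else here is hypothetical.
STT_BATCH_USD_PER_HOUR = 0.10
STT_STREAMING_USD_PER_HOUR = 0.20
TTS_MAX_CHARS = 15_000
STT_MAX_FILE_BYTES = 500 * 1000 * 1000  # assumption: decimal megabytes


def estimate_stt_cost(audio_hours: float, streaming: bool = False) -> float:
    """Estimate speech-to-text cost in US dollars at the reported hourly rates."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    rate = STT_STREAMING_USD_PER_HOUR if streaming else STT_BATCH_USD_PER_HOUR
    return round(audio_hours * rate, 2)


def fits_stt_upload(size_bytes: int) -> bool:
    """Check a file size against the reported 500 megabyte limit."""
    return 0 <= size_bytes <= STT_MAX_FILE_BYTES


def split_for_tts(text: str, limit: int = TTS_MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to break at a space; falls back to a hard cut when a chunk
    contains no usable space. Character counting via len() is an assumption.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunks = []
    remaining = text
    while len(remaining) > limit:
        # Look for the last space that keeps the chunk within the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: hard cut
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip(" ")
    if remaining:
        chunks.append(remaining)
    return chunks
```

At the reported rates, `estimate_stt_cost(1000)` gives 100.0 for 1,000 hours of batch transcription, and `estimate_stt_cost(1000, streaming=True)` gives 200.0 for the same audio streamed live.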
