Grok speech API launch

- xAI launched Grok's Speech to Text API, offering instant multi-speaker transcription across 25 languages.
- The API is positioned as competitively priced and targets real-time transcription use cases.
- xAI framed the release as a product push to broaden Grok's utility beyond chat-based assistants. (x.com)

xAI has launched a standalone Grok speech-to-text application programming interface, pushing the company deeper into real-time audio tools for developers. (x.ai) The release was announced April 17, 2026, alongside a separate text-to-speech product. xAI said the speech-to-text service handles batch uploads through a REST endpoint and live transcription through a WebSocket endpoint. (x.ai)

Speech-to-text software turns spoken audio into written text, and xAI is selling that as an infrastructure product rather than only as a feature inside Grok. The company’s docs list support for 25 languages, speaker diarization that labels who said what, multichannel audio, and word-level timestamps. (docs.x.ai)

xAI set the price at $0.10 per hour for batch transcription and $0.20 per hour for streaming. Its pricing page also lists Speech to Text under the company’s broader Voice and Audio lineup, alongside a Voice Agent API priced at $3.00 per hour and text-to-speech priced at $4.20 per 1 million characters. (x.ai) (docs.x.ai)

The launch gives xAI a direct product for call centers, meeting transcription, accessibility software, and voice agents that need text in real time. xAI said the same audio stack already powers Grok Voice, Tesla vehicles, and Starlink customer support. (x.ai)

That matters because xAI has been widening Grok from a chatbot into a broader developer platform with models, image tools, search, and voice products. On its API site, xAI says developers can build with speech-to-text, text-to-speech, and conversational voice through the same platform. (x.ai)

xAI is also pitching ease of adoption. Its API site says the platform is compatible with OpenAI and Anthropic software development kits, and its voice docs show a single-file upload example for `/v1/stt`. (x.ai) (docs.x.ai)

The company is making an enterprise sales case too. xAI’s voice documentation lists SOC 2 Type II controls, Health Insurance Portability and Accountability Act eligibility, General Data Protection Regulation compliance, data residency options, and single sign-on with role-based access controls. (docs.x.ai)

xAI’s launch post compares Grok’s transcription pricing with AssemblyAI, ElevenLabs, and Deepgram, and says Grok performed better on its own word error rate tests across phone calls, meetings, and telephony. Those benchmark claims come from xAI’s materials, not an independent comparison published with the launch. (x.ai)

For now, the move is less about a new chatbot feature than about selling Grok as plumbing. xAI is putting speech recognition on the same menu as its models, search, and voice agents, with pricing and endpoints ready for developers to plug in. (x.ai)
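To make the quoted list prices concrete, here is a minimal Python sketch that turns audio hours and character counts into dollar estimates. It assumes the rates reported above ($0.10 per hour batch, $0.20 per hour streaming, $4.20 per 1 million text-to-speech characters) and assumes linear, prorated billing; the article does not say how partial hours are rounded, and the function names are hypothetical.

```python
# Hypothetical cost estimator built on the list prices reported in xAI's launch
# materials. Assumes linear, prorated billing; actual rounding rules may differ.
from decimal import Decimal

BATCH_USD_PER_HOUR = Decimal("0.10")      # batch transcription (REST upload)
STREAMING_USD_PER_HOUR = Decimal("0.20")  # live transcription (WebSocket)
TTS_USD_PER_MILLION_CHARS = Decimal("4.20")


def stt_cost(hours, streaming=False):
    """Return the speech-to-text cost in USD for `hours` of audio."""
    rate = STREAMING_USD_PER_HOUR if streaming else BATCH_USD_PER_HOUR
    # Convert via str() so float inputs such as 1.5 avoid binary rounding noise.
    return Decimal(str(hours)) * rate


def tts_cost(characters):
    """Return the text-to-speech cost in USD for `characters` of input text."""
    return Decimal(characters) / Decimal(1_000_000) * TTS_USD_PER_MILLION_CHARS


if __name__ == "__main__":
    print(stt_cost(1000))                  # 1,000 hours of recorded calls, batch
    print(stt_cost(1000, streaming=True))  # the same audio transcribed live
    print(tts_cost(250_000))               # 250,000 characters of speech output
```

At these rates, 1,000 hours of audio costs $100 in batch mode and $200 when streamed, so choosing live transcription doubles the bill for the same audio. `Decimal` is used instead of `float` so that currency amounts stay exact.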
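The article notes that xAI's voice docs show a single-file upload to `/v1/stt`. The following Python sketch only illustrates the general shape of such a batch call using the standard library, and it builds the request without sending it. Several details are assumptions the article does not confirm: the full URL `https://api.x.ai/v1/stt`, Bearer-token authentication, and a multipart form field named `file`. The helper name `build_stt_request` is hypothetical.

```python
# Minimal sketch: build (but do not send) a batch transcription request.
# ASSUMPTIONS, not confirmed by the article: the base URL https://api.x.ai,
# Bearer-token auth, and a multipart form field named "file".
import urllib.request
import uuid

STT_URL = "https://api.x.ai/v1/stt"  # assumed full URL for the /v1/stt endpoint


def build_stt_request(api_key: str, filename: str, audio: bytes) -> urllib.request.Request:
    """Return a POST request that uploads one audio file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    # Assemble the multipart body by hand to stay within the standard library.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        STT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`. The article does not describe the response schema (for example, how diarized speaker labels and word-level timestamps come back), so the sketch stops at the request.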

Shared from Scout - Be the smartest in the room.