Voice APIs for agents
- xAI launched standalone Grok speech-to-text and text-to-speech APIs aimed at enterprise voice developers.
- Ring-a-Ding released an OpenClaw skill that lets AI agents make phone calls for workflows.
- Together these announcements push voice from demo tech toward programmable agent interfaces for scheduling, reminders, and collections.
Voice software is being packaged into building blocks that other AI systems can call, not just chat with. xAI launched standalone speech APIs on April 17, and Ring-a-Ding launched an OpenClaw phone-calling skill on April 18. (x.ai) (markets.businessinsider.com)

Speech-to-text turns spoken words into text; text-to-speech does the reverse and reads text aloud. xAI said its new Grok endpoints are built on the same voice stack used in Grok Voice, Tesla vehicles, and Starlink customer support. (x.ai)

xAI said the new endpoints are aimed at voice agents, real-time transcription, accessibility tools, podcasts, and interactive audio apps. Its developer docs also show a real-time voice agent setup over WebSocket, plus separate `/v1/stt` and `/v1/tts` endpoints for transcription and speech generation. (x.ai) (docs.x.ai)

Phone calling is a separate layer: an agent has to decide what to do, place a call through a carrier, handle replies, and log the result. OpenClaw describes its system as a self-hosted gateway that connects chat apps and other channels to AI agents, with “skills” used to teach the agent when and how to use tools. (docs.openclaw.ai 1) (docs.openclaw.ai 2)

Ring-a-Ding said its OpenClaw skill lets agents make outbound calls for tasks including requesting quotes, booking appointments, and checking availability. The April 18 release also mentioned reminders, collections, and escalation calls as target workflows. (markets.businessinsider.com)

OpenClaw already documents a separate voice-call plugin for outbound notifications and multi-turn conversations, with Twilio, Telnyx, Plivo, and a mock provider listed as current options. That means the new Ring-a-Ding release is landing in an ecosystem that already has the plumbing for live calls. (docs.openclaw.ai)

xAI is also pitching these voice tools to larger companies, not just hobby developers. Its voice API docs list SOC 2 Type II controls, Health Insurance Portability and Accountability Act eligibility, General Data Protection Regulation compliance, data residency options, and single sign-on and role-based access controls. (docs.x.ai)

The common thread is that voice is being split into programmable layers: one service handles listening and speaking, while another layer decides when to call and what job to complete. That setup fits routine workflows where a missed appointment, unpaid bill, or unanswered scheduling request still depends on a phone line. (x.ai) (markets.businessinsider.com)
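
To make the layering concrete, here is a minimal Python sketch of how a speech layer, a calling layer, and an agent layer could be kept separate. It is an illustration under stated assumptions, not xAI's, OpenClaw's, or Ring-a-Ding's actual interface: every name in it (`SpeechLayer`, `CallLayer`, `FakeSpeech`, `FakeCarrier`, `CallAgent`, `run_task`) is hypothetical, and the speech and carrier classes are in-memory stand-ins so the example runs without any network access or credentials.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class SpeechLayer(Protocol):
    """Listening and speaking: the role a speech-to-text/text-to-speech service plays."""

    def transcribe(self, audio: bytes) -> str: ...
    def synthesize(self, text: str) -> bytes: ...


class CallLayer(Protocol):
    """Telephony: places a call, plays the prompt, returns the callee's reply audio."""

    def place_call(self, number: str, prompt_audio: bytes) -> bytes: ...


class FakeSpeech:
    """Hypothetical stand-in: treats UTF-8 bytes as 'audio' instead of calling an API."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class FakeCarrier:
    """Hypothetical mock provider that answers every call with a canned reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.dialed: list[str] = []

    def place_call(self, number: str, prompt_audio: bytes) -> bytes:
        self.dialed.append(number)
        return self.reply.encode("utf-8")


@dataclass
class CallAgent:
    """Agent layer: decides what to say, places the call, reads the reply, logs it."""

    speech: SpeechLayer
    carrier: CallLayer
    log: list[dict] = field(default_factory=list)

    def run_task(self, number: str, task: str) -> str:
        prompt = self.speech.synthesize(f"Hello, I am calling about: {task}")
        reply_audio = self.carrier.place_call(number, prompt)
        reply = self.speech.transcribe(reply_audio)
        # Naive substring check, for illustration only; a real agent would
        # interpret the reply with a model rather than look for "yes".
        outcome = "confirmed" if "yes" in reply.lower() else "needs_follow_up"
        self.log.append(
            {"number": number, "task": task, "reply": reply, "outcome": outcome}
        )
        return outcome


if __name__ == "__main__":
    agent = CallAgent(FakeSpeech(), FakeCarrier("Yes, Tuesday at 3 works."))
    print(agent.run_task("+15550100", "booking an appointment"))
    print(agent.log)
```

Because the agent depends only on the two small interfaces, either stand-in could in principle be swapped for a real speech service or telephony provider without touching the decision and logging code, which is the separation the announcements point toward.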