Google ships Gemini 3.1 Flash TTS in preview
Google launched Gemini 3.1 Flash TTS in preview with support for more than 70 languages, about 30 voices, and built-in watermarking aimed at enterprise use cases like accessibility and media. The model arrived as a preview release rather than a full public launch, with an emphasis on broad language coverage and content provenance. (x.com)
Google rolled out Gemini 3.1 Flash TTS on April 15 as a preview speech model for developers, enterprise customers and Google Vids users. (blog.google) Text-to-speech turns written words into audio, and Google says this version lets users steer pace, tone and delivery with natural-language “audio tags” embedded in the prompt. Google published the model in Google AI Studio, the Gemini Application Programming Interface and Vertex AI rather than as a full general release. (blog.google) (ai.google.dev)
The company says Gemini 3.1 Flash TTS supports more than 70 languages and adds SynthID, Google’s watermarking system for identifying artificial intelligence-generated audio. Google also says the model is built for low-latency output, meaning it is tuned to respond quickly enough for production apps. (blog.google) (ai.google.dev)
Google is pitching the model for uses that need exact recitation rather than free-form conversation, including podcasts, audiobooks, accessibility tools and narrated media. Its Gemini API documentation draws that line directly: the Live API is for back-and-forth voice agents, while TTS is for reading supplied text with tight control. (ai.google.dev) (blog.google)
That split has become more important as Google expands Gemini’s audio stack. In March, Google launched Gemini 3.1 Flash Live Preview for real-time dialogue, and this week it added a separate TTS model for scripted speech, giving developers one model for conversation and another for narration. (blog.google) (ai.google.dev) (blog.google)
Google says developers can generate single-speaker or two-speaker audio, choose from prebuilt voices and save voice settings for repeat use in Google AI Studio. Cloud Text-to-Speech release notes say Gemini-TTS supports 30 voices and more than 70 locales across Google’s broader speech platform. (ai.google.dev) (blog.google) (docs.cloud.google.com)
Google is also leaning on benchmark positioning. The company said Gemini 3.1 Flash TTS scored 1,211 on the Artificial Analysis text-to-speech leaderboard, which Google described as a blind human-preference benchmark for speech quality and cost. (blog.google)
The preview label sets the practical limit for now. Google’s own model pages and Cloud terms say preview features can change before general availability, so the launch looks less like a consumer debut than a developer test of how far broad language coverage, voice control and watermarking can travel together. (ai.google.dev) (docs.cloud.google.com)