Google launches Gemini 3.1 Flash TTS

Google AI released Gemini 3.1 Flash TTS, a text-to-speech model that supports more than 70 languages and uses audio tags to control delivery style, pace, and other vocal features. Google published a demo showing the model’s expressive control options. (x.com)

Text-to-speech (TTS) converts written text into spoken audio. On April 15, Google introduced Gemini 3.1 Flash TTS as its newest model for that task. (blog.google) Google said the model is rolling out in preview through the Gemini API and Google AI Studio, with enterprise access in preview on Vertex AI and a Workspace version in Google Vids. (blog.google) The company said Gemini 3.1 Flash TTS supports more than 70 languages and adds “audio tags,” text instructions that let users steer pace, delivery, and vocal style. (blog.google)

Google’s developer documentation describes the preview model as low-latency, meaning it is built to return speech quickly, and says it is designed for precise narration control rather than open-ended live conversation. (ai.google.dev) That distinction matters for developers building products such as audiobooks, podcasts, training videos, and customer-facing prompts, where the system must read exact text in a chosen voice instead of improvising a reply. (ai.google.dev) Google said the model can generate single-speaker or multi-speaker audio, and its Gemini API guide says multi-speaker output currently supports up to two speakers. (blog.google) (ai.google.dev) The company also said every audio output includes SynthID, Google’s watermarking system for marking media as AI-generated. (blog.google)

Google pointed to the third-party benchmarking firm Artificial Analysis, which it said gave Gemini 3.1 Flash TTS an Elo score of 1,211 on its text-to-speech leaderboard. Google did not publish pricing in the announcement post. (blog.google)

The launch extends Google’s broader push to make Gemini a voice platform as well as a text model. Two weeks earlier, Google introduced Gemini 3.1 Flash Live, a separate audio model for real-time dialogue. (blog.google) For now, Google is framing Gemini 3.1 Flash TTS as a preview tool for developers and businesses that want tighter control over how machine speech sounds, one line of text at a time. (ai.google.dev)
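To make the single-speaker versus multi-speaker distinction concrete, here is a minimal Python sketch that only assembles a request body and never sends it. It assumes the JSON shape Google documents for its earlier Gemini 2.5 preview TTS models (`responseModalities`, `speechConfig`, `voiceConfig`, `multiSpeakerVoiceConfig`) carries over to Gemini 3.1 Flash TTS. The model ID, the helper name `build_tts_request`, and the bracketed audio-tag syntax in the sample script are hypothetical; the voice names are taken from the prebuilt voices listed for earlier Gemini TTS models.

```python
import json

# Hypothetical placeholder: look up the real Gemini 3.1 Flash TTS model ID
# in Google's Gemini API documentation before using this.
MODEL_ID = "gemini-3.1-flash-tts-preview"
# Assumed REST endpoint pattern; the body below would be POSTed here.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)

MAX_SPEAKERS = 2  # multi-speaker limit stated in the Gemini API guide


def build_tts_request(script, voices):
    """Build (but do not send) a TTS request body.

    script: the exact text to read; for two speakers, lines are
            prefixed with "Name:" so the model can assign voices.
    voices: mapping of speaker name -> prebuilt voice name.
    """
    if not 1 <= len(voices) <= MAX_SPEAKERS:
        raise ValueError(f"expected 1 to {MAX_SPEAKERS} speakers, got {len(voices)}")

    if len(voices) == 1:
        # Single speaker: one voice for the whole script.
        (voice,) = voices.values()
        speech_config = {"voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}}
    else:
        # Two speakers: map each speaker label in the script to a voice.
        speech_config = {
            "multiSpeakerVoiceConfig": {
                "speakerVoiceConfigs": [
                    {
                        "speaker": name,
                        "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
                    }
                    for name, voice in voices.items()
                ]
            }
        }

    return {
        "contents": [{"parts": [{"text": script}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],  # ask for audio, not text
            "speechConfig": speech_config,
        },
    }


# Usage: the bracketed tags are an assumed audio-tag syntax, for illustration only.
script = (
    "Host: [excited] Welcome back to the show!\n"
    "Guest: [whispers] Thanks for having me."
)
body = build_tts_request(script, {"Host": "Kore", "Guest": "Puck"})
print(ENDPOINT)
print(json.dumps(body, indent=2))
```

Rejecting a third speaker before any network call mirrors the two-speaker limit in the Gemini API guide, so a malformed request fails locally instead of at the API. Because the model is in preview, check the current documentation before relying on either the limit or these field names.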
