Gemini 3.1 Flash TTS

Google DeepMind released Gemini 3.1 Flash TTS on April 15. It is a text‑to‑speech model with Audio Tags to steer delivery, natural voices across 70+ languages, and SynthID watermarking. (x.com) (x.com) The model is in preview through the Gemini API, AI Studio, Vertex AI and Google Vids, and early posts highlight its granular style control and Google's published prompt guides. (x.com)

Text-to-speech software turns written words into synthetic voices; Google DeepMind on April 15 added a new version called Gemini 3.1 Flash TTS with more control over how those voices sound. (blog.google) Google said the model is in preview and can be tested in Google AI Studio, the Gemini application programming interface, Vertex AI, and Google Vids. The company’s model catalog lists it as “Powerful, low-latency speech generation” with steerable prompts and new audio tags. (blog.google) (ai.google.dev)

Those audio tags work like stage directions for a voice actor: developers can describe pace, tone, accent, and delivery in natural language instead of tuning a long list of sliders. Google’s speech-generation guide says the system can produce single-speaker or multi-speaker audio and lets users guide style, accent, pace, and tone directly in the prompt. (ai.google.dev)

Google said Gemini 3.1 Flash TTS supports more than 70 languages, up from 24 languages in the earlier Gemini 2.5 text-to-speech previews Google described in 2025. That widens the company’s push from assistant-style audio toward narration, dubbing, and multilingual media production. (blog.google) (developers.googleblog.com)

The company also said every clip generated with the model carries a SynthID watermark, Google’s marker for identifying artificial intelligence-made media. Google framed that as a safeguard against misinformation as synthetic speech gets easier to produce at scale. (blog.google)

Google Vids is one of the first products to pick up the update. In a Workspace update published April 15, Google said Vids now includes 30 new conversational voice options and supports those voiceovers in 24 languages. (workspaceupdates.googleblog.com)

Google’s developer docs also published a prompting guide for speech generation, with examples that treat the prompt like a director’s brief for a virtual voice talent. That points to the audience Google is chasing here: developers and media teams that want consistent delivery without recording fresh human voice sessions for every revision. (ai.google.dev 1) (ai.google.dev 2)

The release lands as Google folds more audio features into the broader Gemini family rather than keeping speech in a separate product lane. For now, Gemini 3.1 Flash TTS is still labeled preview, so the immediate test is whether developers adopt those prompt-based controls in real production tools. (ai.google.dev) (docs.cloud.google.com)
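
For readers who want a concrete picture of the "director's brief" style of prompting described above, here is a minimal sketch in Python. It only assembles a natural-language prompt from delivery notes (tone, pace, accent) plus the script to be read. The `VoiceBrief` class, the `build_tts_prompt` function, and the exact wording of the instructions are hypothetical and are not taken from Google's documentation. The actual API call is left out because the source does not give the model identifier or request format.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceBrief:
    """Hypothetical container for delivery notes; not part of any Google SDK."""
    tone: str
    pace: str
    accent: str = ""
    notes: list[str] = field(default_factory=list)


def build_tts_prompt(brief: VoiceBrief, script: str) -> str:
    """Combine natural-language direction with the text to be spoken."""
    directions = [f"Tone: {brief.tone}", f"Pace: {brief.pace}"]
    if brief.accent:
        directions.append(f"Accent: {brief.accent}")
    directions.extend(brief.notes)
    # Assumed layout: one sentence of direction, a blank line, then the script.
    header = "Read the following script. " + "; ".join(directions) + "."
    return f"{header}\n\n{script}"


prompt = build_tts_prompt(
    VoiceBrief(tone="warm and reassuring", pace="unhurried", accent="British English"),
    "Welcome back. Here is your morning briefing.",
)
print(prompt)
```

Keeping the direction in one small structure means a media team could revise the script without touching the delivery notes, which is the kind of consistent delivery across revisions the article describes.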
