Gemini 3.1 Flash TTS released

Google announced Gemini 3.1 Flash TTS, a generative speech model that adds controllability through 'Audio Tags' for style and pacing. The release is being framed as a step toward more expressive, controllable synthetic voices.

Google rolled out Gemini 3.1 Flash TTS on April 15, adding a new text-to-speech model that lets users direct how an artificial voice sounds and paces its delivery. (blog.google)

Text-to-speech systems turn written words into audio, and Google said this version adds “audio tags,” which are plain-language cues embedded in the text to steer style, pace, and delivery. The company released it in preview through the Gemini application programming interface and Google AI Studio, with preview access on Vertex AI and availability in Google Vids. (ai.google.dev) (blog.google)

Google said Gemini 3.1 Flash TTS supports more than 70 languages and native multi-speaker dialogue, so one prompt can generate back-and-forth speech without stitching together separate clips. The company also said all output audio is watermarked with SynthID, its provenance tool for marking artificial intelligence-generated media. (blog.google)

The release lands as technology companies push synthetic voices beyond flat narration and toward something closer to voice direction, where a developer can ask for a whisper, a pause, or a faster line reading inside the same script. Google described the model as low-latency, meaning it is built to return audio quickly enough for products like assistants, videos, and interactive apps. (ai.google.dev) (blog.google)

Google tied the launch to benchmark results as well as product distribution. In its announcement, the company said Artificial Analysis gave the model an Elo score of 1,211 on its text-to-speech leaderboard and placed it in a high-quality, low-cost quadrant. (blog.google)

The developer documentation lists the preview model as `gemini-3.1-flash-tts-preview` with text input, audio output, an input token limit of 8,192, and an output token limit of 16,384. Google’s DeepMind model card says the broader Gemini 3.1 Flash Audio family was published in March 2026 and updated in April 2026. (ai.google.dev) (deepmind.google)

That puts the new model inside a wider Gemini audio push rather than as a standalone voice product. Google’s model card says Gemini 3.1 Flash Audio is based on Gemini 3 Pro and is distributed through Google AI Studio, the Gemini application programming interface, Vertex AI, and Google Vids. (deepmind.google)

For developers, the practical change is that voice instructions now sit inside the script instead of in a separate settings panel. For Google, the pitch is that synthetic speech can be more directed, more multilingual, and more traceable at the moment it is generated. (blog.google)
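
To make the "instructions inside the script" idea concrete, here is a minimal Python sketch of how a developer might assemble a multi-speaker script with inline cues and check it against the documented 8,192-token input limit before sending it. It does not call the Gemini API. The square-bracket tag syntax, the tag names, the `Line` helper, and the four-characters-per-token estimate are all illustrative assumptions; only the model identifier and the token limit come from the documentation cited above.

```python
from dataclasses import dataclass

# Taken from the developer documentation cited in the article.
MODEL_ID = "gemini-3.1-flash-tts-preview"
INPUT_TOKEN_LIMIT = 8_192

# Assumption: a rough rule of thumb, not the model's real tokenizer.
CHARS_PER_TOKEN = 4


@dataclass
class Line:
    """One spoken line. Hypothetical helper, not part of any Google SDK."""
    speaker: str
    text: str
    tags: tuple[str, ...] = ()


def render_line(line: Line) -> str:
    # Assumption: cues are written as bracketed words ahead of the text.
    # The article only says tags are plain-language cues embedded in the text.
    prefix = "".join(f"[{tag}] " for tag in line.tags)
    return f"{line.speaker}: {prefix}{line.text}"


def render_script(lines: list[Line]) -> str:
    """Join all lines into a single prompt, one speaker turn per row."""
    return "\n".join(render_line(line) for line in lines)


def estimate_tokens(text: str) -> int:
    """Ceiling of len(text) / CHARS_PER_TOKEN; an estimate only."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_input_limit(text: str) -> bool:
    return estimate_tokens(text) <= INPUT_TOKEN_LIMIT


script = render_script([
    Line("Host", "Welcome back to the show.", tags=("cheerful",)),
    Line("Guest", "Thanks. I have a secret to share.", tags=("whisper", "slow")),
])
print(script)
print(f"{MODEL_ID}: fits input limit = {fits_input_limit(script)}")
```

Keeping the cues in the same string as the dialogue means one prompt carries both the words and the direction, which is the shift the article describes. Exact tag names and a real token count would need to come from Google's documentation and tokenizer.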
