Free TTS from Gemini?
- A new Gemini TTS demo shows ultra‑realistic, controllable voice generation that some creators call disruptive.
- The demo video claims the new model is highly directable and widely accessible at low or no cost.
- The YouTube walkthrough and reaction suggest cheaper, higher‑quality TTS could pressure specialist voice vendors. (youtube.com)
Text-to-speech is software that turns written words into spoken audio, and this month Google put a new Gemini model for it into preview, with tighter control over how the generated voice sounds. On April 15, Google said Gemini 3.1 Flash TTS was rolling out in preview through the Gemini API, Google AI Studio, Vertex AI and Google Vids. (blog.google)
Google says the model can generate single-speaker or two-speaker audio from text and lets users steer style, accent, pace and tone with plain-language instructions embedded in the prompt. The Gemini API documentation says the TTS system is designed for exact text recitation, such as podcasts and audiobooks, rather than live back-and-forth conversation. (ai.google.dev)
The company’s April 15 post says Gemini 3.1 Flash TTS supports more than 70 languages and adds “audio tags” for finer control over delivery. Google also says the model watermarks generated audio with SynthID, its provenance system for artificial-intelligence media. (blog.google)
Google’s own pricing page says developers can start with free input and output tokens under the free tier, then move to paid usage with higher rate limits. The same page lists Gemini 3.1 Flash TTS in the Gemini Developer API pricing table, which appears to be the source of the “free” claim in creator demos. (ai.google.dev)
Google Cloud’s documentation describes Gemini-TTS as the latest version of its text-to-speech stack and says the 3.1 Flash TTS preview is optimized for low-latency, controllable speech generation. That places it directly in the market for narration, explainers, training videos and other work that has often gone to specialist voice vendors. (cloud.google.com)
Google is not the first company to pitch more expressive synthetic voices, but its latest release bundles that capability into the broader Gemini platform that many developers already use for text, image and live audio work.
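The request pattern Google describes, a single text prompt that carries its own plain-language delivery instructions and comes back as audio, can be sketched in Python. This is a minimal sketch, not Google's reference code. It assumes the `google-genai` SDK (`genai.Client`, `models.generate_content` and the `types` speech-config classes that Google's TTS guide uses for its earlier preview models), an API key set in the environment, and a placeholder model ID, `gemini-3.1-flash-tts-preview`, which is hypothetical and should be checked against Google's model catalog. The helper functions are illustrative, and `Kore` and `Puck` are simply two of the prebuilt voice names.

```python
"""Minimal sketch of a Gemini TTS request with in-prompt style direction.

Assumptions: google-genai is installed and an API key is set for synthesize();
MODEL_ID is a hypothetical placeholder, not a confirmed model name.
"""
from __future__ import annotations

# Hypothetical model ID; confirm the exact string in Google's model catalog.
MODEL_ID = "gemini-3.1-flash-tts-preview"


def single_speaker_prompt(direction: str, text: str) -> str:
    """Put a plain-language delivery instruction in front of the text to recite."""
    return f"{direction.strip()}: {text.strip()}"


def two_speaker_prompt(direction: str, turns: list[tuple[str, str]]) -> str:
    """Format a script as 'Name: line' turns under one instruction (max 2 names)."""
    if len({name for name, _ in turns}) > 2:
        raise ValueError("only single- or two-speaker audio is supported")
    script = "\n".join(f"{name}: {line}" for name, line in turns)
    return f"{direction.strip()}\n{script}"


def synthesize(prompt: str, speakers: dict[str, str] | None = None,
               voice: str = "Kore") -> bytes:
    """Send one text-only request and return the audio bytes (requires network).

    With `speakers` (script name -> prebuilt voice name), a two-speaker voice
    config is sent; otherwise a single prebuilt voice is used.
    """
    # Imported lazily so the prompt helpers above run without the SDK installed.
    from google import genai
    from google.genai import types

    def voice_config(name: str) -> types.VoiceConfig:
        return types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name=name))

    if speakers:
        speech = types.SpeechConfig(
            multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                speaker_voice_configs=[
                    types.SpeakerVoiceConfig(speaker=s, voice_config=voice_config(v))
                    for s, v in speakers.items()
                ]))
    else:
        speech = types.SpeechConfig(voice_config=voice_config(voice))

    client = genai.Client()  # picks up the API key from the environment
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=prompt,
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"], speech_config=speech),
    )
    # The audio arrives as inline data on the first part of the first candidate.
    return response.candidates[0].content.parts[0].inline_data.data


if __name__ == "__main__":
    # Offline usage: build a styled prompt without calling the API.
    print(single_speaker_prompt("Say warmly and at an unhurried pace",
                                "Welcome back to the show."))
```

Keeping the style direction inside the prompt string, rather than in a separate parameter, mirrors Google's description of steering tone and pace with plain-language instructions. For a two-speaker script, the names in the `speakers` mapping (for example `{"Joe": "Kore", "Jane": "Puck"}`) are expected to match the names used in the script itself.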
Google’s model catalog lists Gemini 3.1 Flash TTS Preview alongside the older 2.5 Flash and Pro TTS preview models, showing the company has been iterating on speech within the same model family. (ai.google.dev)
Google’s documentation also spells out the current limitations. The TTS guide says the feature is still in Preview, accepts text-only input, produces audio-only output, and does not support streaming in this mode. (ai.google.dev)
Google’s April 15 announcement also cites an Artificial Analysis leaderboard score of 1,211 and says the model sits in that firm’s “most attractive quadrant” for quality and cost. That is Google’s evidence for the claim that the new model is both stronger and cheaper than earlier options, though outside buyers will still judge it on consistency, licensing terms and production reliability. (blog.google)
The immediate shift is not that voice actors or dedicated speech startups disappear overnight. It is that a major platform company has moved high-control synthetic speech closer to the default developer toolkit, with a preview product it already offers across consumer, developer and enterprise channels. (blog.google)
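Because Google's guide describes this TTS mode as text in, audio out, with no streaming, a client receives the whole clip in one response and has to package it for playback itself. The Python sketch below wraps that payload in a WAV container using only the standard library. It assumes the bytes are raw 16-bit mono PCM at 24 kHz, the format Google's TTS guide uses in its examples for the earlier preview models; that format should be confirmed for the 3.1 preview, and the function names are hypothetical.

```python
"""Sketch: saving a complete, non-streamed TTS payload as a WAV file.

Assumption: the payload is raw 16-bit little-endian mono PCM at 24 kHz.
"""
import io
import wave

SAMPLE_RATE_HZ = 24_000  # assumed output sample rate
SAMPLE_WIDTH_BYTES = 2   # 16-bit samples
CHANNELS = 1             # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap a whole raw PCM payload in a WAV container and return the bytes."""
    if len(pcm) % (SAMPLE_WIDTH_BYTES * CHANNELS):
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH_BYTES)
        wf.setframerate(SAMPLE_RATE_HZ)
        wf.writeframes(pcm)  # header and data written in one pass
    return buf.getvalue()


def duration_seconds(pcm: bytes) -> float:
    """Playback length implied by the payload size under the assumed format."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)


if __name__ == "__main__":
    silence = b"\x00\x00" * SAMPLE_RATE_HZ  # one second of silent audio
    with open("clip.wav", "wb") as f:
        f.write(pcm_to_wav(silence))
```

Since nothing is streamed, the helper can validate the payload once and derive the clip length directly from its size: under the assumed format, one second of audio is 24,000 samples × 2 bytes = 48,000 bytes.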