Mistral launches Voxtral TTS
Mistral released Voxtral, a 3B open-weight TTS model that supports expressive speech in 9 languages, offers low latency, and can clone a voice from just 3 seconds of audio. Early benchmarks claim it outperforms ElevenLabs in some tasks. The release is pitched at real-time voice use cases and developer integration. (x.com)
Mistral announced Voxtral on March 26, 2026 and distributed the model weights under the permissive Apache 2.0 license. (ctol.digital) The Voxtral family includes an edge-focused 3B variant and a larger 24B production variant, which Mistral offers for different deployment profiles. (mistral.ai) Model artifacts and cards are hosted on Hugging Face under the mistralai organization, and Mistral has published API docs and realtime-transcription examples for developers. (huggingface.co)
Mistral’s technical paper reports blind human-preference evaluations in which annotators preferred Voxtral in 58.3% and 68.4% of the tested cases for key synthesis and cloning tasks. (mistral.ai) A companion arXiv submission, “Voxtral Realtime” (arXiv:2602.11298), describes an end-to-end ASR architecture trained for natively streaming transcription with sub-second latency. (arxiv.org)
Early media coverage positions Voxtral as a direct competitor to proprietary voice vendors such as ElevenLabs, Deepgram and OpenAI, with outlets highlighting Mistral’s open-weight distribution as a strategic differentiator. (techcrunch.com) Press and technical write-ups cite a local deployment footprint of roughly 3GB of RAM for the larger Voxtral variants, which Mistral and reporters say enables on-device use on smartphones, laptops and wearables. (ctol.digital)