Mistral launches Voxtral
Mistral AI released Voxtral, an open‑weight TTS model that supports expressive speech in nine languages and targets low‑latency use cases such as customer support and translation. The launch positions Voxtral as a production‑ready multimodal voice option for enterprise deployments. (x.com)
Mistral’s release page presents Voxtral TTS as a “lightweight” 4‑billion‑parameter model, while technical coverage breaks the architecture down into a 3.4B‑parameter transformer decoder, a 390M‑parameter flow‑matching acoustic transformer and a 300M‑parameter neural audio codec. (mistral.ai) Multiple reports put the model’s runtime footprint at roughly 3 GB of RAM, with a claimed time‑to‑first‑audio of about 90 milliseconds and generation speeds near 6× real time, which would enable on‑device use on smartphones and even wearables. (aihola.com)
Mistral and press coverage list nine supported languages (English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi and Arabic) and demonstrate voice adaptation from under five seconds of reference audio that preserves the speaker’s accent when switching languages. (theaiinsider.tech) In company‑run human evaluations, Voxtral reportedly beat ElevenLabs Flash v2.5, with Mistral citing a 62.8% listener preference on standard voices and 69.9% on customized voices; those preference numbers are self‑reported and have not been independently validated. (mistral.ai)
Voxtral is being released as “open‑weight,” with Mistral offering testing in AI Studio and options to run the weights on private servers or edge devices, part of a push by Mistral to give enterprises full control of their voice stacks after its recent funding growth. (venturebeat.com) The TTS launch follows Mistral’s Voxtral Transcribe 2 rollout in February 2026 (batch and realtime ASR models). Together, the two releases form a complete, production‑oriented speech‑to‑speech pipeline that Mistral says supports low‑latency, multilingual voice agent deployments. (mlq.ai)
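
As a quick sanity check on the size and speed figures reported above, the arithmetic can be sketched in a few lines of Python. The constants are the numbers quoted in the coverage, not measurements; the function names are illustrative, and the generation estimate assumes the reported 6× real‑time speed holds constant for the whole clip.

```python
# Figures as quoted in the coverage; treat them as claims, not benchmarks.
DECODER_PARAMS = 3_400_000_000   # transformer decoder
ACOUSTIC_PARAMS = 390_000_000    # flow-matching acoustic transformer
CODEC_PARAMS = 300_000_000       # neural audio codec
REAL_TIME_FACTOR = 6.0           # "near 6x real time" (reported)


def total_params() -> int:
    """Sum the three reported components of the model."""
    return DECODER_PARAMS + ACOUSTIC_PARAMS + CODEC_PARAMS


def generation_seconds(audio_seconds: float) -> float:
    """Estimated compute time for a clip, assuming a constant speed-up."""
    return audio_seconds / REAL_TIME_FACTOR


if __name__ == "__main__":
    print(f"total parameters: {total_params() / 1e9:.2f}B")
    print(f"30 s of speech: about {generation_seconds(30):.1f} s to generate")
```

The components sum to about 4.09 billion parameters, consistent with the rounded 4‑billion figure, and at the reported speed a 30‑second clip would take roughly 5 seconds to synthesize.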