Mistral ships Medium 3.5 and Voxtral 2
- Mistral has rolled out two product layers at once — Medium 3.5 for agentic coding and reasoning, and Voxtral Transcribe 2 for speech. - The sharpest detail is consolidation: one 128B open-weight model now replaces Mistral’s prior medium, reasoning, and coding specialists across Le Chat and Vibe. - That matters because Mistral is no longer just shipping base models — it is stitching models into usable work and voice systems.
Mistral just made two related moves. One is a new flagship text-and-image model for coding, reasoning, and agent workflows. The other is a speech stack for batch transcription and live audio. Put together, the point is pretty clear — Mistral wants to be judged less as a model lab and more as a company shipping usable AI systems. ### What actually shipped? The first release is Mistral Medium 3.5. Mistral describes it as a frontier-class multimodal model for agentic and coding use cases, with text and image input, text output, a 256k context window, and configurable reasoning effort. It is also open weights under Mistral’s Modified MIT license. The second release is Voxtral Transcribe 2, which comes as two speech-to-text models: Voxtral Mini Transcribe V2 for batch jobs and Voxtral Realtime for live transcription. (mistral.ai) ### Why is Medium 3.5 the bigger deal? Because it is not just a better medium model. It collapses several roles into one set of weights. Mistral says Medium 3.5 replaces Medium 3.1 and Magistral in Le Chat, and replaces Devstral 2 inside Vibe. So the real news is simplification — one model for regular chat, deliberate reasoning, and coding agents, instead of a small internal maze of specialist models. (docs.mistral.ai) ### What does “configurable reasoning” change? Basically, it lets the same model behave like two products. For a quick answer, you can keep reasoning light and get speed. For a harder task, you can turn up reasoning effort and spend more test-time compute. That matters for agents, because agent workloads swing between cheap steps — like parsing instructions — and expensive steps, like planning or debugging code. One model that stretches across both is easier to deploy and easier to price. (mistral.ai) ### What is Vibe doing here? Vibe is Mistral’s coding-agent product, and Medium 3.5 is now the engine under its new remote agents. Those agents run in the cloud, in parallel, instead of sitting only on a local machine. Mistral also added a Work mode in Le Chat for more complex tasks. That is the product wrapper around the model launch — not just “here are weights,” but “here is where the weights do paid work.” (docs.mistral.ai) ### And what is special about Voxtral 2? The split is the story. Voxtral Mini Transcribe V2 is aimed at production transcription — diarization, timestamps, context biasing, and 13 languages. Voxtral Realtime is aimed at live use, with configurable latency down to sub-200 milliseconds. Mistral also says the realtime model is open weights under Apache 2.0, which makes it easier for developers to self-host or adapt for voice agents and live captioning. (mistral.ai) ### Why bundle text agents and speech now? Because the market has moved past single benchmark wins. Buyers want end-to-end workflows — coding agents that can run jobs, chat products that can handle multi-step work, and voice systems that can transcribe cheaply enough to use in production. Medium 3.5 and Voxtral 2 cover those layers from two sides: text reasoning on one side, speech input on the other. (mistral.ai) ### Is the open-weight angle still important? Yes — especially here. Medium 3.5 is open weights with a Modified MIT license, and Voxtral Realtime is Apache 2.0. That does not make them “fully unrestricted” in every practical sense, but it does keep Mistral differentiated from labs that ship only closed APIs. For companies that care about self-hosting, cost control, or deployment flexibility, that is still a real lever. (mistral.ai) ### So what is the bottom line? The interesting part is not that Mistral launched one more model. Everyone does that. The interesting part is that Mistral is merging model categories on the text side and pairing them with a production-ready voice layer on the audio side. That is a shift from model catalog thinking to stack thinking — and in 2026, that is where a lot of the real competition is. (mistral.ai) (docs.mistral.ai)