Mistral ships Medium 3.5

- Mistral released Medium 3.5 on April 30, folding chat, reasoning, coding, and vision into one 128B open-weight flagship model.
- The headline number is 77.6% on SWE-Bench Verified, with a 256k context window and public weights plus an EAGLE draft model.
- It matters because Mistral is collapsing three product lines into one checkpoint that enterprises can self-host without giant MoE infrastructure.

Foundation models keep splitting into specialist branches: one for chat, one for coding, one for long-form reasoning, one for vision. Mistral just went the other way. On April 30, it shipped Medium 3.5, a 128B dense flagship that tries to do all of those jobs from one set of open weights. That matters because the real fight now is not just benchmark scores. It’s deployment friction.

### What actually shipped?

Medium 3.5 is Mistral’s new top “medium” model, but the name undersells it a bit. This is a 128B dense multimodal model with a 256k-token context window, configurable reasoning mode, and support for text plus images. Mistral put the weights on Hugging Face and also released an EAGLE speculative-decoding variant meant to speed up inference in latency-sensitive setups.

### Why is the “one model” angle the real story?

Because Mistral is basically retiring a bunch of specialized branches into one checkpoint. Medium 3.5 replaces Mistral Medium 3.1 and Magistral in Le Chat, and it replaces Devstral 2 in Vibe. NVIDIA’s model docs describe it as a merged lineage of Medium 3.1, Magistral Medium, and Devstral 2. So this is not just a version bump; it’s product consolidation.

### How good is it on coding?

The number Mistral wants you to notice is 77.6% on SWE-Bench Verified. On its own page, the company also says Medium 3.5 beats its previous coding models, including Devstral, and posts 91.4% on τ³-Telecom. Benchmarks are never the whole story, but 77.6% on SWE-Bench Verified puts the model squarely in the serious-agent conversation rather than the “pretty good open model” bucket.

### Why does dense matter here?

A lot of frontier-ish open models now lean on mixture-of-experts designs. Those can be very efficient at runtime, but they also add serving complexity. Mistral went with a dense 128B model instead. NVIDIA highlights the footprint angle directly: native FP8 lets the full model fit in an H200 node or 2× H100s, which Mistral is pitching as “strong enough, but simpler to run.”

### What’s the deployment catch?

The model is open-weight, not frictionless. The Hugging Face repo shows the original FP8 weights, and the full checkpoint is huge. Still, the practical story is better than the raw parameter count suggests. Mistral says vLLM is the recommended backend, with llama.cpp support as well, so there is a pretty clear path from “interesting release” to “internal pilot.”

### Where does this show up first?

In Mistral’s own products. The company made Medium 3.5 the default model in Vibe and Le Chat, and paired the release with remote agents in Vibe plus a new Work mode in Le Chat for multi-step tasks. That’s useful context because Mistral is not shipping a raw model and hoping the ecosystem figures it out. It is using the model to simplify its own stack first.

### Is this really “open”?

Open-weight, yes. Fully open-source in the strictest sense, not quite. The Hugging Face listing shows a custom “other” license, and outside summaries describe it as modified MIT rather than a plain OSI-standard release. So developers can inspect and self-host the weights, but anyone treating this like a no-strings community release should read the license terms carefully.

### Bottom line?

The interesting part is not that Mistral made another big model. Everybody does that now. The interesting part is that it used Medium 3.5 to collapse chat, coding, reasoning, and vision into one deployable flagship, and then wrapped that model in enough tooling that companies can plausibly run it themselves. That is a different pitch than “largest.”
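
The footprint claim from the dense-versus-MoE discussion (FP8 weights fitting in an H200 node or 2× H100s) can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only and rests on assumptions that are not in the article: one byte per FP8 parameter, decimal gigabytes, 80 GB per H100, and a single 141 GB H200 as the smallest reading of “an H200 node.” It counts weights only and ignores KV cache, activations, and runtime overhead, so real headroom will be smaller.

```python
# Rough weight-memory estimate for a 128B-parameter dense model.
# Assumptions (not from the article): 1 byte per FP8 weight, 2 bytes per
# BF16 weight, decimal GB, 80 GB per H100, 141 GB for a single H200.
# KV cache, activations, and framework overhead are deliberately ignored.

PARAMS = 128e9              # parameter count quoted in the article
BYTES_PER_PARAM_FP8 = 1     # FP8 stores one byte per weight
BYTES_PER_PARAM_BF16 = 2    # shown for comparison only


def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def headroom_gb(total_memory_gb: float,
                params: float = PARAMS,
                bytes_per_param: float = BYTES_PER_PARAM_FP8) -> float:
    """GPU memory left after loading the weights (negative means no fit)."""
    return total_memory_gb - weight_footprint_gb(params, bytes_per_param)


if __name__ == "__main__":
    configs = {
        "2x H100 (80 GB each)": 2 * 80,
        "1x H200 (141 GB)": 141,
    }
    for name, memory_gb in configs.items():
        fp8 = headroom_gb(memory_gb)
        bf16 = headroom_gb(memory_gb, bytes_per_param=BYTES_PER_PARAM_BF16)
        print(f"{name}: FP8 headroom {fp8:.0f} GB, BF16 headroom {bf16:.0f} GB")
```

Under these assumptions the FP8 weights take about 128 GB, leaving roughly 32 GB on two H100s and 13 GB on one H200, while a 16-bit copy of the same weights (about 256 GB) would fit on neither. That remaining headroom is what the KV cache has to live in, so long-context workloads near the 256k limit would likely need more memory than the minimum configuration.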
