Mistral’s big model moves
Mistral released a free multimodal model with claimed 40% speed gains and a 256K context window, and announced Mistral Large 3 (675B MoE) that reportedly hits 92% of GPT‑5.2 performance at 15% of the cost—an efficiency play for open models. Those launches position Mistral as a serious open‑stack alternative for chat, coding, reasoning, and vision workloads. (x.com; x.com)
Mistral Small 4 was announced on March 16, 2026 and intentionally replaces the separate Magistral, Pixtral and Devstral families with a single deployment target. (marktechpost.com) The model’s MoE design is built around 128 experts with four active experts per token, totals roughly 119 billion parameters, and activates about 6 billion parameters per token at inference; the release also adds a per-request reasoning_effort toggle to vary inference behavior. (mistral.ai) Mistral published explicit hardware guidance for production — recommending multi‑GPU HGX/DGX configurations — and shipped optimized checkpoints and runtimes that are immediately compatible with vLLM, Transformers, llama.cpp and NVIDIA’s NIM. (mistral.ai) Mistral Large 3, announced as part of the Mistral 3 family in December 2025, was trained from scratch on roughly 3,000 NVIDIA H200 GPUs and is released under an Apache 2.0 license in both base and instruction‑fine‑tuned checkpoints. (blogs.nvidia.com) That Large 3 checkpoint uses a sparse MoE layout with about 41 billion active parameters inside a 675 billion‑parameter weight table, and Mistral published NVFP4‑compressed checkpoints with guidance to run a tuned instance on a single 8×A100 or 8×H100 node using vLLM. (mistral.ai) NVIDIA’s partnership covered low‑precision kernels and Blackwell attention / MoE optimizations plus prefill/decode disaggregated serving on GB200 NVL72 systems, while Mistral partnered publicly with vLLM and Red Hat to accelerate inference and deployment tooling. (blogs.nvidia.com) The Small 4 launch shipped alongside ecosystem pieces — including Leanstral (a Lean 4–based code agent) — and day‑one availability across Hugging Face, the Mistral API and NVIDIA’s NIM catalog for immediate prototyping. (creativeainews.com)