Mistral 8x22B instruct v2 praised

Published by The Daily Scout

What happened

- Scott Shapiro said on May 27 that Mistral’s Mixtral 8x22B Instruct remains one of the most commercially usable open-weight frontier models available.
- Mistral’s model uses 141 billion total parameters with about 39 billion active per forward pass, under an Apache 2.0 license.
- Mistral lists Mixtral 8x22B as retired, with Mistral Small 3.2 named as its replacement in current documentation.

Why it matters

Scott Shapiro said on May 27 that Mistral’s Mixtral 8x22B Instruct still stands out as an open-weight model that companies can use commercially without royalties, while warning that running it yourself remains expensive. His post pointed to the model’s sparse mixture-of-experts design, which has 141 billion total parameters but activates about 39 billion per forward pass. Mistral’s own documentation describes Mixtral 8x22B as an Apache 2.0-licensed open model with a 64,000-token context window and estimated GPU memory needs of roughly 283 GB at bf16 or about 71 GB at fp4.

### Why are people still talking about an April 2024 model in May 2026?

Mistral released Mixtral 8x22B in April 2024 as the larger follow-up to its earlier open MoE models. In its launch post, the company said the model was “more capable than any other open-weight model” at the time and argued that sparse activation made it faster than dense 70B-class rivals for a given inference budget. (docs.mistral.ai)

Scott Shapiro’s point was not that Mixtral 8x22B is Mistral’s newest model. His post framed it as a practical benchmark for buyers who care about open weights, permissive licensing and commercial deployability rather than only headline benchmark gains. That distinction matters because Mistral’s current model catalog now centers newer families, while still documenting Mixtral 8x22B as part of its open lineup history. (mistral.ai)

### What does “141B total, 39B active” actually mean?

Mistral describes Mixtral 8x22B as a sparse mixture-of-experts model. In that design, the full parameter count is 141 billion, but only about 39 billion parameters are active on each forward pass, which is the figure Shapiro highlighted. Vercel’s model listing for the same model repeats that structure and pairs it with a 65.5K-token context window and Apache 2.0 licensing. (docs.mistral.ai)

That helps explain why the model has remained attractive to developers: it offers a frontier-style open-weight architecture without the legal restrictions attached to some competing releases. (docs.mistral.ai)

### Why does the Apache 2.0 license matter so much?

Mistral’s documentation lists both the base and instruct weights for Mixtral 8x22B under Apache 2.0. That license allows commercial use and modification without royalties, which is the core of Shapiro’s “commercially usable” argument. Hugging Face hosts the official Mistral repositories for Mixtral 8x22B and Mixtral 8x22B Instruct, which made the model easy to download, fine-tune and integrate into existing open-source tooling. (vercel.com)

For companies that want to avoid per-seat or per-token licensing constraints, that distribution model remains part of the appeal. (docs.mistral.ai)

### If it is open, why is self-hosting still hard?

Mistral’s model card says Mixtral 8x22B needs about 283 GB of GPU RAM at bf16 and about 71 GB at fp4. Those numbers are the clearest reason Shapiro warned that self-hosting still requires serious infrastructure even if the license is permissive. Hugging Face’s usage notes also point users toward lower-precision loading and other memory-saving techniques, which underlines the same constraint. (huggingface.co)

Open weights remove licensing barriers, but they do not remove hardware costs, serving complexity or the engineering work needed to run a large MoE model in production. (docs.mistral.ai)

### Has Mistral moved on from this model?

Mistral’s current model card marks Mixtral 8x22B as retired as of March 30, 2025, and names Mistral Small 3.2 as its replacement. The company still keeps the technical documentation online, which is why the model continues to come up in discussions about open-weight licensing and enterprise deployment trade-offs. (huggingface.co)

Mistral’s broader models overview now emphasizes newer products including Mistral Medium 3.5 and Mistral Small 4. But Mixtral 8x22B remains a reference point for one specific question Shapiro raised on May 27: what an actually usable, royalty-free, frontier-adjacent open model looks like when the infrastructure bill is included. (docs.mistral.ai 1) (docs.mistral.ai 2)
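
The memory figures quoted from Mistral’s model card can be roughly reproduced from the parameter count alone. The sketch below is a back-of-the-envelope estimate, not Mistral’s own method: it assumes memory is dominated by weight storage, that bf16 uses 16 bits per weight and fp4 uses 4, and that 1 GB means 10^9 bytes. The function name `weight_memory_gb` and the fp8 row are illustrative additions, not taken from the article.

```python
# Back-of-the-envelope weight memory for a model, by numeric precision.
# Counts weights only: KV cache, activations and runtime overhead are ignored,
# so real deployments will likely need more than these figures.

TOTAL_PARAMS = 141e9   # total parameters, as quoted in the article
ACTIVE_PARAMS = 39e9   # parameters active per forward pass, as quoted

# Storage width per weight in bits (fp8 is an extra row for comparison).
BITS_PER_PARAM = {"bf16": 16, "fp8": 8, "fp4": 4}


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
    print(f"active share per forward pass: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
```

Under these assumptions the estimate comes out at about 282 GB for bf16 and 70.5 GB for fp4, close to the 283 GB and 71 GB the model card cites. Note that the estimate uses the total parameter count, not the active one: sparse activation lowers compute per token, but all experts generally still have to be held in memory, which is why the license does not remove the hardware bill.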

Key numbers

  • 141 billion: total parameters in Mixtral 8x22B.
  • About 39 billion: parameters active per forward pass.
  • 64,000 tokens: context window listed in Mistral’s documentation (Vercel’s listing gives 65.5K).
  • Roughly 283 GB at bf16, or about 71 GB at fp4: estimated GPU memory needed, per Mistral’s model card.
  • March 30, 2025: retirement date on Mistral’s current model card, with Mistral Small 3.2 named as the replacement.


