MiniMax opens M2.7 model

MiniMax open-sourced its M2.7 model, made it available on Hugging Face and Ollama, and published what it describes as state-of-the-art scores on SWE-Pro (56.22%) and Terminal Bench 2 (57.0%). (x.com/i/status/2043139204612592057; x.com/ollama/status/2043139204612592057) The release gives developers a new publicly accessible model to test against code and terminal benchmarks. (x.com/i/status/2043139204612592057)

MiniMax has released the weights for its M2.7 language model, putting a new coding-focused system on Hugging Face and Ollama. (huggingface.co; ollama.com) On MiniMax’s Hugging Face page, the company said M2.7 scored 56.22% on SWE-Pro and 57.0% on Terminal Bench 2. Ollama listed the model as updated about two weeks ago with a 200,000-token context window and cloud access through its library. (huggingface.co; ollama.com)

SWE-Pro is a software benchmark built to test long, messy engineering tasks rather than short code snippets. Its public site says the benchmark includes 1,865 problems drawn from 41 repositories, with tasks that can take professional engineers hours or days. (scaleapi.github.io)

Terminal Bench 2 measures whether an agent can work inside a command-line environment, where it has to inspect files, run commands, and recover from mistakes. The public leaderboard shows top systems well above 80%, which places MiniMax’s reported 57.0% in the middle of a crowded field rather than at the top of the overall table. (tbench.ai) That distinction matters because MiniMax’s claim is about an open model that developers can download, inspect, and run, not about leading every benchmark against closed systems. On its model card, MiniMax said M2.7 “matches GPT-5.3-Codex” on SWE-Pro and described the release as part of its M2 series for coding, agent workflows, and office tasks. (huggingface.co; ollama.com)

The model uses a sparse “mixture of experts” design, which is a way of activating only part of a very large network for each token instead of the whole thing every time. Nvidia’s model card for M2.7 lists 230 billion total parameters, 10 billion active parameters, 256 local experts, and 8 experts activated per token. (build.nvidia.com)

MiniMax is also pitching M2.7 as an “agent” model, meaning it is tuned to use tools and coordinate multi-step jobs instead of only answering prompts. The company’s pages say M2.7 supports “Agent Teams,” scored 46.3% on Toolathon, and reached a 97% skill-adherence rate across more than 40 complex skills. (huggingface.co; minimax.io)

The release broadens the pool of public models aimed at software work at a moment when most of the highest benchmark scores still come from closed products from Anthropic, OpenAI, and Google. For developers, the immediate test is simpler: whether M2.7’s published scores hold up once more people run it on real repositories, terminals, and internal toolchains. (tbench.ai; huggingface.co)
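
For readers who want a concrete picture of the “mixture of experts” figures cited from Nvidia’s model card, the Python sketch below shows what activating 8 of 256 experts per token could look like, and how small a share of the network that is. It is an illustration only: the top-k selection with softmax weighting is a common routing scheme assumed here, not MiniMax’s confirmed method, and the router scores are randomly generated placeholders.

```python
import math
import random

# Figures quoted in the article from Nvidia's model card for M2.7.
NUM_EXPERTS = 256      # local experts
TOP_K = 8              # experts activated per token
TOTAL_PARAMS_B = 230   # total parameters, in billions
ACTIVE_PARAMS_B = 10   # active parameters per token, in billions


def route_token(router_scores, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and softmax-normalise their weights.

    Assumption: generic top-k routing, not a description of M2.7's actual router.
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    peak = max(router_scores[i] for i in chosen)  # subtract max for numerical stability
    exps = [math.exp(router_scores[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


rng = random.Random(0)
# Hypothetical router scores for a single token, one score per expert.
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selection = route_token(scores)

expert_fraction = TOP_K / NUM_EXPERTS
param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"experts used per token: {len(selection)} of {NUM_EXPERTS} ({expert_fraction:.1%})")
print(f"parameters active per token: about {param_fraction:.1%}")
```

On the quoted figures, each token touches roughly 3% of the experts and about 4% of the parameters, which is why a 230-billion-parameter model can cost far less per token to run than its total size suggests.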
