MiniMax‑M2.7 goes local
Unsloth AI announced that MiniMax‑M2.7, a 230B‑parameter open model from MiniMax that Unsloth's team says tops recent benchmarks, can run locally on 128GB systems using Unsloth's Dynamic 4‑bit quantization of the MoE model. (x.com) The post highlights benchmark wins on SWE‑Pro and Terminal Bench 2 and includes GGUF deployment guidance for local setups. (x.com)
Large language models are prediction engines trained on huge text and code corpora, and the current race is to make them useful for long, multi-step software work on local machines. Unsloth said on April 12 that MiniMax‑M2.7, a 230-billion-parameter open model, can be run locally on a 128-gigabyte system using a compressed Dynamic 4-bit format. (unsloth.ai)
MiniMax‑M2.7 uses a mixture-of-experts design, which works like sending each token to a small team of specialists instead of waking up the whole model every time. Unsloth’s documentation says the model has 230 billion total parameters but only 10 billion active for any given token. (unsloth.ai) That smaller active slice keeps the computation per token modest, but it does not shrink memory needs: all 230 billion weights still have to be loaded. Unsloth says the full bfloat16 version needs 457 gigabytes, while its Dynamic 4-bit GGUF file cuts that to 108 gigabytes, enough for a 128-gigabyte unified-memory Mac or a setup with a 16-gigabyte graphics card and 96 gigabytes of system memory. (unsloth.ai)
GGUF is a file format used by local inference tools such as llama.cpp, and quantization is the technique of storing model weights with fewer bits, somewhat like saving a photo at lower quality so it takes less space. Unsloth says its 4-bit build is designed to keep key layers at higher precision while still fitting on consumer-adjacent hardware. (unsloth.ai)
The pitch is not just that the model is big, but that it is tuned for “agentic” work, meaning software that can plan, call tools, edit files, and keep going across many steps. MiniMax’s GitHub page says M2.7 supports “Agent Teams,” dynamic tool search, and a 200,000-token context window for long sessions. (github.com) (unsloth.ai)
MiniMax and Unsloth are also framing the release around coding benchmarks. Unsloth says M2.7 scored 56.22% on SWE‑Pro and 57.0% on Terminal Bench 2, while MiniMax’s repository says the SWE‑Pro result matches GPT‑5.3‑Codex. (unsloth.ai) (github.com)
Those numbers need context, because the public leaderboards do not line up one-for-one with the release claims. The SWE‑Bench Pro page lists top public results in the low-40% range under its own scaffold, and the Terminal Bench 2 leaderboard shows frontier agent systems above 80%, with Claude Code at 58.0% and several other agents clustered near M2.7’s reported 57.0%. (scaleapi.github.io) (tbench.ai)
That does not mean the claims are false; it means benchmark setup matters. SWE‑Bench Pro says its runs allow up to 250 turns with no cost cap, and Terminal Bench 2 ranks full agent systems rather than raw base models, so scores can shift substantially depending on scaffolding, tools, and evaluation rules. (scaleapi.github.io) (tbench.ai)
MiniMax is presenting M2.7 as the successor to MiniMax‑M2.5 and as a model that can improve its own training workflow. Its GitHub page says an internal version optimized a programming scaffold over more than 100 rounds and improved performance by 30% in that loop. (github.com)
For developers, the immediate change is practical: a model that would normally run in data centers can now be tested in llama.cpp or Unsloth Studio on a single high-memory machine. Unsloth says users should avoid CUDA 13.2 because it may produce poor outputs, and it published step-by-step commands for local deployment on release day. (unsloth.ai)
The release lands as open-model vendors keep pushing larger systems into smaller boxes with quantization and sparse routing. MiniMax‑M2.7’s real test will be whether developers can reproduce its coding results outside launch-day demos and benchmark sheets. (huggingface.co) (github.com)
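The memory figures Unsloth quotes for MiniMax‑M2.7 can be sanity-checked with back-of-envelope arithmetic. The short Python sketch below is illustrative and does not come from Unsloth or MiniMax: it assumes the 230 billion count is rounded and that the quoted file sizes are decimal gigabytes (10^9 bytes), neither of which the sources specify, and the helper names bits_per_weight and headroom_gb are hypothetical.

```python
# Back-of-envelope check of the MiniMax-M2.7 memory figures quoted in the article.
# ASSUMPTIONS (not stated by the sources): 230B is a rounded parameter count,
# and file sizes are decimal gigabytes (1 GB = 10**9 bytes).
# bits_per_weight and headroom_gb are hypothetical helpers for illustration only.

TOTAL_PARAMS = 230e9   # total parameters, as reported
ACTIVE_PARAMS = 10e9   # parameters active per token, as reported
GB = 1e9               # assumed decimal gigabyte


def bits_per_weight(file_size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average number of bits stored per parameter, implied by a file size."""
    return file_size_gb * GB * 8 / params


def headroom_gb(memory_gb: float, weights_gb: float) -> float:
    """Memory left for the KV cache, runtime, and OS after the weights are loaded."""
    return memory_gb - weights_gb


bf16_bits = bits_per_weight(457)  # reported bfloat16 size; should land near 16
q4_bits = bits_per_weight(108)    # reported Dynamic 4-bit GGUF size

# The active share governs compute per token, not how many weights must be stored.
active_share = ACTIVE_PARAMS / TOTAL_PARAMS

mac_headroom = headroom_gb(128, 108)        # 128 GB unified-memory Mac
split_headroom = headroom_gb(16 + 96, 108)  # 16 GB GPU + 96 GB system RAM

print(f"bfloat16 file: {bf16_bits:.1f} bits per weight")
print(f"Dynamic 4-bit file: {q4_bits:.2f} bits per weight")
print(f"active share per token: {active_share:.1%}")
print(f"headroom, 128 GB Mac: {mac_headroom:.0f} GB")
print(f"headroom, 16 GB GPU + 96 GB RAM: {split_headroom:.0f} GB")
```

Under these assumptions the bfloat16 figure comes out near 16 bits per weight, as expected, while the 108-gigabyte file averages about 3.76 bits per weight (closer to 4.0 if the sizes are binary gibibytes). The headroom lines show that the two hardware setups are not equivalent: the Mac keeps about 20 gigabytes free for context and the operating system, while the GPU-plus-RAM configuration leaves about 4.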