MiniMax M2.7 runs locally
Unsloth announced that MiniMax M2.7, a 230‑billion‑parameter open model, is now runnable locally with a dynamic 4‑bit MoE variant that can operate on high‑RAM machines and is available in GGUF format on Hugging Face. The post includes benchmark claims that it tops several coding/terminal leaderboards and points to resources for experimenting offline. (x.com)
A large language model is software that predicts the next token, or chunk of text, from patterns in training data. MiniMax M2.7 is the latest open model in that category, and Unsloth said on April 12 that it can now run locally in a compressed format on high-memory consumer machines. (unsloth.ai)

MiniMax lists M2.7 as a 230-billion-parameter mixture-of-experts model with 10 billion active parameters and a 200,000-token context window. Mixture-of-experts means the model routes each token through a small subset of specialist sub-networks instead of using all 230 billion parameters at once. (unsloth.ai) (huggingface.co)

Running a model locally usually means shrinking it with quantization, which stores weights in fewer bits, like saving a photo at a lower file size. Unsloth said its dynamic 4-bit GGUF version cuts the unquantized bfloat16 footprint from 457 gigabytes to 108 gigabytes, enough for a 128-gigabyte unified-memory Mac or a system with 96 gigabytes of random-access memory plus a 16-gigabyte graphics card. (unsloth.ai)

GGUF is a file format used by llama.cpp, a popular local-inference engine for running models on central processors, graphics processors, or both with offloading. Unsloth said M2.7 now works in llama.cpp and in Unsloth Studio on macOS, Windows, Linux, and Windows Subsystem for Linux. (unsloth.ai)

The timing matters because open-model releases have been outpacing the hardware most people actually own. A 108-gigabyte local build does not make M2.7 lightweight, but it moves a 230-billion-parameter class model from server-only territory toward workstations and high-end desktops. (unsloth.ai) (huggingface.co)

MiniMax is pitching M2.7 for coding, tool use, and office work rather than general chat alone. In the model card and GitHub repository, the company said the model scored 56.22 percent on SWE-Pro, 57.0 percent on Terminal Bench 2, 76.5 on SWE Multilingual, and 52.7 on Multi SWE Bench. (github.com) (huggingface.co)

Those benchmark numbers are vendor-reported, not independently audited in the release materials Unsloth links to. MiniMax also said M2.7 reached a 1495 Elo score on GDPval-AA, 46.3 percent on Toolathon, and 62.7 percent on MM Claw, while Ollama’s cloud model page repeats the same figures. (github.com) (ollama.com)

MiniMax describes M2.7 as a successor to MiniMax-M2 and says the model supports “Agent Teams,” or multiple coordinated model roles inside one workflow. The company also said an internal version improved a programming scaffold over more than 100 rounds and raised performance by 30 percent, another claim that has not been independently verified in the public materials. (github.com) (huggingface.co)

For people who want to test it offline, the practical constraints are blunt: memory first, then speed. Unsloth said the 4-bit build can deliver about 15 or more tokens per second on a 128-gigabyte Mac and more than 25 tokens per second on a setup with 96 gigabytes of system memory and one 16-gigabyte graphics card, while larger 8-bit files need about 243 gigabytes of memory. (unsloth.ai)

The short version is that M2.7 is still a very large model, but it is no longer confined to remote application programming interfaces and data-center hardware. As of April 12, the model files, local-run guide, and GGUF builds were all publicly posted, making the main barrier hardware cost rather than access. (huggingface.co) (unsloth.ai)
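
The memory figures Unsloth cites can be sanity-checked with simple arithmetic. The Python sketch below is illustrative only: it assumes the reported sizes are decimal gigabytes, treats the 230 billion parameter count as exact, and uses hypothetical helper names (`bits_per_weight`, `fits`) that do not come from Unsloth or llama.cpp. It also ignores the extra memory needed for the context cache and the operating system, so a "fits" result here is a best case rather than a guarantee.

```python
# Rough sanity check of the sizes reported for MiniMax M2.7.
# Assumptions (not from the source): sizes are decimal GB and the
# parameter count is exactly 230 billion.

PARAMS_BILLION = 230   # total parameters, as listed by MiniMax
BF16_GB = 457          # unquantized bfloat16 footprint, per Unsloth
Q4_GB = 108            # dynamic 4-bit GGUF footprint, per Unsloth
Q8_GB = 243            # approximate memory for 8-bit files, per Unsloth


def bits_per_weight(file_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter for a file of the given size."""
    # GB -> gigabits, divided by billions of parameters.
    return file_gb * 8 / params_billion


def fits(file_gb: float, memory_pools_gb: list[float]) -> bool:
    """True if the file is no larger than the combined memory pools.

    Ignores context cache and OS overhead, so this is an optimistic bound.
    """
    return file_gb <= sum(memory_pools_gb)


if __name__ == "__main__":
    for name, size in [("bf16", BF16_GB), ("4-bit", Q4_GB), ("8-bit", Q8_GB)]:
        print(f"{name}: {bits_per_weight(size, PARAMS_BILLION):.2f} bits per weight")

    print("128 GB Mac, 4-bit:", fits(Q4_GB, [128]))
    print("96 GB RAM + 16 GB GPU, 4-bit:", fits(Q4_GB, [96, 16]))
    print("128 GB Mac, 8-bit:", fits(Q8_GB, [128]))
```

Under these assumptions the bfloat16 figure works out to just under 16 bits per weight, as expected for a 16-bit format, and the 108-gigabyte build averages a little under 4 bits per weight. The 96-plus-16-gigabyte configuration clears the file size by only about 4 gigabytes, which is why memory is the first constraint to check.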