MiniMax M2.7 Debuts

Fireworks AI made MiniMax's M2.7 available on its inference cloud on day zero, advertising strengths in software engineering, autonomous agents, log analysis, and office automation. The announcement highlighted a context window of more than 200,000 tokens and native multi-agent support. (x.com)

Fireworks AI has put MiniMax’s M2.7 on its inference cloud, giving developers immediate access to the new model without hosting the weights themselves. (fireworks.ai) In plain terms, an inference cloud is the rented engine that runs a model after it is trained. Fireworks says M2.7 is available in both serverless mode and dedicated deployments, with serverless pricing listed at $0.30 per 1 million input tokens, $0.06 per 1 million cached input tokens, and $1.20 per 1 million output tokens. (fireworks.ai) (fireworksai-docs.mintlify.dev)

MiniMax released M2.7 on March 18, 2026, and described it as a model built for software engineering, office work, and more complex “agent” tasks that use tools and multi-step plans. MiniMax’s own model page says it improved log analysis, bug hunting, code security, machine learning tasks, and editing in Word, Excel, and PowerPoint. (minimax.io 1) (minimax.io 2)

The “200,000-token context” pitch is about memory. NVIDIA’s technical write-up lists M2.7 at a 200,000-token input context, while Fireworks lists 196.6 thousand tokens on its hosted version, which is roughly enough room for a very large codebase, long logs, or many documents in one prompt. (developer.nvidia.com) (fireworks.ai)

The “multi-agent” claim is about dividing work among specialized helpers instead of asking one model to do everything at once. MiniMax says M2.7 supports native “Agent Teams” with stable role identity and autonomous decision-making, and Fireworks repeats that positioning in its hosted model listing. (github.com) (fireworks.ai)

Under the hood, M2.7 is a mixture-of-experts model, which works like a switchboard that activates only some of its sub-models for each token instead of the full network every time. NVIDIA says the model has about 230 billion total parameters but only 10 billion active per token, with 256 experts and 8 experts activated per token. (developer.nvidia.com)

MiniMax is using benchmark scores to argue that M2.7 is not just cheaper to run, but competitive on coding and workplace tasks. The company says M2.7 scored 56.22% on SWE-Pro, 57.0% on Terminal Bench 2, 55.6% on VIBE-Pro, and 1495 ELO on GDPval-AA, which it described as the highest among open-weight models on that office benchmark. (minimax.io) (github.com)

MiniMax is also framing M2.7 as part of its own development loop rather than only a product for users. Its GitHub page says an internal version of the model optimized a programming scaffold over more than 100 rounds and delivered a 30% performance improvement in that setup. (github.com)

The release is not friction-free for companies that want to self-host. The current license posted on Hugging Face and GitHub says commercial use requires prior written authorization from MiniMax, even though the model’s rollout was presented in many places as an open-weights release. (huggingface.co) (github.com)

That makes Fireworks’ day-one hosting more than a convenience play. It gives developers a fast way to try M2.7’s long-context and agent features now, while the harder questions about direct commercial use of the weights sit with MiniMax’s license terms. (fireworks.ai) (huggingface.co)
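For readers who want to see what the serverless prices quoted above mean for a single request, here is a rough back-of-the-envelope sketch in Python. The three rates come from the Fireworks listing cited in the article. The function name, the token counts in the example, and the assumption that cached input tokens are billed only at the cached rate are illustrative and not taken from Fireworks documentation.

```python
# Serverless prices quoted in the article, in US dollars per 1 million tokens.
PRICE_PER_MILLION = {
    "input": 0.30,
    "cached_input": 0.06,
    "output": 1.20,
}


def estimate_cost(input_tokens, cached_input_tokens, output_tokens):
    """Return the estimated USD cost of one request.

    Assumption (not confirmed by the source): cached input tokens are billed
    at the cached rate instead of the regular input rate, and the three
    token counts do not overlap.
    """
    total = (
        input_tokens * PRICE_PER_MILLION["input"]
        + cached_input_tokens * PRICE_PER_MILLION["cached_input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


# Hypothetical long-context request: 150,000 fresh input tokens,
# 40,000 cached input tokens, and 4,000 output tokens.
cost = estimate_cost(150_000, 40_000, 4_000)
print(f"${cost:.4f}")
```

Under those assumptions, a prompt that fills most of the context window costs about five cents, and the input side accounts for most of the bill.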
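The mixture-of-experts "switchboard" described above can also be illustrated with a toy routing step. The sketch below uses generic top-k gating, a common textbook form of expert routing, and is not MiniMax's actual router. The expert counts (256 total, 8 active) and the parameter figures (230 billion total, 10 billion active) are the NVIDIA numbers quoted in the article. The random scores, the function name, and the choice to softmax only the selected scores are assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 256  # total experts, per the NVIDIA figures quoted in the article
TOP_K = 8          # experts activated per token, same source


def route_token(router_scores, top_k=TOP_K):
    """Pick the top_k highest-scoring experts for one token.

    Returns a list of (expert_index, weight) pairs. The weights are a softmax
    over the selected scores only, so they sum to 1 (an illustrative choice).
    """
    chosen = sorted(
        range(len(router_scores)),
        key=lambda i: router_scores[i],
        reverse=True,
    )[:top_k]
    max_score = max(router_scores[i] for i in chosen)  # for numerical stability
    exps = [math.exp(router_scores[i] - max_score) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


random.seed(0)
# Stand-in for a learned router's output: one random score per expert.
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(scores)

print(len(routing), "of", NUM_EXPERTS, "experts used for this token")
print(f"share of parameters active per token: {10 / 230:.1%}")
```

Only the selected experts run for a given token, which is how a model with roughly 230 billion parameters can do the work of a far smaller one at inference time.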
