NVIDIA releases MiniMax M2.7
NVIDIA announced MiniMax M2.7, an update to the MiniMax family pitched at agentic workflows: multi-step chains that combine reasoning, tool use, and orchestration on NVIDIA platforms. The company frames the model as targeting complex applications such as research workflows and software tasks, and positions the release around scalable agentic deployment. (developer.nvidia.com)
NVIDIA has added MiniMax M2.7 to its AI platform, pitching the model for long, multi-step software, research, and office workflows. (developer.nvidia.com)

MiniMax M2.7 went live on NVIDIA on April 11, 2026 through NVIDIA NIM and related tooling, with NVIDIA describing it as an update to MiniMax M2.5 for “agentic” jobs that chain reasoning, tool use, and orchestration. (build.nvidia.com)

An agentic workflow is a task that unfolds over many steps instead of one prompt, like tracing a software bug, searching documents, calling tools, and writing the fix in sequence. NVIDIA says M2.7 is aimed at those longer jobs rather than simple chat. (developer.nvidia.com)

The model is large but selective. NVIDIA lists 230 billion total parameters, 10 billion active per token, 256 experts, and a context window of 204,800 tokens, which is the text span the model can keep in view at once. (build.nvidia.com)

That design is called a sparse mixture-of-experts model, which works like a team where only a few specialists are called into each decision instead of the whole staff. NVIDIA says M2.7 activates 8 experts per token to keep inference costs lower than a dense model of the same overall size. (developer.nvidia.com)

NVIDIA is not presenting M2.7 as its own foundation model. The company’s model card says MiniMax M2.7 is a third-party model “not owned or developed by NVIDIA,” and that NVIDIA’s role is to host, optimize, and package it for deployment on its GPU stack. (build.nvidia.com)

The deployment pitch is as important as the model pitch. NVIDIA says NIM packages models as self-hosted inference microservices with standard application programming interfaces, so developers can run them on Blackwell and Hopper graphics processing units, in data centers, or in the cloud. (developer.nvidia.com; build.nvidia.com)

NVIDIA also says it worked with the open-source community to add MiniMax M2 optimizations to vLLM and SGLang, including kernels for query-key normalization and 8-bit floating-point (FP8) mixture-of-experts inference. Those are low-level speedups meant to raise throughput when companies run many agent requests at once. (developer.nvidia.com)

MiniMax’s own repository frames M2.7 as a model that can build “Agent Teams,” search for tools dynamically, and improve parts of its workflow during training. The same repository lists benchmark claims including 56.22% on SWE-Pro, 57.0% on Terminal Bench 2, and a 66.6% medal rate on MLE Bench Lite. (github.com)

NVIDIA’s release fits a broader shift in the artificial intelligence market from single-turn chatbots to systems that can run longer jobs under tighter operational controls. The company’s software stack now pairs third-party models like M2.7 with containers, security scanning, and OpenAI-compatible interfaces meant for production rollouts, not just demos. (catalog.ngc.nvidia.com; developer.nvidia.com)

The immediate test is whether developers treat M2.7 as more than another model endpoint. NVIDIA is betting that packaging, optimization, and infrastructure for long-running agents will matter as much as the model itself. (developer.nvidia.com)
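
For readers who want to see what "8 of 256 experts per token" means in practice, here is a minimal Python sketch of top-k expert routing. Only the four figures at the top come from NVIDIA's listing. The `route` function, the random router scores, and the softmax weighting over the selected experts are illustrative assumptions. They are not a description of how MiniMax M2.7 actually routes tokens.

```python
import math
import random

# Figures quoted from NVIDIA's listing for MiniMax M2.7.
TOTAL_PARAMS = 230e9    # total parameters
ACTIVE_PARAMS = 10e9    # parameters active per token
NUM_EXPERTS = 256       # experts available
TOP_K = 8               # experts activated per token


def route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and normalize their weights.

    Hypothetical helper: softmax over the selected scores is one common
    choice in mixture-of-experts layers, assumed here for illustration.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for a single token (random, not from a real model).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)

expert_share = len(selected) / NUM_EXPERTS
param_share = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"experts used for this token: {len(selected)} of {NUM_EXPERTS} ({expert_share:.1%})")
print(f"active parameter share: {param_share:.1%}")
```

The two ratios printed at the end need not match. In mixture-of-experts designs generally, non-expert layers such as attention run for every token, so the active parameter share is usually somewhat higher than the share of experts selected.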