NVIDIA offers MiniMax M2.7 API free
- NVIDIA’s Build platform now exposes free hosted inference endpoints for third-party models including MiniMax M2.7, Z.ai’s GLM-5.1, and DeepSeek’s latest reasoning lineup. - MiniMax M2.7 is listed as a free endpoint with 230B parameters, while GLM-5.1’s NVIDIA model card shows 744B total parameters and 205K context. - This matters because NVIDIA is moving beyond downloadable NIM containers into browser-testable APIs, making its stack easier to try before renting GPUs.
NVIDIA is turning its AI stack into something much easier to sample. Not just download, not just self-host, but actually try in the browser with a live endpoint. That is the real shift here. On Build — NVIDIA’s model hub and API playground — the company is now surfacing free hosted inference for a wider set of third-party models, including MiniMax M2.7, GLM-5.1, and DeepSeek models, all wrapped in NVIDIA’s NIM runtime. (build.nvidia.com) ### What actually changed? The new thing is not that NVIDIA supports these models in some abstract sense. It is that developers can hit them as hosted trial endpoints on Build right now. The homepage now pushes “free inference with leading models,” and the model pages for names like MiniMax M2.7 and GLM-5.1 are presented as runnable API experiences, with terms tied to NVIDIA’s API trial service. That makes the first step much smaller — open(build.nvidia.com)ide whether the model is worth deeper integration. (build.nvidia.com) ### Why is MiniMax M2.7 the headline model? Because it is exactly the kind of model NVIDIA wants to showcase right now — big, agent-friendly, and useful for coding work. NVIDIA’s MiniMax page labels M2.7 a 230B-parameter text model for coding, reasoning, and office tasks, and the NGC listing describes it as a sparse MoE model tuned for software engineering, tool use, search, and document workflows. NVIDIA’s own technical blog framed M2.7 as(build.nvidia.com) instruction following, better environment interaction, and native support for agent teams and dynamic tool search. (build.nvidia.com) ### What else is in the free tier? It is not just one splashy model. Build’s catalog now highlights GLM-5.1 from Z.ai and DeepSeek reasoning models alongside NVIDIA’s own Nemotron family and Google’s Gemma line. GLM-5.1’s model card is especially telling — 744B total parameters, 40B active, 205K-token context, tool calling, browsing, terminal operations, and multi-step agent workflows. In other words, NVIDIA is not usi(build.nvidia.com)n front of developers. (build.nvidia.com) ### Why does NVIDIA want this? Because hosted APIs solve the hardest adoption problem — friction before commitment. NVIDIA has spent the last two years selling NIM as the bridge between frontier models and NVIDIA hardware, first through downloadable microservices for Developer Program members and enterprise deployments, then through self-hosted containers across cloud and on-prem setups. Free hosted endpoints add the missing top of funnel. (build.nvidia.com)move to paid infrastructure later if the workflow sticks. (developer.nvidia.com) ### Is this the same as open model hosting? Not quite. The catch is that these are trial endpoints governed by NVIDIA’s API terms, even when the underlying model is third-party and commercially usable under separate licenses. So the free layer is best understood as a proving ground, not a forever-free production service. NVIDIA is basically saying: try the model here, on our optimized stack, then graduate to self-hosting or larger deployment once you know what you want. (build.nvidia.com) ### Why mention Cursor-style workflows? Because the real buyer is not a benchmark chaser. It is the developer wiring an agent into an editor, terminal, browser, or internal tool. NVIDIA’s docs already emphasize standard APIs and quick integration into existing frameworks and workflows, and its newer Build material leans hard into coding agents, blueprints, and local-to-remote development flows. The point is convenience(build.nvidia.com)side NVIDIA’s ecosystem. (developer.nvidia.com) ### So what is the bottom line? NVIDIA is making the front door wider. Free inference for models like MiniMax M2.7 is not just generosity — it is distribution. If developers start on NVIDIA-hosted APIs, they are much more likely to end up on NVIDIA runtimes, NVIDIA containers, and eventually NVIDIA GPUs. (build.nvidia.com)