MiniMax M2.7 and GPU squeeze

A 230-billion-parameter MiniMax release and surging Nvidia GPU rental prices together underline growing compute pressure in AI. MiniMax M2.7, an open model optimized for Nvidia’s stack and claiming big throughput gains on Blackwell hardware, was released to the community this week. Hourly rental prices for Blackwell GPUs have climbed to about $4.08, roughly 48% higher than two months ago, reflecting rising demand for agentic workloads. (developer.nvidia.com) (x.com) (alltoc.com)

Running a frontier AI model is getting more expensive just as developers get a new one to run. MiniMax M2.7 was released on April 11, while hourly rental prices for Nvidia Blackwell graphics processors climbed to $4.08. (developer.nvidia.com) (techmeme.com)

MiniMax M2.7 is an open-weights language model with 230 billion total parameters, but only 10 billion are active for each token, a design meant to cut inference cost. Nvidia said the model is now available through its software stack and the wider open-source inference ecosystem. (developer.nvidia.com)

That “active parameters” number is the key engineering trick. MiniMax M2.7 uses a mixture-of-experts design, which works like a team where only a few specialists answer each request instead of waking up the whole staff. (developer.nvidia.com)

Nvidia’s blog says M2.7 has 256 experts, activates 8 experts per token, runs across 62 layers, and supports a 200,000-token context window. The company said it and the open-source community added optimized kernels to vLLM and SGLang to speed inference on this model family. (developer.nvidia.com) (github.com)

The price of compute, however, is moving in the opposite direction. The Ornn Compute Price Index, cited Monday by Techmeme and other aggregators, put Blackwell hourly rental at $4.08, up 48% from $2.75 two months earlier. (techmeme.com) (onenewspage.com)

Those rentals matter because many companies do not buy enough chips to cover every burst of demand. They rent graphics processors by the hour from cloud providers when they need extra capacity for training, testing, or serving models to users. (developer.nvidia.com) (techmeme.com)

The workloads pushing prices up are “agentic” systems, meaning software that can call tools, search, write code, and take multi-step actions with limited human prompting. Nvidia has spent the past year publishing tools, benchmarks, and security guidance for that style of software, and MiniMax says M2.7 is built for “complex agent harnesses” and “dynamic tool search.” (developer.nvidia.com) (github.com)

MiniMax’s own materials pitch M2.7 as a model for coding and long-running productivity tasks, and its GitHub repository includes deployment guides for Transformers, vLLM, and SGLang. That lowers the barrier to trying the model, but it does not lower the cost of the hardware needed to serve it at scale. (github.com) (platform.minimax.io)

The squeeze is showing up in two places at once: model makers are releasing bigger, more specialized open systems, and the market price for the newest Nvidia compute is rising at the same time. For developers, the next test is whether software efficiency gains can outrun a rental curve that is still pointing up. (developer.nvidia.com) (techmeme.com)
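
To put the figures above side by side, here is a rough back-of-envelope sketch in Python. It uses only numbers quoted in this article. The function names are hypothetical, and the break-even calculation rests on a simplifying assumption that is not from any of the cited sources: that cost per token is simply the hourly rental price divided by tokens served per hour, with everything else held constant.

```python
# Back-of-envelope arithmetic using only figures quoted in the article.
# Function names are illustrative, not taken from any cited source.

PRICE_NOW = 4.08       # USD per Blackwell GPU-hour (Ornn index, as cited)
PRICE_BEFORE = 2.75    # USD per GPU-hour two months earlier (as cited)

TOTAL_PARAMS_B = 230   # total parameters, in billions
ACTIVE_PARAMS_B = 10   # parameters active per token, in billions
NUM_EXPERTS = 256      # experts in the model
EXPERTS_PER_TOKEN = 8  # experts activated for each token


def percent_increase(before: float, now: float) -> float:
    """Relative change from `before` to `now`, in percent."""
    return (now - before) / before * 100.0


def breakeven_speedup(before: float, now: float) -> float:
    """Throughput multiple needed to keep cost per token flat.

    Assumption: cost per token = hourly price / tokens per hour,
    so a price rise of factor k needs k times the throughput.
    """
    return now / before


price_rise = percent_increase(PRICE_BEFORE, PRICE_NOW)
speedup = breakeven_speedup(PRICE_BEFORE, PRICE_NOW)
active_share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B * 100.0
expert_share = EXPERTS_PER_TOKEN / NUM_EXPERTS * 100.0

print(f"Rental price increase: {price_rise:.1f}%")
print(f"Break-even throughput gain: {speedup:.2f}x")
print(f"Share of parameters active per token: {active_share:.1f}%")
print(f"Share of experts active per token: {expert_share:.1f}%")
```

Under that assumption, the quoted prices work out to a rise of roughly 48%, which means serving throughput on the same hardware would need to improve by about 1.48x just to hold cost per token steady. The sparse design helps on the other side of the ledger: only about 4% of the model's parameters, and 8 of its 256 experts, are used for any one token.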
