New Model Releases Roundup

Several model stories landed recently: MiniMax open-sourced M2.7 with strong SWE-Pro and Terminal Bench 2 scores, Google published TimesFM for zero-shot time-series forecasting, Nous Research released Autoreason for conference-style paper generation, and Anthropic is restricting distribution of its Mythos cybersecurity model amid hacking concerns. (x.com, x.com, x.com, x.com)

A cluster of model releases over the past two weeks showed how uneven the artificial intelligence race has become: more open weights, more specialized tools, and tighter limits on systems that look useful for hacking. (minimax.io) (research.google) (github.com) (red.anthropic.com)

MiniMax said its new M2.7 model scored 56.22% on SWE-Pro and 57.0% on Terminal Bench 2, two tests built to measure software engineering work in messy real environments rather than short coding puzzles. The company described M2.7 as a model that “deeply” participated in its own development by updating memory and building skills for reinforcement learning experiments. (minimax.io)

A time-series forecast is a guess about the next points in a sequence, like tomorrow’s power demand after months of hourly readings. Google’s TimesFM is built for that job, and Google says the open model can forecast new datasets without task-specific training because it was pre-trained on billions of time points from many domains. (docs.cloud.google.com) (github.com)

Google’s September 23, 2025 research post said TimesFM started as a zero-shot model and then gained a few-shot mode that learns from a handful of examples at inference time. The public repository says the latest open version is TimesFM 2.5, with 200 million parameters, a 16,000-step context window, and an optional quantile head for longer-range forecasts. (research.google) (github.com)

Nous Research’s Autoreason targets writing and judging in “subjective domains,” where there is no single numeric right answer and models tend to over-edit themselves. Its public repository says the system keeps three candidates each round (the unchanged draft, a revision, and a synthesis) and lets fresh judge agents rank them with a blind Borda count, while “do nothing” remains an explicit option. (github.com) The repository frames the problem as one of self-refinement failure: models invent flaws, expand the draft every pass, and rarely decide no edit is needed.
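
The Borda count ranking step that the Autoreason repository describes can be illustrated with a minimal sketch. This is not Nous Research's code: the function name `borda_winner`, the candidate labels, the example judge rankings, and the rule that ties go to the unchanged draft are all assumptions made for illustration, and the blinding of judges (hiding which candidate is which) is not modeled here.

```python
from collections import defaultdict


def borda_winner(rankings, incumbent="unchanged"):
    """Pick a winner from judge rankings with a plain Borda count.

    rankings: list of lists, each one judge's ordering from best to worst.
    incumbent: label of the "do nothing" candidate. Breaking ties in its
               favor is an assumption of this sketch, not a documented rule.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            # Best of n candidates earns n - 1 points, worst earns 0.
            scores[label] += n - 1 - position

    best = max(scores.values())
    leaders = [label for label, score in scores.items() if score == best]
    # Hypothetical tie-break: keep the draft if it is among the leaders,
    # otherwise fall back to alphabetical order for determinism.
    winner = incumbent if incumbent in leaders else sorted(leaders)[0]
    return winner, dict(scores)


# Hypothetical rankings from three judges over the three candidates.
example_rankings = [
    ["revision", "synthesis", "unchanged"],
    ["unchanged", "revision", "synthesis"],
    ["unchanged", "synthesis", "revision"],
]

winner, scores = borda_winner(example_rankings)
print(winner, scores)  # the unchanged draft wins with 4 points
```

In this made-up round the unchanged draft collects 4 points against 3 for the revision and 2 for the synthesis, so the loop would keep the draft as it is, which is the "do nothing" outcome the repository says must stay available.
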
Nous says Autoreason beat single-pass and other baselines on open-ended writing tasks, and reported 77% private-test accuracy on 150 CodeContests problems with Claude Sonnet 4.6 plus Autoreason, versus 73% for single-pass. (github.com)

Anthropic moved in the opposite direction on access. In an April 7 technical post, the company said Claude Mythos Preview was “strikingly capable” at computer security tasks and that it was launching Project Glasswing to use the model for defensive work on critical software. (red.anthropic.com)

Anthropic said Mythos Preview could identify and exploit zero-day vulnerabilities in every major operating system and every major web browser during internal testing, and said more than 99% of the vulnerabilities it found had not yet been patched. CNBC reported the company limited rollout of the model while talking with United States agencies including the Cybersecurity and Infrastructure Security Agency and the Center for AI Standards and Innovation. (red.anthropic.com) (cnbc.com)

Taken together, the releases split the market into three tracks. MiniMax is pushing open agentic coding models, Google is turning a research model into a forecasting tool inside BigQuery, and Anthropic is treating one of its strongest cyber systems as something to distribute cautiously. (minimax.io) (docs.cloud.google.com) (red.anthropic.com)

The common thread is narrower, more concrete use. Instead of one general chatbot doing everything, the latest releases focus on coding benchmarks, demand forecasts, paper-style drafting loops, and bug hunting that companies now think may need guardrails before wide release. (minimax.io) (github.com 1) (github.com 2) (red.anthropic.com)
