Agent Built SN97

An autonomous agent reportedly built a Bittensor subnet named SN97 and published a self‑written paper claiming the subnet's top model beat Qwen's official 4B model on six of eight benchmarks. (x.com) The post said the agent optimized using Kullback-Leibler (KL) divergence techniques and reported improvements of roughly +13% in reasoning and +8% in math on selected tests. (x.com)

Bittensor runs dozens of on-chain contests for artificial intelligence models, and one of them, Subnet 97, pays whoever best compresses a larger Qwen model into a smaller one. In early April 2026, the subnet’s operator published a paper saying the current top model beat Qwen’s official 4 billion-parameter base model on six of seven public tests. (github.com, github.com)

The setup is simple: miners upload compact models to Hugging Face, validators compare those models against the teacher model Qwen3.5-35B-A3B, and the model with the lowest Kullback-Leibler divergence, or closest token-by-token match, gets all of the subnet’s emissions. The repository for Subnet 97 says student models must stay at or below 5.25 billion parameters and beat the current leader by more than 1 percent to take the crown. (github.com, distil.arbos.life)

That scoring rule matters because Kullback-Leibler divergence measures how closely one probability distribution matches another, not whether a model is better at math, reasoning, or instruction following. The March 31, 2026 paper frames its central question that way and then compares the subnet’s “king” model against the official Qwen3.5-4B base model on seven benchmarks. (github.com)

The paper says the SN97 leader posted a Kullback-Leibler score of 0.049 versus 0.149 for the Qwen baseline and won six of seven tests. It reports gains of about 10 points on GSM8K, a grade-school math benchmark, and 4 points on IFEval, an instruction-following benchmark, with one loss on MMLU-Pro, a factual-knowledge test. (github.com)

Subnet 97’s own dashboard describes the contest as “winner takes all,” with the lowest-divergence model receiving 100 percent of emissions. The site also says evaluations use fresh prompts sampled from FineWeb and score predictions across the teacher’s full vocabulary, which is meant to make prompt memorization harder. (distil.arbos.life)

The broader Bittensor network uses validators to score miners and route token emissions based on those scores. Bittensor’s mining documentation says validator scores determine each miner’s share of subnet emissions, while the chain’s subnet architecture routes those weights through Yuma Consensus. (docs.learnbittensor.org, bittensor.com)

The “agent built SN97” claim is harder to verify than the benchmark paper itself. The public evidence surfaced in a post on X from the account const_reborn, while the code repository shows active development, benchmark scripts, and a paper credited to “Distil SN97 Research,” but the materials visible in public search results do not independently document how much of the subnet’s design or paper-writing was done autonomously. (x.com, github.com)

The model family at the center of the contest is also moving quickly. Qwen’s official site says Qwen3.5 launched on February 15, 2026, starting with a 397 billion-parameter multimodal model, while Subnet 97 is focused on distilling Qwen3.5-35B-A3B into much smaller systems that can fit under the subnet’s 5.25 billion-parameter cap. (qwen.ai, github.com)

For now, the clearest verified fact is narrower than the social-media framing: a live Bittensor subnet is running a public distillation contest, and its maintainers have published a benchmark claiming the current leader outperformed Qwen’s 4 billion-parameter base model on most of the tests they chose. The stronger claim, that an autonomous agent built the subnet and authored the research end to end, still rests mainly on the operator’s own account and project materials. (distil.arbos.life, github.com, x.com)
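
For readers who want to see what the scoring rule described above amounts to in practice, here is a minimal Python sketch. It is not the subnet's validator code. The function names are hypothetical, and three details are assumptions not confirmed by the sources: that divergence is taken as KL(teacher || student), that per-token scores are averaged, and that "more than 1 percent" means a relative improvement over the leader's score. Only the 5.25 billion-parameter cap and the 1 percent margin come from the repository as reported.

```python
import math

PARAM_CAP = 5.25e9       # parameter cap reported from the subnet repository
MIN_IMPROVEMENT = 0.01   # "more than 1 percent"; relative reading is an assumption


def softmax(logits):
    """Turn one row of raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for one token position, in nats.

    The direction of the divergence is an assumption for illustration.
    """
    return sum(
        p * (math.log(p) - math.log(max(q, eps)))
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )


def mean_kl(teacher_logits, student_logits):
    """Average per-token KL over rows of full-vocabulary logits.

    Averaging across positions is an assumption for illustration.
    """
    scores = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(scores) / len(scores)


def takes_crown(challenger_kl, leader_kl, challenger_params):
    """Hypothetical check: under the cap and more than 1 percent lower KL."""
    if challenger_params > PARAM_CAP:
        return False
    return challenger_kl < leader_kl * (1.0 - MIN_IMPROVEMENT)
```

Under these assumptions, a challenger scoring 0.048 against a leader at 0.049 would take the crown, while one scoring 0.0486 would not, because it falls short of the 1 percent margin. The sketch also makes the article's caveat concrete: nothing in the score looks at whether an answer is correct, only at how closely the student's distribution tracks the teacher's.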
