Decentralized 72B model
Bittensor reported it trained a 72‑billion‑parameter model using more than 70 independent contributors over home internet, processing about 1.1 trillion tokens and posting a 67.1 MMLU zero‑shot score. The project says all model weights and code are open source and that this is the largest decentralized LLM pre‑training run to date. (x.com)
Bittensor’s Templar project says it trained a 72-billion-parameter language model over ordinary internet links instead of inside a single data center. (arxiv.org) A language model is the software behind chatbots, and “72 billion parameters” means 72 billion adjustable values the system learns during training. The team reported that Covenant-72B processed about 1.1 trillion tokens, the chunks of text models learn from, and finished the run on March 10, 2026. (arxiv.org)
The paper says the run used open, permissionless participation on Bittensor Subnet 3, called Templar, with peers joining and leaving during training. Templar’s repository says the completed run is now dormant and that Subnet 3 has since shifted to a different mechanism called Crusades. (arxiv.org) (github.com)
Most frontier model training happens inside tightly managed clusters run by one company, because moving updates between machines is slow and failures can break the job. The Covenant paper says it used a communication method called SparseLoCo and an incentive system called Gauntlet to keep training going across untrusted peers on commodity internet. (arxiv.org) That setup is the point of the experiment: not just training a big model, but testing whether open networks can coordinate expensive artificial intelligence work without a central operator. Bittensor’s documentation describes the broader network as an open source marketplace for digital commodities including artificial intelligence training and inference. (docs.learnbittensor.org)
The group reported a score of 67.1 on zero-shot Massive Multitask Language Understanding (MMLU), a benchmark that tests general knowledge without task-specific examples. Crypto and industry outlets compared that result with Meta’s Llama 2 70B and LLM360 K2, though those comparisons depend on matching evaluation settings and are not a substitute for broader testing. (arxiv.org) (blockonomi.com)
The weights and code are public. Hugging Face hosts Covenant-72B and its chat-tuned variant, and the model card says the release is a base model rather than a consumer chatbot. (huggingface.co 1) (huggingface.co 2)
The claim of “largest decentralized pre-training run” rests on scale and openness, not on beating the biggest closed models from OpenAI, Anthropic, Google, or Meta. The paper frames Covenant-72B as evidence that globally distributed training can produce a competitive large model while letting outside contributors participate directly. (arxiv.org)
That claim is now colliding with governance questions inside the same ecosystem. In the past week, Covenant AI said it was leaving Bittensor and accused the network of centralization, while Bittensor’s token fell sharply amid the dispute. (decrypt.co) (cryptobriefing.com)
So the model stands as a technical proof that dozens of independent machines can train one large system over the public internet. The harder next test is whether the network around that system can stay as decentralized as the training run itself. (arxiv.org) (decrypt.co)