Tencent previews Hy3 MoE
- Tencent’s Hunyuan team open-sourced Hy3 preview on April 23, releasing a 295 billion-parameter mixture-of-experts model with 21 billion active parameters, a 256,000-token context window, and weights on GitHub and Hugging Face.
- Tencent says Hy3 was trained on rebuilt pretraining and reinforcement-learning infrastructure, uses 192 experts with top-8 activation, and posted its biggest gains in coding and agent benchmarks including SWE-bench Verified.
- The release puts Tencent into the crowded open large-model race alongside DeepSeek, Kimi and GLM, with Tencent positioning Hy3 around product use and cost efficiency. (github.com)

A mixture-of-experts model is like a company that routes each problem to a few specialists instead of making the whole staff answer every question. Tencent says its new Hy3 preview uses that design and open-sourced it on April 23. (github.com) (huggingface.co)

Tencent’s Hunyuan team describes Hy3 preview as a 295 billion-parameter model with 21 billion active parameters, plus a 3.8 billion-parameter multi-token prediction layer. It supports a 256,000-token context window. (github.com) (huggingface.co)

The model card says Hy3 has 192 experts and activates eight of them for each step, a setup meant to keep compute lower than running all 295 billion parameters at once. Tencent also lists 80 layers, grouped-query attention, and BF16 precision support. (github.com)

Tencent says Hy3 is the first model trained on infrastructure it rebuilt this year for pretraining and reinforcement learning. The company calls it the strongest model it has shipped so far. (github.com) (cloud.tencent.com)

Before the release itself, Tencent frames the problem as practical deployment: long documents, messy instructions, coding tasks, and software agents that need to use tools. Its model card says the biggest gains came in coding and agent work rather than in headline multiple-choice tests alone. (github.com)

On the benchmarks Tencent highlighted, Hy3 scored 74.4 on SWE-bench Verified and 54.4 on Terminal-Bench 2.0, according to third-party coverage summarizing the release materials. The same reports said Hy2 scored 53.0 and 23.2 on those tests. (gncrypto.news) (edgen.tech)

Tencent also says Hy3 performed well on reasoning-heavy tests including the Tsinghua Qiuzhen College math PhD qualifying exam for spring 2026 and the China High School Biology Olympiad 2025. Those are unusual choices for a launch card because they lean on recent exams and domain tasks instead of only standard public leaderboards. (github.com)

The release matters because China’s open-model field is now crowded with large systems from DeepSeek, Moonshot’s Kimi line, and Zhipu’s GLM family. Tencent’s pitch is narrower: a model aimed at code, agents, long context, and product deployment rather than only raw parameter count. (github.com) (huggingface.co)

Hy3 preview is available through GitHub and Hugging Face, and Tencent-linked materials say it is also being offered through Tencent Cloud. That makes this less of a teaser than a live public release with weights and deployment instructions attached. (github.com) (huggingface.co) (cloud.tencent.com)

For now, the key fact is simple: Tencent did not just describe a new model on social media. It published Hy3 preview with weights, specs, and benchmark claims, and tied the launch to a rebuilt training stack it says will power its next models. (github.com) (huggingface.co)
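
For readers who want to see the "few specialists" idea in concrete terms, the top-8-of-192 routing described in the model card can be sketched as below. This is an illustration only, not Tencent's implementation: the `route` function, the random router scores, and the choice to apply softmax over just the selected experts are all assumptions, since the release materials summarized here do not describe Hy3's router in detail. Only the figures 192, 8, 21 billion, and 295 billion come from the article.

```python
import math
import random

NUM_EXPERTS = 192  # expert count reported in the model card
TOP_K = 8          # experts activated per step, per the model card


def route(logits, k=TOP_K):
    """Hypothetical router: pick the k highest-scoring experts and weight them.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected scores only (an assumed gating scheme).
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Made-up router scores for a single token; a real model computes these
# from the token's hidden state.
rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(router_logits)

expert_fraction = TOP_K / NUM_EXPERTS  # share of experts used per step
param_fraction = 21 / 295              # active / total parameters (billions)

print(f"experts used per step: {expert_fraction:.1%}")   # 4.2%
print(f"parameters active per step: {param_fraction:.1%}")  # 7.1%
```

The two percentages differ because a mixture-of-experts model also has components that run for every token, such as attention layers and embeddings, so the active-parameter share is typically larger than the share of experts selected.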