NetEase YouDao releases Confucius4

- NetEase YouDao released Confucius4 on May 22, publishing a 27-billion-parameter multimodal model and pairing it with a new multilingual Confucius4-TTS repository. (huggingface.co)
- The clearest release detail is the weight dump: 12 safetensors files totaling about 54.7 GB appeared on Hugging Face about one day ago. (huggingface.co)
- NetEase YouDao said Confucius4-TTS code and model weights are still being prepared, with an online demo available now. (github.com)

NetEase YouDao has moved its Confucius model line further into open release territory with Confucius4, a 27-billion-parameter multimodal model posted on Hugging Face under an Apache 2.0 license. The model card says Confucius4 is built on the Qwen3.5 architecture and is aimed at advanced mathematical reasoning. (huggingface.co) A separate GitHub repository for Confucius4-TTS appeared within the past two days, framing the speech system as a multilingual, cross-lingual zero-shot text-to-speech engine. (huggingface.co)

The release matters less for a single benchmark claim than for what YouDao actually shipped. (github.com) The Hugging Face repository shows full downloadable weights in 12 safetensors shards totaling about 54.7 GB, with the main model files uploaded about one day ago and the README updated about 22 hours ago. The TTS side is earlier-stage: GitHub says the repository was initialized about seven hours before it was crawled, and the README says code and model weights are still being prepared.

Here’s the thread version:

1/ NetEase YouDao has released Confucius4 as an open-source multimodal LLM, with the model card describing it as a Qwen3.5-based 27B system focused on mathematical reasoning. (huggingface.co) The license listed on Hugging Face is Apache 2.0.

2/ The most concrete part of the launch is the actual weight release. Hugging Face shows 12 safetensors shards and a total repository size of about 54.7 GB, which means this is not just a paper or demo page.

3/ YouDao is positioning Confucius4 around visual-math performance. (huggingface.co) The model card says it achieves state-of-the-art results among comparable-scale models on several visual math benchmarks, including Math-Hard-500, Math-Figure and MathVision testmini.

4/ Some of those benchmarks need caution. The card says Math-Figure and Math-Hard-500 are proprietary datasets collected or defined by the team, so outside replication will depend on whether those sets are later published. (huggingface.co)

5/ The training pitch is also specific. YouDao says the model uses iterative supervised fine-tuning plus reinforcement learning, adds pure-text reasoning data during SFT, and applies “length-aware” RL to reduce chain-of-thought length. (huggingface.co) The card claims CoT length was reduced by 43.2%.

6/ There is also a China-market angle in the release. (huggingface.co) The model card says targeted optimization on Chinese-language data makes outputs more aligned with the linguistic habits and expression preferences of Chinese-speaking users.

7/ The speech piece is being released alongside it, but not fully shipped yet. (huggingface.co) The Confucius4-TTS GitHub README says code and model weights are “under preparation” and points users to an online demo in the meantime.

8/ On features, the TTS repo makes bigger claims than the model repo. It says the system supports 14 languages, allows unconstrained voice cloning without a reference transcript, and supports cross-lingual voice transfer and emotion transfer. (huggingface.co)

9/ The 14 languages listed are Chinese, English, Japanese, Korean, German, French, Spanish, Indonesian, Italian, Thai, Portuguese, Russian, Malay and Vietnamese. (huggingface.co)

10/ One important nuance: the “3-second” voice-cloning line circulated in social chatter, but the GitHub README text available today does not spell out that exact duration in the surfaced copy. (github.com) What is clearly documented is zero-shot voice transfer, no reference transcript required, and multilingual cross-lingual synthesis.

11/ The broader pattern fits YouDao’s recent open-source behavior. (github.com) Its GitHub organization has also published projects including LobsterAI and earlier speech work such as EmotiVoice, suggesting this is part of a wider push rather than a one-off upload.

12/ What to watch next: whether YouDao publishes the remaining Confucius4-TTS weights, fuller benchmark documentation, and reproducible evaluation details beyond the current repo materials and demo. The TTS README says those assets are still coming. (github.com 1) (github.com 2)
