Alibaba’s stealth video model tops benchmarks

A previously unannounced video‑generation model from Alibaba debuted atop global benchmarks, surprising competitors in China’s AI space. The result intensifies the race in generative video and suggests rapid progress in foundation models for multimedia. (x.com)

A video model called HappyHorse-1.0 showed up on the Artificial Analysis leaderboard this week with no company name attached, then jumped straight to No. 1 in text-to-video with an Elo score of 1,355 after 5,636 samples. On April 10, Alibaba confirmed the mystery model was its own project and said it came from the company’s ATH AI Innovation Unit. (artificialanalysis.ai) (cnbc.com)

That surprise mattered because the leaderboard is a blind test, which means people rate clips without seeing which company made them. HappyHorse-1.0 also led Artificial Analysis in image-to-video, so it was not a one-off win in a single category. (artificialanalysis.ai) (cnbc.com)

Text-to-video is the simplest version of the field: you type a sentence like “a red car drifting through rain at night,” and the model tries to turn that sentence into a short moving scene. Image-to-video is one step more controlled, because the model starts from a still picture and has to animate it without changing the character, object, or camera style too much. (github.com) (vchitect.github.io)

The hard part is not making one pretty frame. The hard part is making 100 frames in a row where the face stays the same, the hands keep the right shape, and the motion does not flicker like a flipbook with pages missing. (vchitect.github.io)

That is why the industry leans on benchmarks with lots of separate checks instead of one beauty score. VBench, a benchmark introduced in a Computer Vision and Pattern Recognition 2024 highlight paper, breaks video quality into 16 dimensions, including motion smoothness, temporal flickering, and spatial relationships, and says its scores were validated against human preference annotations. (vchitect.github.io)

VBench also labels how a result was produced, which is a quiet but important detail in a leaderboard fight. A gold label means the VBench team sampled and evaluated the model itself, while a bronze label means the model’s own team supplied the samples and VBench only ran the evaluation. (huggingface.co)

Alibaba was not starting from zero here. In April 2025, Alibaba said its Wan2.1 video family topped VBench, supported text effects in both Chinese and English, and had already drawn more than 2.2 million downloads on Hugging Face and ModelScope. (alibabagroup.com)

So the real shock was not that Alibaba had video research. The shock was that a previously unnamed model beat a crowded field that included ByteDance’s Dreamina Seedance 2.0 at 1,273 Elo, Google’s Veo 3 at 1,221, Runway Gen-4.5 at 1,225, and OpenAI’s Sora 2 Pro at 1,196 on the same text-to-video board. (artificialanalysis.ai)

The timing makes the reveal sharper. CNBC reported that OpenAI had discontinued the standalone Sora app and that ByteDance had paused rollout of Seedance 2.0 amid copyright disputes, which left a rare opening in a market where the leaders had looked more settled a few months earlier. (cnbc.com)

Alibaba’s own pitch has been to build one foundation layer and push it into everything else it owns. Chief Executive Officer Eddie Wu has made artificial intelligence the company’s top priority, and Alibaba already has cloud infrastructure, chip work, advertising systems, shopping apps, and entertainment products where a strong video model can be plugged in fast. (cnbc.com)

That is why this was bigger than one leaderboard screenshot. A stealth model from Alibaba just told the market that the race for generative video is no longer a side contest behind chatbots, and that one of the strongest challengers may be the company that stayed quiet until it was already winning. (cnbc.com)
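
For readers wondering what those Elo gaps mean in practice, here is a rough sketch in Python. It assumes the leaderboard follows the conventional Elo curve with a 400-point scale, which the sources cited here do not confirm, so the percentages are illustrative only. The function and variable names are made up for this example; the scores are the text-to-video figures quoted above.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Conventional Elo expected score for A against B.

    Assumption: a logistic curve in base 10 with a 400-point scale. The
    leaderboard's actual rating method may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Text-to-video Elo scores as quoted in the article.
SCORES = {
    "HappyHorse-1.0": 1355,
    "Dreamina Seedance 2.0": 1273,
    "Runway Gen-4.5": 1225,
    "Veo 3": 1221,
    "Sora 2 Pro": 1196,
}


def head_to_head(leader: str, scores: dict) -> dict:
    """Implied chance that `leader` is preferred over each rival in a blind pairing."""
    return {
        rival: expected_win_rate(scores[leader], rating)
        for rival, rating in scores.items()
        if rival != leader
    }


for rival, chance in head_to_head("HappyHorse-1.0", SCORES).items():
    print(f"HappyHorse-1.0 vs {rival}: {chance:.1%}")
```

Under that assumption, the 82-point lead over Seedance 2.0 works out to HappyHorse-1.0 being preferred in about 62 percent of blind pairings, and the 159-point lead over Sora 2 Pro to about 71 percent. That is a clear edge, not a sweep.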
