China’s latest AI models surface

Chinese AI groups are publicising a new wave of models and features, including GLM-5.1 (positioned as an advanced 8‑hour autonomous model), PixVerse C1 for AI video production, Coze 2.5 with Agent World capabilities, and VoxCPM 2 for regional dialect speech, signalling focused investment across autonomy, media and localisation. The updates show China’s ecosystem is diversifying beyond text LLMs into video, agents and language variants tailored to local markets. For buyers and competitors, that means more specialised options are emerging outside the Western model landscape.

China’s artificial intelligence scene just pushed out four very different products in the same week, and none of them are trying to be “just another chatbot.” Z.ai unveiled GLM-5.1 on April 7, PixVerse released C1 on April 8, Coze rolled out version 2.5 with “Agent World” on April 7, and VoxCPM 2 updates landed publicly this week through GitHub and Python package releases. (z.ai, docs.platform.pixverse.ai, kucoin.com, pypi.org)

The split is the story. One company is chasing long-running software agents, one is chasing film-style video, one is building a home for autonomous assistants, and one is tuning speech for multilingual and regional voice output instead of generic Mandarin-only text systems. (z.ai, docs.platform.pixverse.ai, pandaily.com, github.com)

GLM-5.1 is the clearest example of the shift. Z.ai says the model is built for “long-horizon tasks,” and outside coverage says it is meant to work on one job for up to eight hours, which is closer to handing an intern a ticket queue than asking a chatbot a single question. (z.ai, venturebeat.com)

Z.ai is pitching that claim with software benchmarks, not poetry demos. Its post says GLM-5.1 leads GLM-5 on repository generation and terminal-task tests and reaches state-of-the-art on SWE-Bench Pro, which is a benchmark for fixing real software issues pulled from GitHub projects. (z.ai)

PixVerse C1 is aimed at a different buyer entirely. PixVerse’s documentation says C1 produces 1080p video, adds native audio, supports storyboard-to-video conversion, and is tuned for physically accurate motion and film-style camera control rather than plain text-to-video clips. (docs.platform.pixverse.ai, pixversec1.com)

That points at a market China already knows well: short-form video and serialized mobile entertainment. Coverage around the launch describes C1 as a model for short dramas, anime, trailers, and ad-style content, which fits a domestic internet economy built around fast-turn creative production rather than Hollywood-length releases. (newsglobenow.com, blockchain.news)

Coze 2.5 moves one layer up from models to infrastructure. Reports on the April 7 release say each agent gets a cloud computer, a cloud phone, and even an email identity, so the product is trying to give an artificial intelligence agent a desk, a laptop, and an inbox instead of leaving it trapped in a chat window. (kucoin.com, houdao.com)

That matters because an agent that can remember, message, and use tools starts to look less like search and more like software labor. Pandaily’s write-up says Coze 2.5 adds persistent memory and workflow automation, while other reports describe a multi-agent environment where agents can collaborate and keep separate identities over time. (pandaily.com, toolmesh.ai)

VoxCPM 2 shows the same specialization in speech. The project’s GitHub and Python package pages describe a 2 billion parameter text-to-speech system trained on more than 2 million hours of multilingual audio, with voice cloning and expressive speech generation built in. (github.com, pypi.org)

Speech is where localization becomes concrete. VoxCPM’s materials emphasize multilingual output and natural prosody, and the launch chatter around it has focused on regional language and dialect use cases, which is exactly the kind of product edge that matters in a market with big differences in accent, language mix, and local media styles. (github.com, arxiv.org)

Put together, these launches make China’s artificial intelligence market look less like a race to copy one American chatbot and more like a stack of tools for specific jobs. The new releases are spread across coding agents, video production, agent operating systems, and localized speech, which means buyers now have more specialized options and rivals have more fronts to watch than simple text models alone. (z.ai, docs.platform.pixverse.ai, pandaily.com, github.com)
