Real‑time talking characters

A new video model called LPM 1.0 is being discussed for producing real‑time talking characters, and creators are comparing it with Runway Gen‑4, Descript, and Google Veo 3 in short demos (x.com) (x.com) (x.com). The conversation focuses on consistency and real‑time performance rather than single high‑quality renders (x.com) (x.com).

A talking character system is getting attention this week because it aims to keep one on-screen person responsive in real time, not just render a polished clip. (lpm-ai.org) (arxiv.org)

The system is called LPM 1.0, short for Large Performance Model, and its paper was posted to arXiv on April 9, 2026. The authors say it takes a character image plus audio and text cues, then generates speaking and listening behavior at real-time speed. (arxiv.org)

In plain terms, the problem is not making a single lip-synced shot. The harder task is keeping the same face, timing, gaze, and expression steady over a live exchange that runs longer than a few seconds. (arxiv.org)

The LPM team describes that tradeoff as a “performance trilemma”: expressiveness, low latency, and long-horizon identity stability. Its project page says the model is a 17-billion-parameter Diffusion Transformer and claims identity consistency for more than 10 minutes of continuous generation. (arxiv.org) (lpm-ai.org)

That puts it in a different lane from several tools creators are using as comparison points. Runway says Gen-4 is built for consistent characters, objects, and locations across scenes, while Google says Veo 3.1 is built for high-fidelity video with native audio and up to 4K output. (runwayml.com) (deepmind.google) (aistudio.google.com)

Descript sits closer to the avatar end of the market, but with a production workflow focus. Its site says users can create talking avatars by typing a script, choose gallery or custom avatars, and generate videos in more than 20 languages. (descript.com)

The LPM paper says the model was trained for “single-person full-duplex audio-visual conversational performance.” That means one character is meant to speak during its own turn and visibly listen while a user is talking, instead of idling in a loop between lines. (arxiv.org)

The paper also says the research team built a benchmark called LPM-Bench to measure interactive character performance. The authors report state-of-the-art results across their tested dimensions, but those results come from the team’s own benchmark and paper rather than an outside evaluation. (arxiv.org)

Runway and Google are both framing consistency as a core feature too, but in broader filmmaking terms. Runway emphasizes scene-to-scene continuity and camera coverage, while Google emphasizes realism, prompt adherence, and synchronized audio generation. (runwayml.com) (deepmind.google) (storage.googleapis.com)

So the current comparison is less about which model makes the prettiest single shot and more about which one can hold a believable character together under live use. The next test is whether creators keep reaching for these systems after the demo clips end. (arxiv.org) (runwayml.com) (descript.com)
