Real-time video models cut latency

New research demos show sharp reductions in audio-video generation latency: StreamDiffusionV2 reaches interactive generation at 16 FPS with under 0.5 s of initial latency on two RTX 4090s (demo), OmniForcing claims a 35x TTFC cut to about 0.7 s on one GPU (paper/demo), and LiveTalk reports coherence gains at 20x less compute (demo). These model-level gains open new possibilities for low-latency AR and live synthetic overlays in broadcasts.

StreamDiffusionV2’s authors come from UT Austin, UC Berkeley, Stanford, MIT and other institutions, and the project page and paper list Chenfeng Xu and Maneesh Agrawala among the leads (streamdiffusionv2.github.io). The paper documents four core system optimizations that enable parallelized denoising across layers and steps: an SLO-aware batching scheduler, a block scheduler, a sink-token–guided rolling KV cache, and a motion-aware noise controller (arxiv.org). The team published code and model checkpoints on GitHub and Hugging Face (model release noted Oct 6, 2025), and the repo records an MLSys acceptance announcement dated 2026-01-26 (github.com).

OmniForcing’s arXiv paper frames the work as the first framework to distill an offline dual-stream bidirectional audio-visual diffusion teacher into a streaming autoregressive generator, and it introduces a Joint Self‑Forcing Distillation method to stabilize training across modalities (arxiv.org). The paper compares against the teacher model (LTX‑2) and reports that the teacher required roughly 197 seconds for offline sequence generation in the authors' experiments, which highlights the offline-to-streaming gap the distillation targets (arxiv.org).

LiveTalk’s repository and model card describe an on‑policy distillation pipeline for autoregressive video avatars, publish a 1.3B‑parameter release, and report a measured throughput and first‑frame latency point for the 1.3B baseline in the model README (github.com). LiveTalk evaluates on the HDTF, AVSpeech and CelebV‑HQ benchmarks and reports visual quality matching full‑step bidirectional baselines at lower inference cost; its multi‑turn interaction benchmark includes comparisons with models labeled Sora2 and Veo3 in the paper (papers.cool).
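For readers unfamiliar with the term, the general idea behind a sink-token rolling KV cache (one of the StreamDiffusionV2 optimizations listed above) can be pictured with a toy sketch. This is not the project's implementation: the class name `RollingKVCache` and the parameters `num_sink` and `window` are hypothetical, and the sketch only assumes the common pattern of pinning a few early "sink" entries while letting the rest of the cache slide forward so memory stays bounded on long streams.

```python
from collections import deque


class RollingKVCache:
    """Toy cache (hypothetical): pins the first `num_sink` entries and keeps
    only the most recent `window` entries after that."""

    def __init__(self, num_sink, window):
        self.num_sink = num_sink
        self.sink = []                      # pinned early entries, never evicted
        self.recent = deque(maxlen=window)  # sliding window of later entries

    def append(self, key, value):
        if len(self.sink) < self.num_sink:
            self.sink.append((key, value))
        else:
            # deque with maxlen drops its oldest entry automatically when full
            self.recent.append((key, value))

    def entries(self):
        # What attention would see: sink entries followed by the recent window
        return self.sink + list(self.recent)


cache = RollingKVCache(num_sink=2, window=3)
for t in range(8):
    cache.append(f"k{t}", f"v{t}")

kept = [k for k, _ in cache.entries()]
print(kept)  # ['k0', 'k1', 'k5', 'k6', 'k7']
```

In this toy run the cache never holds more than five entries regardless of stream length, which is the property that makes the pattern attractive for open-ended streaming generation.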
