D‑ID launches real-time visual agents
D‑ID announced V4 Expressive Visual Agents, LLM-connected real-time agents designed for enterprise scale, demonstrating visual, multimodal agent capability for customer-facing and regulated domains. The product underscores how agent orchestration must now handle multimodal streams, latency constraints, and tighter failure handling.
D‑ID published the V4 launch materials on March 16, 2026 and confirmed that V4 Expressive Visual Agents are available across all D‑ID plans, starting at $5.90/month. (prnewswire.com) The V4 tech spec lists end‑to‑end conversational latency under 500 ms, model latency under 120 ms, a 200+ FPS diffusion rendering pipeline, and a lip‑sync score of 5.7 LSE‑D (17% better than D‑ID’s V3). (d-id.com)

D‑ID’s Agents product exposes a Client SDK that handles WebRTC streaming for real‑time sessions and an Agents SDK for embedding agents, along with an official demo repo and an npm client package for front‑end integration. (docs.d-id.com) Conversation telemetry can be exported as zipped JSON chat logs through the Chat Exports API; exports are retrieved by export_id and removed seven days after creation. The D‑ID blog describes these exports as “measurable by default” for analytics and QA. (docs.d-id.com)

The tech spec also includes multimodal primitives: “Eyesight,” a vision‑enabled LLM that analyzes frames, and a Generative UI that can render media assets dynamically. Correct turn‑taking therefore requires orchestration to coordinate vision frames, LLM calls, and the 200+ FPS renderer. (d-id.com) For enterprise adoption, D‑ID’s enterprise page offers onboarding meetings and discounts, while the quickstart docs highlight zero‑backend embeddable agents and the option to configure or bring your own LLM provider. Both directly affect platform integration and developer experience. (d-id.com)
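The latency figures and the seven-day export retention quoted above lend themselves to a quick arithmetic check. The following is a minimal sketch, not D‑ID code: `TurnTiming`, `check_turn`, and `export_is_retrievable` are hypothetical names, and the split of a turn into network, vision, model, and render stages is an assumption made for illustration. Only the numeric limits (500 ms, 120 ms, 200 FPS, seven days) come from the reported spec and docs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Limits quoted from the reported V4 tech spec and Chat Exports docs.
END_TO_END_BUDGET_MS = 500.0          # conversational latency ceiling
MODEL_BUDGET_MS = 120.0               # model latency ceiling
RENDER_FPS = 200.0                    # lower bound of the "200+ FPS" renderer
EXPORT_RETENTION = timedelta(days=7)  # exports are removed after seven days


@dataclass
class TurnTiming:
    """Hypothetical per-turn measurements gathered by the integrator."""
    network_ms: float
    vision_ms: float
    model_ms: float
    render_frames: int  # frames rendered before the first one is displayed

    def render_ms(self) -> float:
        # At 200 FPS each frame takes 1000 / 200 = 5 ms.
        return self.render_frames * (1000.0 / RENDER_FPS)

    def total_ms(self) -> float:
        # Assumes the stages run sequentially (a pessimistic simplification).
        return self.network_ms + self.vision_ms + self.model_ms + self.render_ms()


def check_turn(timing: TurnTiming) -> List[str]:
    """Return the names of the budgets that one turn exceeds."""
    violations = []
    if timing.model_ms >= MODEL_BUDGET_MS:
        violations.append("model")
    if timing.total_ms() >= END_TO_END_BUDGET_MS:
        violations.append("end_to_end")
    return violations


def export_is_retrievable(created_at: datetime, now: datetime) -> bool:
    """True while an export is still inside the seven-day retention window."""
    return now < created_at + EXPORT_RETENTION


if __name__ == "__main__":
    turn = TurnTiming(network_ms=150, vision_ms=100, model_ms=90, render_frames=10)
    print(turn.total_ms(), check_turn(turn))

    created = datetime(2026, 3, 16, tzinfo=timezone.utc)
    print(export_is_retrievable(created, created + timedelta(days=6)))
```

With the model capped at 120 ms, the remaining stages share roughly 380 ms of the end-to-end budget, which is why the sketch tracks them separately. A real integration would measure these stages from its own instrumentation rather than assume they run strictly in sequence.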