NVIDIA Open‑Sources Voice Model
NVIDIA open-sourced a voice AI model intended to support natural, human-like conversations, including interruptions and turn-taking. The release was presented as a tool to push boundaries in AI-assisted scripting and production workflows. (x.com)
Most voice assistants wait their turn. NVIDIA’s open-source PersonaPlex model is built to listen and speak at the same time. (research.nvidia.com) That setup is called full duplex: the model can handle interruptions, short listener cues like “uh-huh,” and faster turn-taking instead of forcing a stop-start exchange. NVIDIA described PersonaPlex as a real-time speech-to-speech conversational model with customizable voices and roles. (research.nvidia.com)
NVIDIA published the code on GitHub under an MIT license, and the project page says the model was trained on a mix of synthetic and real conversations. The repository describes voice control through speech samples and role control through text prompts. (github.com)
The basic problem PersonaPlex tries to solve is latency: many voice systems still run speech recognition, text generation, and text-to-speech one after another. NVIDIA’s research paper says that pipeline makes conversations feel slower and less natural than human dialogue. (research.nvidia.com)
PersonaPlex adds a second layer of control on top of that faster exchange. NVIDIA says developers can steer what the assistant sounds like with an audio sample and what role it plays with a written prompt, such as a customer service agent or a fictional character. (research.nvidia.com)
NVIDIA’s model card for Nemotron VoiceChat, released on March 16, 2026, points to PersonaPlex as part of a broader push into spoken agents. That card says the company is measuring pause handling, backchanneling, turn-taking, and interruption management as core performance targets for voice systems. (build.nvidia.com)
The company has also been packaging those pieces into developer tools. NVIDIA’s Nemotron Voice Agent blueprint says it is designed for streaming, interruptible conversations, while a January 5, 2026 tutorial showed how to add retrieval and safety guardrails to a voice agent stack. (build.nvidia.com) (developer.nvidia.com)
That places PersonaPlex in a wider contest over open voice models, where companies are trying to make spoken assistants feel less like phone trees and more like live conversation. NVIDIA’s paper says earlier duplex speech models often locked developers into a fixed voice or role, which limited use in structured applications. (research.nvidia.com)
NVIDIA is pitching the release at developers building avatars, agents, and production tools rather than presenting it as a finished consumer app. The real test now is whether outside teams use the open code to turn smoother interruptions and turn-taking into products people actually talk to. (github.com)