Full-Duplex Speech AI Runs Locally on Mac
A developer has demonstrated the PersonaPlex-7B model running full-duplex speech-to-speech AI locally on Apple Silicon. The implementation uses Swift and MLX, with no Python or cloud dependency, and achieves faster-than-real-time inference, showing that on-device processing can handle complex AI tasks.
The model in question, PersonaPlex-7B, originates from NVIDIA and replaces the traditional, high-latency voice AI pipeline. Instead of chaining separate Automatic Speech Recognition (ASR), Large Language Model (LLM), and Text-to-Speech (TTS) systems, it uses a single, unified transformer model that maps audio directly to audio.
This unified architecture is what enables "full-duplex" conversation. The model can listen and speak simultaneously, allowing for natural interruptions, back-and-forth banter, and overlapping speech, dynamics that conventional turn-based assistants cannot support. It processes audio in parallel streams, updating its understanding of the user's speech while generating its own response in real time.
Making this feasible on a laptop required significant model optimization. The original PersonaPlex-7B model is over 16GB and requires a high-end NVIDIA GPU with at least 24GB of VRAM. The version running on Apple Silicon is a 4-bit quantized model, shrinking its size to approximately 5.3GB.
The key to the performance is Apple's MLX framework, designed explicitly to exploit the unified memory architecture of Apple Silicon. Because the CPU and GPU access the same memory pool, MLX avoids copying data between them, removing a critical bottleneck and enabling a multi-billion-parameter model to run efficiently on-device.
Eschewing Python entirely for a pure Swift implementation is a major strategic statement. It shows that the native developer stack can handle demanding, end-to-end AI workloads, reducing reliance on external ecosystems and showcasing the performance potential of compiled Swift code for machine learning. This tight hardware and software integration is a core tenet of Apple's long-term strategy.
This breakthrough serves as a powerful proof-of-concept for Apple's privacy-first, on-device AI philosophy. While competitors scale massive cloud-based models, demonstrating this level of complex, real