Voice AI latency slashed
Salesforce’s new VoiceAgentRAG, a dual-agent memory router, reportedly cuts voice RAG retrieval latency by 316x, promising faster, more natural AI-driven sales calls and support interactions. Faster retrieval in voice workflows could materially change how companies deploy conversational agents in high-volume sales processes. (blog.aimactgrow.com)
Salesforce AI Research published the VoiceAgentRAG paper on arXiv on March 2, 2026, and simultaneously released an open-source implementation on GitHub under SalesforceAIResearch/VoiceAgentRAG. (arxiv.org) The system splits work between a foreground “Fast Talker” that serves the latency-critical path from a local semantic cache and a background “Slow Thinker” that runs asynchronously and prefetches context via an event bus. (arxiv.org) A cache hit returns in roughly 0.35 milliseconds, whereas a typical vector-database query adds roughly 50–300 ms to the pipeline; at the reported 316x speedup, 0.35 ms implies a baseline of about 110 ms, which falls inside that range. Voice research benchmarks target a total response budget under 200 ms for natural conversation. (marktechpost.com) The Slow Thinker continuously scans the last six conversation turns, uses an LLM to predict 3–5 likely follow-up topics, and prefetches the corresponding document chunks into a FAISS-backed semantic cache to maximize cache hits. (arxiv.org) On a cache miss, the Fast Talker falls back to a remote vector database, then immediately inserts the fetched chunks into the cache so that subsequent turns hit the sub-millisecond store instead of repeating the remote query. (github.com) The paper's authors include Jielin Qiu, Jianguo Zhang, Zixiang Chen and others from Salesforce AI Research, and the release has drawn immediate coverage across specialist outlets as the team positioned the design for real-time enterprise voice agents. (arxiv.org)
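The cache-first flow described above can be illustrated with a minimal Python sketch. This is not the released VoiceAgentRAG code: every name here (`SemanticCache`, `FastTalker`, `slow_thinker_prefetch`, `toy_remote_search`, the 0.9 similarity threshold, the two-dimensional toy embeddings) is a hypothetical stand-in. It also simplifies in three ways: a plain Python list replaces the FAISS index, a local function replaces the remote vector database, and prefetching runs synchronously rather than through an asynchronous event bus.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Toy in-memory cache; a linear scan stands in for a FAISS index."""

    def __init__(self, threshold=0.9):  # threshold is an assumed value
        self.threshold = threshold
        self.entries = []  # list of (embedding, chunk) pairs

    def put(self, embedding, chunk):
        self.entries.append((embedding, chunk))

    def get(self, embedding):
        """Return the most similar cached chunk, or None below the threshold."""
        best_chunk, best_score = None, -1.0
        for cached_embedding, chunk in self.entries:
            score = cosine(embedding, cached_embedding)
            if score > best_score:
                best_chunk, best_score = chunk, score
        return best_chunk if best_score >= self.threshold else None


class FastTalker:
    """Foreground path: try the cache first, fall back to the remote store."""

    def __init__(self, cache, remote_search):
        self.cache = cache
        self.remote_search = remote_search
        self.hits = 0
        self.misses = 0

    def retrieve(self, embedding):
        chunk = self.cache.get(embedding)
        if chunk is not None:
            self.hits += 1
            return chunk
        # Cache miss: query the slow store, then cache the result so that
        # similar follow-up queries are served locally.
        self.misses += 1
        chunk = self.remote_search(embedding)
        self.cache.put(embedding, chunk)
        return chunk


def slow_thinker_prefetch(cache, predicted_embeddings, remote_search):
    """Background path: warm the cache for predicted follow-up topics."""
    for embedding in predicted_embeddings:
        if cache.get(embedding) is None:
            cache.put(embedding, remote_search(embedding))


# Hypothetical two-document corpus standing in for a remote vector database.
CORPUS = [([1.0, 0.0], "pricing doc"), ([0.0, 1.0], "refund policy")]


def toy_remote_search(embedding):
    return max(CORPUS, key=lambda item: cosine(embedding, item[0]))[1]


if __name__ == "__main__":
    cache = SemanticCache()
    talker = FastTalker(cache, toy_remote_search)
    talker.retrieve([1.0, 0.1])   # first turn: miss, fetched remotely
    talker.retrieve([1.0, 0.05])  # similar follow-up: served from cache
    slow_thinker_prefetch(cache, [[0.0, 1.0]], toy_remote_search)
    talker.retrieve([0.1, 1.0])   # predicted topic: served from cache
    print(talker.hits, talker.misses)
```

In this sketch the first query on a topic pays the remote cost once, and later queries that are close enough in embedding space are answered locally. A real deployment would also need eviction, a tuned similarity threshold, and an LLM to generate the predicted topics, none of which are modeled here.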