Voice AI latency slashed
What happened
Salesforce’s new VoiceAgentRAG — a twin‑agent reminiscence router — reportedly cuts voice RAG retrieval latency by 316x, promising much faster, more natural AI‑driven sales calls and support interactions. Faster retrieval in voice workflows could materially change how companies deploy conversational agents in high‑volume sales processes. (blog.aimactgrow.com)
Why it matters
Salesforce AI Research published the VoiceAgentRAG paper on arXiv on March 2, 2026 and simultaneously released an open-source implementation on GitHub under SalesforceAIResearch/VoiceAgentRAG. (arxiv.org) The system routes work between a foreground “Fast Talker” that serves the latency-critical path from a local semantic cache and a background “Slow Thinker” that runs asynchronously to prefetch context via an event bus. (arxiv.org) The foreground cache returns lookups in roughly 0.35 milliseconds on hit, while typical vector-database queries add roughly 50–300 ms to pipelines; voice research benchmarks target a sub-200 ms total response budget for natural conversation. (marktechpost.com) The Slow Thinker continuously scans the last six conversation turns, uses an LLM to predict 3–5 likely follow-up topics, and pre-fetches the corresponding document chunks into a FAISS-backed semantic cache to maximize cache hits. (arxiv.org) On cache misses the Fast Talker falls back to a remote vector database, then immediately inserts fetched chunks into the cache so subsequent turns hit the sub-millisecond store rather than repeat remote queries. (github.com) The paper lists authors including Jielin Qiu, Jianguo Zhang, Zixiang Chen and others from Salesforce AI Research, and the release has drawn immediate coverage across specialist outlets as the team positioned the design for real-time, enterprise voice agents. (arxiv.org)
Key numbers
- Salesforce’s new VoiceAgentRAG — a twin‑agent reminiscence router — reportedly cuts voice RAG retrieval latency by 316x, promising much faster, more natural AI‑driven sales calls and support interactions.
- (blog.aimactgrow.com) Salesforce AI Research published the VoiceAgentRAG paper on arXiv on March 2, 2026 and simultaneously released an open-source implementation on GitHub under SalesforceAIResearch/VoiceAgentRAG.
- (arxiv.org) The foreground cache returns lookups in roughly 0.35 milliseconds on hit, while typical vector-database queries add roughly 50–300 ms to pipelines; voice research benchmarks target a sub-200 ms total response budget for natural conversation.
- (marktechpost.com) The Slow Thinker continuously scans the last six conversation turns, uses an LLM to predict 3–5 likely follow-up topics, and pre-fetches the corresponding document chunks into a FAISS-backed semantic cache to maximize cache hits.
What happens next
- (arxiv.org) The foreground cache returns lookups in roughly 0.35 milliseconds on hit, while typical vector-database queries add roughly 50–300 ms to pipelines; voice research benchmarks target a sub-200 ms total response budget for natural conversation.
- Faster retrieval in voice workflows could materially change how companies deploy conversational agents in high‑volume sales processes.
Quick answers
What happened in Voice AI latency slashed?
Salesforce’s new VoiceAgentRAG — a twin‑agent reminiscence router — reportedly cuts voice RAG retrieval latency by 316x, promising much faster, more natural AI‑driven sales calls and support interactions. Faster retrieval in voice workflows could materially change how companies deploy conversational agents in high‑volume sales processes. (blog.aimactgrow.com)
Why does Voice AI latency slashed matter?
Salesforce AI Research published the VoiceAgentRAG paper on arXiv on March 2, 2026 and simultaneously released an open-source implementation on GitHub under SalesforceAIResearch/VoiceAgentRAG. (arxiv.org) The system routes work between a foreground “Fast Talker” that serves the latency-critical path from a local semantic cache and a background “Slow Thinker” that runs asynchronously to prefetch context via an event bus. (arxiv.org) The foreground cache returns lookups in roughly 0.35 milliseconds on hit, while typical vector-database queries add roughly 50–300 ms to pipelines; voice research benchmarks target a sub-200 ms total response budget for natural conversation. (marktechpost.com) The Slow Thinker continuously scans the last six conversation turns, uses an LLM to predict 3–5 likely follow-up topics, and pre-fetches the corresponding document chunks into a FAISS-backed semantic cache to maximize cache hits. (arxiv.org) On cache misses the Fast Talker falls back to a remote vector database, then immediately inserts fetched chunks into the cache so subsequent turns hit the sub-millisecond store rather than repeat remote queries. (github.com) The paper lists authors including Jielin Qiu, Jianguo Zhang, Zixiang Chen and others from Salesforce AI Research, and the release has drawn immediate coverage across specialist outlets as the team positioned the design for real-time, enterprise voice agents. (arxiv.org)