{"id":80042,"date":"2026-03-30T14:46:40","date_gmt":"2026-03-30T14:46:40","guid":{"rendered":"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/"},"modified":"2026-03-30T14:46:40","modified_gmt":"2026-03-30T14:46:40","slug":"salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x","status":"publish","type":"post","link":"https:\/\/youzum.net\/zh\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/","title":{"rendered":"Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x"},"content":{"rendered":"<p>In the world of voice AI, the difference between a helpful assistant and an awkward interaction is measured in milliseconds. While text-based Retrieval-Augmented Generation (RAG) systems can afford a few seconds of \u2018thinking\u2019 time, voice agents must respond within a 200<em>ms<\/em> budget to maintain a natural conversational flow. 
Standard production vector database queries typically add 50-300ms of network latency, effectively consuming the entire budget before an LLM even begins generating a response.<\/p>\n<p>The Salesforce AI Research team has released <strong>VoiceAgentRAG<\/strong>, an open-source dual-agent architecture designed to bypass this retrieval bottleneck by decoupling document fetching from response generation.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1750\" height=\"1344\" data-attachment-id=\"78700\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/03\/30\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/screenshot-2026-03-30-at-2-49-13-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.49.13-AM-1.png\" data-orig-size=\"1750,1344\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-03-30 at 2.49.13\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.49.13-AM-1-300x230.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.49.13-AM-1-1024x786.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.49.13-AM-1.png\" alt=\"\" class=\"wp-image-78700\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2603.02206<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>The Dual-Agent Architecture: Fast Talker vs. 
Slow Thinker<\/strong><\/h3>\n<p><strong>VoiceAgentRAG operates as a memory router that orchestrates two concurrent agents via an asynchronous event bus:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>The Fast Talker (Foreground Agent):<\/strong> This agent handles the critical latency path. For every user query, it first checks a local, in-memory <strong>Semantic Cache<\/strong>. If the required context is present, the lookup takes approximately 0.35ms. On a cache miss, it falls back to the remote vector database and immediately caches the results for future turns.<\/li>\n<li><strong>The Slow Thinker (Background Agent):<\/strong> Running as a background task, this agent continuously monitors the conversation stream. It uses a sliding window of the <strong>last six conversation turns<\/strong> to predict <strong>3\u20135 likely follow-up topics<\/strong>. It then pre-fetches relevant document chunks from the remote vector store into the local cache before the user even speaks their next question.<\/li>\n<\/ul>\n<p>To optimize search accuracy, the Slow Thinker is instructed to generate <strong>document-style descriptions<\/strong> rather than questions. This ensures the resulting embeddings align more closely with the actual prose found in the knowledge base.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Technical Backbone: Semantic Caching<\/strong><\/h3>\n<p>The system\u2019s efficiency hinges on a specialized semantic cache implemented with an in-memory <strong>FAISS IndexFlatIP<\/strong> (inner product) index.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Document-Embedding Indexing:<\/strong> Unlike passive caches that index by query meaning, VoiceAgentRAG indexes entries by their own <strong>document embeddings<\/strong>. 
This allows the cache to perform a proper semantic search over its contents, ensuring relevance even if the user\u2019s phrasing differs from the system\u2019s predictions.<\/li>\n<li><strong>Threshold Management:<\/strong> Because query-to-document cosine similarity is systematically lower than query-to-query similarity, the system uses a default threshold of \u03c4 = 0.40 to balance precision and recall.<\/li>\n<li><strong>Maintenance:<\/strong> The cache detects near-duplicates using a <strong>0.95 cosine similarity threshold<\/strong> and employs a <strong>Least Recently Used (LRU)<\/strong> eviction policy with a <strong>300-second Time-To-Live (TTL)<\/strong>.<\/li>\n<li><strong>Priority Retrieval:<\/strong> On a Fast Talker cache miss, a <code>PriorityRetrieval<\/code> event triggers the Slow Thinker to perform an immediate retrieval with an <strong>expanded top-k (2x the default)<\/strong> to rapidly populate the cache around the new topic area.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Benchmarks and Performance<\/strong><\/h3>\n<p>The research team evaluated the system using <strong>Qdrant Cloud<\/strong> as a remote vector database across 200 queries and 10 conversation scenarios.<\/p>\n<figure class=\"wp-block-table is-style-stripes\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<td><strong>Metric<\/strong><\/td>\n<td><strong>Performance<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Overall Cache Hit Rate<\/strong><\/td>\n<td>75% (79% on warm turns)<\/td>\n<\/tr>\n<tr>\n<td><strong>Retrieval Speedup<\/strong><\/td>\n<td>316x (110ms \u2192 0.35ms)<\/td>\n<\/tr>\n<tr>\n<td><strong>Total Retrieval Time Saved<\/strong><\/td>\n<td>16.5 seconds over 200 turns<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>The architecture is most effective in topically coherent or sustained-topic scenarios. 
For example, <strong>\u2018Feature comparison\u2019 (S8)<\/strong> achieved a <strong>95% hit rate<\/strong>. Conversely, performance dipped in more volatile scenarios; the lowest-performing scenario was <strong>\u2018Existing customer upgrade\u2019 (S9)<\/strong> at a <strong>45% hit rate<\/strong>, while \u2018Mixed rapid-fire\u2019 (S10) maintained 55%.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" width=\"1384\" height=\"492\" data-attachment-id=\"78702\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/03\/30\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/screenshot-2026-03-30-at-2-50-14-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.50.14-AM-1.png\" data-orig-size=\"1384,492\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-03-30 at 2.50.14\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.50.14-AM-1-300x107.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.50.14-AM-1-1024x364.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.50.14-AM-1.png\" alt=\"\" class=\"wp-image-78702\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2603.02206<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Integration and Support<\/strong><\/h3>\n<p><strong>The VoiceAgentRAG repository is designed for broad compatibility across the AI 
stack:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>LLM Providers:<\/strong> Supports <strong>OpenAI<\/strong>, <strong>Anthropic<\/strong>, <strong>Gemini\/Vertex AI<\/strong>, and <strong>Ollama<\/strong>. The paper\u2019s default evaluation model was <strong>GPT-4o-mini<\/strong>.<\/li>\n<li><strong>Embeddings:<\/strong> The research utilized <strong>OpenAI text-embedding-3-small<\/strong> (1536 dimensions), but the repository provides support for both <strong>OpenAI<\/strong> and <strong>Ollama<\/strong> embeddings.<\/li>\n<li><strong>STT\/TTS:<\/strong> Supports <strong>Whisper<\/strong> (local or OpenAI) for speech-to-text and <strong>Edge TTS<\/strong> or <strong>OpenAI<\/strong> for text-to-speech.<\/li>\n<li><strong>Vector Stores:<\/strong> Built-in support for <strong>FAISS<\/strong> and <strong>Qdrant<\/strong>.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Dual-Agent Architecture<\/strong>: The system solves the RAG latency bottleneck by using a foreground \u2018Fast Talker\u2019 for sub-millisecond cache lookups and a background \u2018Slow Thinker\u2019 for predictive pre-fetching.<\/li>\n<li><strong>Significant Speedup<\/strong>: It achieves a 316x retrieval speedup (110ms \u2192 0.35ms) on cache hits, which is critical for staying within the natural 200ms voice response budget.<\/li>\n<li><strong>High Cache Efficiency<\/strong>: Across diverse scenarios, the system maintains a 75% overall cache hit rate, peaking at 95% in topically coherent conversations like feature comparisons.<\/li>\n<li><strong>Document-Indexed Caching<\/strong>: To ensure accuracy regardless of user phrasing, the semantic cache indexes entries by document embeddings rather than the predicted query\u2019s embedding.<\/li>\n<li><strong>Anticipatory Prefetching<\/strong>: The background agent uses a sliding window of the last 6 conversation turns to predict 
likely follow-up topics and populate the cache during natural inter-turn pauses.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/arxiv.org\/pdf\/2603.02206\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a><\/strong> and the <strong><a href=\"https:\/\/github.com\/SalesforceAIResearch\/VoiceAgentRAG\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Repo<\/a>.\u00a0<\/strong>Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>,\u00a0join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">120k+ ML SubReddit<\/a><\/strong>,\u00a0and subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">You can join us on Telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/03\/30\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/\">Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In the world of voice AI, the difference between a helpful assistant and an awkward interaction is measured in milliseconds. 
While text-based Retrieval-Augmented Generation (RAG) systems can afford a few seconds of \u2018thinking\u2019 time, voice agents must respond within a 200ms budget to maintain a natural conversational flow. Salesforce AI Research has released VoiceAgentRAG, an open-source dual-agent architecture that pairs a foreground \u2018Fast Talker\u2019 for sub-millisecond semantic-cache lookups with a background \u2018Slow Thinker\u2019 for predictive pre-fetching, cutting retrieval latency by 316x while maintaining a 75% cache hit rate. 
The post Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x appeared first on MarkTechPost.<\/p>","protected":false},"author":2,"featured_media":80043,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"pmpro_default_level":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_pvb_checkbox_block_on_post":false,"footnotes":""},"categories":[52,5,7,1],"tags":[],"class_list":["post-80042","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-club","category-committee","category-news","category-uncategorized","pmpro-has-access"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x - YouZum<\/title>\n<meta name=\"description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta name=\"robots\" content=\"index, 
follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/youzum.net\/zh\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x - YouZum\" \/>\n<meta property=\"og:description\" content=\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\" \/>\n<meta property=\"og:url\" content=\"https:\/\/youzum.net\/zh\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/\" \/>\n<meta property=\"og:site_name\" content=\"YouZum\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/DroneAssociationTH\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-03-30T14:46:40+00:00\" \/>\n<meta name=\"author\" content=\"admin NU\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin NU\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 \u5206\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/\"},\"author\":{\"name\":\"admin NU\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\"},\"headline\":\"Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x\",\"datePublished\":\"2026-03-30T14:46:40+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/\"},\"wordCount\":809,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.49.13-AM-1-zOkB8l.png\",\"articleSection\":[\"AI\",\"Committee\",\"News\",\"Uncategorized\"],\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/\",\"url\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-
316x\/\",\"name\":\"Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x - YouZum\",\"isPartOf\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.49.13-AM-1-zOkB8l.png\",\"datePublished\":\"2026-03-30T14:46:40+00:00\",\"description\":\"\u0e01\u0e34\u0e08\u0e01\u0e23\u0e23\u0e21\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e42\u0e14\u0e23\u0e19\",\"breadcrumb\":{\"@id\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/#primaryimage\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.49.13-AM-1-zOkB8l.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2026\/03\/Screenshot-2026-03-30-at-2.49.13-AM-1-zOkB8l.png\",\"width\":1750,\"height\":1344},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/youzum.net\/salesforce-ai-research-releases-voiceagentrag-a-dual-agent-memory-router-that-cuts-voice-rag-retrieval-latency-by-316x\/#breadcr
umb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/youzum.net\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Salesforce AI Research Releases VoiceAgentRAG: A Dual-Agent Memory Router that Cuts Voice RAG Retrieval Latency by 316x\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/yousum.gpucore.co\/#website\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"name\":\"YouSum\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/yousum.gpucore.co\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/yousum.gpucore.co\/#organization\",\"name\":\"Drone Association Thailand\",\"url\":\"https:\/\/yousum.gpucore.co\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/2024\/11\/tranparent-logo.png\",\"width\":300,\"height\":300,\"caption\":\"Drone Association Thailand\"},\"image\":{\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/DroneAssociationTH\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/97fa48242daf3908e4d9a5f26f4a059c\",\"name\":\"admin NU\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/yousum.gpucore.co\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"contentUrl\":\"https:\/\/youzum.net\/wp-content\/uploads\/avatars\/2\/1746849356-bpfull.png\",\"caption\":\"admin 