RAG design shifts toward memory graphs
Google published a practical tutorial on improving RAG pipelines, covering Docling, dot-product efficiency, and reranking, while Alibaba’s Tongyi Lab released VimRAG, a multimodal RAG framework that uses a memory graph to handle large visual contexts. Together they point to retrieval systems moving from flat vectors toward more structured context navigation. (x.com/googledevs/status/2042331722298060929, www.marktechpost.com)
Retrieval-augmented generation is the trick that makes a model look things up before it answers, and this week two releases showed that lookup is getting less flat and more structured. (developers.google.com, arxiv.org)
In a standard setup, software turns documents into numerical fingerprints called embeddings, finds nearby matches with dot-product math, and then feeds the top chunks into a model. Google’s recent RAG materials focus on those practical steps, including document preprocessing with Docling and a reranking pass that reorders retrieved chunks before generation. (dev.to, developers.google.com) That pipeline works well for text, but it strains when the source material is images, slides, or video, where many tokens may carry little value for a given question.
Alibaba’s Tongyi Lab said its VimRAG system replaces a single running history with a dynamic directed acyclic graph, a branching memory structure that tracks which sub-question led to which evidence. (arxiv.org, github.com) Tongyi Lab posted the VimRAG paper to arXiv on February 13, 2026, and the project repository says the demo and retriever are already public while the training code is still under company review. The paper says the framework is built for multimodal retrieval across text, images, and videos. (arxiv.org, github.com)
The core problem is memory. A flat list of past steps gets longer every turn, while a compressed summary can hide what the system already searched, so the agent repeats itself or drops useful detail. (arxiv.org, www.marktechpost.com) VimRAG’s answer is to store reasoning as connected nodes instead of one long transcript, then spend more visual tokens on the evidence the graph marks as important. The paper calls that step Graph-Modulated Visual Memory Encoding, which uses a node’s position in the graph to decide what stays detailed and what gets compressed or dropped. (arxiv.org)
The paper also changes training. Tongyi Lab’s Graph-Guided Policy Optimization prunes memory nodes tied to redundant actions so the model is rewarded for useful steps instead of every step in a successful run. (arxiv.org, github.com) In one pilot study from the paper, as described by MarkTechPost, a graph-based memory reduced redundant search actions compared with ReAct-style history and summarization-based memory on a video corpus. In another, the best trade-off came from keeping only semantically related visual memory, using 2.7 thousand tokens and scoring 58.2% on image tasks and 43.7% on video tasks. (www.marktechpost.com)
Google’s side of the story is less about a new architecture than about tightening each stage of the classic pipeline. Its current codelabs and developer writeups emphasize chunking choices, vector search, reranking, and managed retrieval services such as Vertex AI RAG Engine and Gemini File Search. (developers.google.com, cloud.google.com)
Taken together, the releases point to a design split inside retrieval systems: text-heavy workloads still benefit from better parsing, indexing, and reranking, while image- and video-heavy workloads are pushing toward memory graphs that decide not just what to retrieve, but how to navigate context over multiple steps. (dev.to, arxiv.org)
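For readers who want to see the classic pipeline concretely, the retrieve-then-rerank flow described above can be sketched in a few lines of Python. This is a toy illustration, not Google’s tutorial code: the three-dimensional vectors are hand-written stand-ins for real embeddings, the word-overlap `rerank` is a placeholder for a learned reranking model, and the function names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec: Vector, index: List[Tuple[str, Vector]], k: int) -> List[str]:
    """First stage: return the k chunks whose vectors score highest against the query."""
    scored = sorted(index, key=lambda item: dot(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]


def rerank(query: str, chunks: List[str]) -> List[str]:
    """Second stage: reorder candidates before generation.

    Assumption: shared-word count stands in for a real reranking model.
    """
    q_terms = set(query.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(q_terms & set(c.lower().split())),
        reverse=True,
    )


# Hand-written toy "embeddings"; a real system would get these from an embedding model.
index = [
    ("Reranking reorders retrieved chunks before generation.", [0.9, 0.1, 0.0]),
    ("Docling parses documents into clean text.", [0.2, 0.8, 0.1]),
    ("Video frames can consume many tokens.", [0.0, 0.1, 0.9]),
]

query = "how does reranking reorder chunks"
query_vec = [0.5, 0.6, 0.0]  # deliberately imperfect query vector

candidates = retrieve(query_vec, index, k=2)  # flat vector lookup
final = rerank(query, candidates)             # reordered before being sent to the model

print(candidates)
print(final)
```

In this toy run the dot-product pass ranks the Docling chunk first, and the rerank pass moves the reranking chunk to the top, which is the kind of correction the second stage exists to make.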