LEANN Project Enables RAG on a Laptop
A new open-source project called LEANN lets retrieval-augmented generation (RAG) over millions of documents run on a standard laptop. The system uses graph pruning and on-demand embeddings to reduce storage requirements by a reported 97%, offering a privacy-focused alternative to cloud-based vector databases.
The project's core technical insight is that in many RAG applications, particularly on personal devices, storage is a bigger bottleneck than latency. LEANN inverts the traditional model by trading compute for storage: instead of precomputing and storing embeddings, it recomputes them on demand at query time. It discards the embeddings themselves and retains only a pruned proximity graph that records the relationships between data chunks. The pruning strategy preserves high-degree nodes to maintain connectivity while capping the out-edges of less-connected nodes, so the graph structure needed for accurate retrieval stays intact. The pruned graph is stored in Compressed Sparse Row (CSR) format to further reduce the storage and memory footprint.
The storage savings are substantial: LEANN can index 60 million text chunks in just 6 GB, whereas traditional vector databases might require over 200 GB for the same dataset, which is consistent with the reported 97% reduction. On-the-fly recomputation does add retrieval latency, but the developers argue that in modern RAG pipelines the LLM's generation step is the primary bottleneck. They report that LEANN's approach increases total end-to-end latency by only about 5%.
Developed at Berkeley SkyLab and published at MLSys 2026, the project provides integrations for developer workflows. For instance, it includes a native MCP server that can act as a drop-in semantic search replacement for the basic grep-style search in Claude Code. LEANN is designed to index a wide range of local data sources, including file systems, Apple Mail, browser history, and codebases, without the data ever leaving the user's machine. This supports a fully private and portable "personal AI" that does not depend on cloud services.
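To make the graph-only design described above more concrete, here is a minimal Python sketch of the general idea: prune a neighbor graph so that high-degree nodes keep more out-edges, store the result as CSR arrays, and run a best-first search that embeds a chunk only when the search visits it. This is not LEANN's code or API. The names `prune_to_csr`, `search`, and `toy_embed`, the parameters `hub_fraction`, `hub_degree`, `low_degree`, and `ef`, and the one-dimensional toy "embeddings" are all assumptions made for illustration, and the traversal is a generic best-first graph search rather than the project's exact algorithm.

```python
import heapq
import numpy as np

def prune_to_csr(adj, hub_fraction=0.1, hub_degree=8, low_degree=3):
    """Cap out-edges per node (hubs keep more) and return CSR arrays.

    adj[i] is node i's neighbour list, assumed sorted nearest-first.
    All parameter names and defaults are hypothetical.
    """
    n = len(adj)
    degrees = np.array([len(nbrs) for nbrs in adj])
    n_hubs = max(1, int(n * hub_fraction))
    # The highest-degree nodes are treated as hubs.
    hubs = set(np.argsort(-degrees, kind="stable")[:n_hubs].tolist())
    indptr = np.zeros(n + 1, dtype=np.int64)
    indices = []
    for node, nbrs in enumerate(adj):
        cap = hub_degree if node in hubs else low_degree
        kept = nbrs[:cap]
        indices.extend(kept)
        indptr[node + 1] = indptr[node] + len(kept)
    return indptr, np.array(indices, dtype=np.int32)

def search(query_vec, indptr, indices, embed_fn, entry=0, k=3, ef=4):
    """Best-first search; embeddings are computed only for visited nodes."""
    cache = {}

    def dist(node):
        if node not in cache:
            cache[node] = embed_fn(node)  # recomputed on demand, never stored
        return float(np.linalg.norm(cache[node] - query_vec))

    visited = {entry}
    d0 = dist(entry)
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    best = [(-d0, entry)]       # max-heap (negated) of the ef closest so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break  # nothing left can improve the current result set
        for nbr in indices[indptr[node]:indptr[node + 1]]:
            nbr = int(nbr)
            if nbr in visited:
                continue
            visited.add(nbr)
            dn = dist(nbr)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nbr))
                heapq.heappush(best, (-dn, nbr))
                if len(best) > ef:
                    heapq.heappop(best)
    top = sorted((-neg_d, node) for neg_d, node in best)[:k]
    return [node for _, node in top], len(cache)

# Toy data: 20 "chunks" whose stand-in embedding is their position on a line.
def toy_embed(i):
    return np.array([float(i)])

adj = [sorted((j for j in range(20) if j != i and abs(i - j) <= 4),
              key=lambda j: (abs(i - j), j)) for i in range(20)]
indptr, indices = prune_to_csr(adj)
ids, n_embedded = search(np.array([3.2]), indptr, indices, toy_embed)
print(ids, n_embedded)
```

In this sketch only the two integer arrays `indptr` and `indices` would need to be persisted. `embed_fn` stands in for a real embedding model, and the second return value counts how many chunks were embedded for the query, which is the compute cost paid in exchange for not storing vectors.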