HNSW Vector Databases Face Scaling Challenges

A new technical analysis warns that adding more documents to HNSW-based vector databases, like those used by Pinecone and Weaviate, can degrade RAG retrieval quality and increase latency. The report details how vector search indexing and neighborhood culling can break down at scale. It recommends mitigation strategies such as parameter tuning, retraining embeddings, and adopting hybrid search for large-scale deployments.

- The core issue with HNSW at scale is that as more vectors are added, the graph becomes denser, leading to a higher probability of taking a "wrong turn" during the search process and requiring more backtracking. This leads to a super-linear increase in latency; one experiment showed that a 20x increase in documents resulted in a 12-13x increase in the amount of work HNSW had to do to maintain quality.
- HNSW's performance is highly dependent on being able to hold the entire index in memory, which becomes a significant cost factor at scale. For instance, a dataset with one billion 768-dimensional vectors could require approximately 3 TiB of memory for the vectors alone, with the HNSW graph adding another 20-40% on top of that.
- Key hyperparameters for tuning HNSW are `M` (the maximum number of connections per node) and `efConstruction` (search depth during indexing), which are set at build time. At query time, `efSearch` (the size of the candidate queue) can be adjusted to balance recall and latency.
- For very large datasets, alternatives and hybrid approaches are gaining traction. DiskANN, for example, keeps compressed vectors in RAM while storing full vectors on SSDs to manage memory usage. Another hybrid model combines HNSW with an inverted file index (IVF), using HNSW to find the nearest cluster centroids and then searching only within those clusters.
- With fixed HNSW parameters, recall can degrade faster than a simple flat (brute-force) search as the database grows. This is because the approximate nature of the search becomes less reliable in an increasingly crowded high-dimensional space.
- Many production systems are moving towards hybrid retrieval that combines dense (vector) search with traditional sparse retrieval methods like BM25. This is because embeddings can lose the detail of exact keywords or product codes, which sparse retrieval methods excel at capturing.
- The process of inserting and deleting nodes in an HNSW graph is computationally expensive. Updates can trigger cascading modifications throughout the graph, leading to significant write amplification and making frequent updates inefficient.
- Alternatives to HNSW for large-scale vector search include IVF (Inverted File) variants and Product Quantization (PQ). FAISS, a library from Meta, provides implementations of many of these, including IVF-PQ, which significantly reduces memory usage.
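
The memory figure quoted for one billion 768-dimensional vectors can be checked with simple arithmetic. The sketch below assumes each component is stored as a 32-bit float (the brief does not state the precision) and applies the 20-40% graph overhead range as a flat multiplier; the function name and defaults are illustrative only.

```python
TIB = 2**40  # bytes in one tebibyte


def hnsw_memory_estimate(num_vectors, dim, bytes_per_component=4,
                         graph_overhead=(0.20, 0.40)):
    """Return (vector_bytes, low_total, high_total) for an in-memory HNSW index.

    bytes_per_component=4 assumes float32 storage (an assumption, not from
    the brief). graph_overhead is the 20-40% range quoted in the brief.
    """
    vector_bytes = num_vectors * dim * bytes_per_component
    low_total = vector_bytes * (1 + graph_overhead[0])
    high_total = vector_bytes * (1 + graph_overhead[1])
    return vector_bytes, low_total, high_total


vec, low, high = hnsw_memory_estimate(1_000_000_000, 768)
print(f"vectors alone: {vec / TIB:.2f} TiB")
print(f"with graph:    {low / TIB:.2f} to {high / TIB:.2f} TiB")
```

Under the float32 assumption the raw vectors come to roughly 2.8 TiB, which is consistent with the "approximately 3 TiB" figure, and the graph pushes the total to somewhere between about 3.4 and 3.9 TiB.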
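
To make the `efSearch` trade-off concrete, here is a minimal single-layer sketch of the best-first search that HNSW runs on its bottom layer. It is a simplification under stated assumptions, not any vendor's implementation: there is no layer hierarchy, the graph is built by brute force rather than with an `efConstruction`-bounded search, node degree is not pruned back to `M`, and the names `build_graph` and `search` are hypothetical. The `ef` argument plays the role of `efSearch`.

```python
import heapq
import math
import random


def build_graph(points, m):
    """Insert points one by one, linking each to its m nearest predecessors."""
    graph = {0: set()}
    for i in range(1, len(points)):
        nearest = sorted(range(i),
                         key=lambda j: math.dist(points[i], points[j]))[:m]
        graph[i] = set(nearest)
        for j in nearest:
            graph[j].add(i)  # links are bidirectional
    return graph


def search(points, graph, query, entry, ef):
    """Best-first search keeping at most ef results.

    Returns (hits sorted by distance, number of distance computations).
    """
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept hit on top
    work = 1
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # closest candidate is farther than the worst kept hit
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, points[nb])
            work += 1
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst hit
    hits = sorted((-nd, n) for nd, n in results)
    return hits, work


random.seed(0)
points = [tuple(random.random() for _ in range(8)) for _ in range(500)]
graph = build_graph(points, m=6)
query = tuple(random.random() for _ in range(8))
exact = sorted(range(len(points)),
               key=lambda i: math.dist(query, points[i]))[:10]

for ef in (10, 40, 160):
    hits, work = search(points, graph, query, entry=0, ef=ef)
    found = {n for _, n in hits[:10]}
    recall = len(found & set(exact)) / 10
    print(f"ef={ef:4d}  recall@10={recall:.1f}  distance computations={work}")
```

Raising `ef` widens the candidate queue, so the search tolerates more "wrong turns" at the cost of more distance computations. Setting `ef` to the dataset size makes this toy search exhaustive, which is the brute-force end of the recall/latency dial.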
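
The IVF idea mentioned above (find the nearest cluster centroids, then search only inside those clusters) can be sketched in a few lines. This is an illustration under simplifying assumptions: centroids are a random sample of the data rather than trained with k-means, the centroid lookup is a linear scan where the hybrid described in the brief would use HNSW, and no quantization is applied. The names `build_ivf`, `ivf_search`, and `nprobe` (number of clusters probed) are chosen for this example.

```python
import math
import random


def build_ivf(points, centroids):
    """Assign every point id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, p in enumerate(points):
        c = min(range(len(centroids)),
                key=lambda j: math.dist(p, centroids[j]))
        lists[c].append(i)
    return lists


def ivf_search(points, centroids, lists, query, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query.

    Returns (top-k point ids, number of vectors scanned).
    """
    probe = sorted(range(len(centroids)),
                   key=lambda j: math.dist(query, centroids[j]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: (math.dist(query, points[i]), i))
    return candidates[:k], len(candidates)


random.seed(1)
points = [tuple(random.random() for _ in range(8)) for _ in range(2000)]
centroids = random.sample(points, 32)  # stand-in for trained centroids
lists = build_ivf(points, centroids)
query = tuple(random.random() for _ in range(8))
exact = sorted(range(len(points)),
               key=lambda i: math.dist(query, points[i]))[:10]

for nprobe in (1, 4, 32):
    hits, scanned = ivf_search(points, centroids, lists, query, 10, nprobe)
    recall = len(set(hits) & set(exact)) / 10
    print(f"nprobe={nprobe:2d}  recall@10={recall:.1f}  vectors scanned={scanned}")
```

Probing every cluster degenerates to a full scan, so `nprobe` is the knob that trades recall against the number of vectors touched per query.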
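
The brief does not say how dense and sparse results are merged in hybrid retrieval. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the method the analysis recommends. The document ids are made up, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by either retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results: the dense retriever misses an exact product code
# that the sparse (BM25-style) retriever ranks first.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_sku_4471", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities and BM25 scores live on different scales; the exact-match document surfaces near the top even though the embedding search never returned it.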
