Vector DB cost vs latency
- A benchmark compared Pinecone to pgvector on a 4M-chunk dataset at 1,500 QPS for RAG-style workloads.
- Pinecone ran about $1,800/month with 180ms p95 latency, while pgvector with an HNSW index ran about $320/month with 210ms p95 at similar recall.
- The test suggests pgvector is more cost-efficient for 90% of RAG use cases unless ultra-low latency is required. (x.com)

A vector database is the part of an artificial intelligence stack that stores embeddings, long lists of numbers that let software find similar text by meaning instead of exact words. In one recent benchmark, PostgreSQL with the pgvector extension came close to Pinecone on speed while costing far less on a retrieval-augmented generation workload. (github.com) (x.com)

The test used a 4 million-chunk dataset and drove about 1,500 queries per second, a production-style load for chatbots that fetch relevant passages before answering. On that setup, Pinecone posted about 180 milliseconds of p95 latency (the time within which 95% of queries complete) at roughly $1,800 a month, while pgvector with a hierarchical navigable small world, or HNSW, index came in around 210 milliseconds at roughly $320 a month with similar recall. (x.com)

Retrieval-augmented generation, or RAG, works like open-book exam software: it looks up passages first, then sends those passages to the model. That makes the retrieval layer a direct cost and latency bottleneck, especially once traffic moves from demos to thousands of requests a second. (github.com) (pinecone.io)

Pinecone sells a managed vector database built for this job, while pgvector adds vector search to PostgreSQL, the general-purpose database many teams already run. Pinecone says its current architecture is serverless and recommends it for new projects, while its older pod-based indexes are now legacy infrastructure. (pinecone.io 1) (pinecone.io 2)

The recent gains in pgvector come from better indexing. The project added HNSW support in version 0.5.0 in September 2023, and PostgreSQL.org said the graph-based index improved the speed-recall tradeoff over older approximate search methods such as IVFFlat. (postgresql.org) (crunchydata.com)

That does not make the choice one-sided. HNSW usually uses more memory and takes longer to build than IVFFlat, and Pinecone says query latency depends heavily on running clients in the same cloud and region as the index. (pgxn.org) (docs.pinecone.io)

Earlier vendor-backed tests pointed in a similar direction, though with different datasets and price points. Supabase said in an October 2023 comparison that pgvector delivered higher throughput than Pinecone at a roughly comparable monthly budget, while Tiger Data argued in June 2024 that PostgreSQL with pgvector and pgvectorscale could outperform Pinecone on performance and cost. (supabase.com) (tigerdata.com)

The practical split is getting clearer: teams that already run PostgreSQL can often keep retrieval in the same database and save money, while teams chasing the lowest possible latency or fully managed operations may still pay for a dedicated service. The benchmark's gap, about 30 milliseconds at p95 versus roughly $1,480 a month in added cost, is the tradeoff engineers are now pricing in. (x.com)
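
The arithmetic behind that tradeoff can be sketched in a few lines of Python. The monthly prices, p95 latencies and query rate are the benchmark figures quoted above; the 30-day month, the assumption that 1,500 queries per second is sustained around the clock, and the function and variable names are illustrative choices, not part of the benchmark.

```python
# Back-of-the-envelope comparison using the benchmark figures quoted in the article.
# ASSUMPTIONS (not from the benchmark): a 30-day month and a load of 1,500 QPS
# sustained for the whole month. Real traffic is rarely this flat.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumed 30-day month
QPS = 1_500                            # query rate reported for the test

# Monthly cost (USD) and p95 latency (ms) as reported.
pinecone = {"monthly_usd": 1_800, "p95_ms": 180}
pgvector = {"monthly_usd": 320, "p95_ms": 210}


def cost_per_million_queries(monthly_usd: float, qps: float = QPS) -> float:
    """Monthly cost divided by millions of queries served at a constant rate."""
    queries_per_month = qps * SECONDS_PER_MONTH
    return monthly_usd / (queries_per_month / 1_000_000)


# Extra monthly spend for the faster option, and the p95 latency it buys back.
extra_cost_usd = pinecone["monthly_usd"] - pgvector["monthly_usd"]
latency_saved_ms = pgvector["p95_ms"] - pinecone["p95_ms"]
usd_per_ms_saved = extra_cost_usd / latency_saved_ms

if __name__ == "__main__":
    print(f"Extra cost: ${extra_cost_usd}/month for {latency_saved_ms} ms at p95")
    print(f"Price of each millisecond saved: ${usd_per_ms_saved:.2f}/month")
    for name, system in (("Pinecone", pinecone), ("pgvector", pgvector)):
        per_million = cost_per_million_queries(system["monthly_usd"])
        print(f"{name}: ${per_million:.3f} per million queries")
```

Under those assumptions, the faster option costs roughly $49 a month for each millisecond of p95 latency it saves, or about $0.46 per million queries against about $0.08 for pgvector. A team with burstier traffic or a different price quote would swap in its own numbers.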