Notion Scales Vector Search 10x at 1/10th the Cost

Published February 20, 2026 by The Daily Scout

Notion has achieved a tenfold increase in the scale of its vector search capabilities over two years, while simultaneously reducing costs by a factor of ten. The AI-powered infrastructure enhances document and note retrieval for users. This case study demonstrates the value of investing in a scalable AI backend to support user-facing features like semantic search and personalized recommendations.

Why it matters

- Notion's vector search is a core component of its Q&A feature, which launched in beta in November 2023. Q&A lets users ask natural-language questions and receive answers synthesized from their workspace content.
- To handle millions of workspaces, Notion's initial architecture used dual-path ingestion: batch jobs on Apache Spark processed existing documents, while Kafka carried real-time updates for new edits.
- A key cost-saving move came in May 2024, when Notion migrated the embeddings workload from a dedicated-hardware "pod" architecture to a serverless one that decouples storage and compute, cutting costs by 50% from peak usage.
- A further optimization is migrating the embeddings generation pipeline from Spark to Ray, which is projected to reduce infrastructure costs for that component by more than 90%.
- The system manages more than 10 billion chunks of text from millions of workspaces, including content from connected applications such as Slack and Google Drive.
- For its vector database, Notion partners with turbopuffer, a serverless search engine built on object storage, to manage billions of vectors.
- At query time, an embedding model converts the user's question into a vector, and that vector is used to query the vector database with metadata filters so that users only see content they have permission to access.
- The engineering team cut search engine spending by 60% and reduced median production query latency from 70-100ms to 50-70ms.
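The dual-path ingestion described above can be sketched in a few lines. This is a minimal, single-process sketch under loud assumptions: `chunk_text`, `fake_embed`, `VectorIndex`, and the 50-word chunk size are hypothetical stand-ins rather than Notion's code, and plain Python loops stand in for the Spark batch job and the Kafka consumer. The point it illustrates is that both paths can write through the same idempotent step, so a backfill and a later edit to the same page converge on one set of chunks.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Hypothetical chunk size; the article does not say how Notion splits pages.
CHUNK_WORDS = 50


def chunk_text(text: str, size: int = CHUNK_WORDS) -> List[str]:
    """Split a document into fixed-size word windows (an assumed chunking scheme)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def fake_embed(chunk: str) -> List[float]:
    """Hypothetical stand-in for a real embedding model: a tiny deterministic vector."""
    return [float(len(chunk)), float(chunk.count(" "))]


@dataclass
class VectorIndex:
    # Keyed by (doc_id, chunk_number) so re-ingesting a document overwrites its old chunks.
    rows: Dict[Tuple[str, int], List[float]] = field(default_factory=dict)

    def replace_document(self, doc_id: str, text: str) -> None:
        # Drop every existing chunk of this document, then write the new ones.
        for key in [k for k in self.rows if k[0] == doc_id]:
            del self.rows[key]
        for n, chunk in enumerate(chunk_text(text)):
            self.rows[(doc_id, n)] = fake_embed(chunk)


def batch_backfill(index: VectorIndex, documents: Dict[str, str]) -> None:
    """Offline path: process every existing document (Spark in the article)."""
    for doc_id, text in documents.items():
        index.replace_document(doc_id, text)


def apply_edit_events(index: VectorIndex, events: Iterable[Tuple[str, str]]) -> None:
    """Online path: apply (doc_id, new_text) edit events in arrival order (Kafka in the article)."""
    for doc_id, text in events:
        index.replace_document(doc_id, text)


index = VectorIndex()
batch_backfill(index, {"a": "word " * 120, "b": "short page"})  # "a" starts as 3 chunks
apply_edit_events(index, [("a", "now a much shorter page")])     # edit shrinks "a" to 1 chunk
print(sorted(index.rows))  # [('a', 0), ('b', 0)]
```

Deleting a document's old chunks before writing new ones matters because an edit can shorten a page; without it, stale trailing chunks would remain searchable.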
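The permission-aware query path can be illustrated with a brute-force search. This sketch assumes a hypothetical `readers` set stored as metadata on each chunk and ranks by exact cosine similarity over a Python list; it is not turbopuffer's API, and a production engine would use an approximate nearest-neighbor index and evaluate the filter inside the engine. The query vector is a made-up placeholder for the embedding model's output.

```python
import math
from typing import FrozenSet, List, Tuple

# (chunk_id, vector, readers): "readers" is a hypothetical metadata field
# holding the user IDs allowed to read the chunk's source page.
Chunk = Tuple[str, List[float], FrozenSet[str]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: List[float], chunks: List[Chunk], user: str, k: int = 2) -> List[str]:
    # Filter on permissions first, so chunks the user cannot read never
    # appear in the results, even as near matches.
    visible = [(cid, vec) for cid, vec, readers in chunks if user in readers]
    visible.sort(key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [cid for cid, _ in visible[:k]]


chunks: List[Chunk] = [
    ("roadmap", [0.9, 0.1], frozenset({"alice"})),
    ("onboarding", [0.8, 0.3], frozenset({"alice", "bob"})),
    ("recipes", [0.0, 1.0], frozenset({"bob"})),
]
query_vec = [1.0, 0.0]  # placeholder for the embedded user question

print(search(query_vec, chunks, "alice"))  # ['roadmap', 'onboarding']
print(search(query_vec, chunks, "bob"))    # ['onboarding', 'recipes']
```

The same query returns different results for different users: "roadmap" is the closest match overall, but it never reaches Bob because he is not in its readers set.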

Key numbers

- November 2023: Q&A, the feature built on vector search, launched in beta
- 10x: growth in vector search scale over two years, while costs fell by a factor of ten
- 10 billion+: text chunks managed, from millions of workspaces and connected apps such as Slack and Google Drive
- 50%: cost reduction from peak after the May 2024 move from a "pod" architecture to a serverless one
- 60%: reduction in search engine spending
- 70-100ms to 50-70ms: median production query latency, before and after
- 90%+: projected infrastructure cost reduction for embeddings generation after moving from Spark to Ray
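To put the latency figures reported above in relative terms, the snippet below computes the percentage drop. Because the article gives ranges rather than single values, pairing the low ends (70ms to 50ms) and the high ends (100ms to 70ms) is an assumption made here for illustration.

```python
def pct_reduction(before_ms: float, after_ms: float) -> float:
    """Percentage drop from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms * 100


# Assumed pairing: low end with low end, high end with high end.
low = pct_reduction(70, 50)
high = pct_reduction(100, 70)
print(f"median latency fell by roughly {low:.1f}% to {high:.1f}%")
# median latency fell by roughly 28.6% to 30.0%
```

On that reading, median query latency dropped by about 29-30%.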

What happens next

Notion is migrating its embeddings generation pipeline from Apache Spark to Ray, a change projected to reduce infrastructure costs for that component by more than 90%.
