Notion Details 90% Cost Reduction in Scaling AI Search

Notion has detailed its two-year effort to scale the vector search behind its AI Q&A feature, achieving a 10x increase in scale while cutting infrastructure costs by 90%. The case study offers a look at the operational and financial considerations facing creative and technology companies that build scalable, efficient AI-driven systems, and it highlights the importance of optimizing AI backends for both speed and cost.

- Notion's AI Q&A feature, which answers natural-language questions using content from a user's workspace, officially launched in November 2023. The underlying vector search infrastructure then had to scale to millions of workspaces.
- The initial AI search architecture used a dual-path ingestion and indexing pipeline: Apache Spark for batch processing and Kafka for real-time updates. The batch path was designed to onboard large existing workspaces efficiently, while the streaming path keeps live data current with sub-minute latency.
- In May 2024, Notion cut costs by 50% by migrating its embeddings workload from a dedicated-hardware "pod" architecture to a new serverless architecture. Decoupling storage from compute in this way saved several million dollars annually.
- To optimize further, Notion began migrating its embeddings generation pipeline from Spark to Ray in July 2025, a move projected to reduce embeddings infrastructure costs by more than 90%. Ray Serve hosts open-source embedding models on GPUs for low-latency query embedding.
- Notion also optimized how page edits are processed by comparing hashes of the text and the metadata. When only metadata changes, the system skips the expensive re-embedding step and instead issues a much cheaper PATCH command to the vector database; this reduced data volume by 70%.
- Notion works with the vector database provider turbopuffer to manage more than 10 billion text chunks across millions of workspaces. The partnership is on track to save Notion several million dollars annually on its vector database.
- For some smaller workspaces with fewer than 1,000 documents, Notion uses Cohere Rerank to improve search relevance; in some cases this avoids the need for vector search and embeddings altogether, further reducing complexity and cost.
- The Notion AI add-on, which includes the Q&A feature, is available for paid plans at a cost of $8 per member per month (billed annually). This pricing strategy allows the company to generate incremental revenue from its AI features without altering its core subscription tiers.
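The hash comparison for page edits can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Notion's or turbopuffer's actual code: the names `stable_hash`, `ChunkState`, and `plan_update`, the use of SHA-256 over canonical JSON, and the three action labels are all hypothetical, and metadata is assumed to be JSON-serializable. It only shows the decision rule: a text-hash mismatch forces re-embedding, while a metadata-only mismatch can be handled with a cheaper attribute update.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional, Tuple


def stable_hash(value) -> str:
    """SHA-256 of a canonical JSON encoding, so dict key order does not matter."""
    encoded = json.dumps(value, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass(frozen=True)
class ChunkState:
    """Hashes remembered from the last successful sync of a chunk (hypothetical)."""
    text_hash: str
    metadata_hash: str


def plan_update(
    text: str, metadata: dict, previous: Optional[ChunkState]
) -> Tuple[str, ChunkState]:
    """Decide how to sync an edited chunk; returns (action, new_state).

    Actions are illustrative labels, not a real vector-database API:
      "embed": text is new or changed, so recompute the embedding and upsert it
      "patch": only metadata changed, so update stored attributes and keep the vector
      "skip":  nothing changed, so no write is needed
    """
    current = ChunkState(stable_hash(text), stable_hash(metadata))
    if previous is None or previous.text_hash != current.text_hash:
        return "embed", current
    if previous.metadata_hash != current.metadata_hash:
        return "patch", current
    return "skip", current


if __name__ == "__main__":
    action, state = plan_update("Q3 roadmap draft", {"title": "Roadmap"}, None)
    print(action)  # embed (first sync)
    action, state = plan_update("Q3 roadmap draft", {"title": "Roadmap v2"}, state)
    print(action)  # patch (title changed, text identical)
    action, state = plan_update("Q3 roadmap draft", {"title": "Roadmap v2"}, state)
    print(action)  # skip (nothing changed)
```

Keeping the text and metadata hashes separate is what lets a metadata-only edit bypass the embedding model entirely, and storing only the hashes rather than the previous text keeps the per-chunk state small.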
