Cursor’s retrieval is the moat

- Cursor said on January 27 that it speeds up code search by reusing teammates’ existing indexes, arguing that semantic retrieval, not model ownership, drives agent performance. (cursor.com)
- The company said semantic search lifted response accuracy by 12.5% on average, while shared indexes cut time-to-first-query from hours to seconds on the largest repositories. (cursor.com)
- The claim lands as Cursor, built by Anysphere, pursues a new funding round at a valuation near $50 billion. (techcrunch.com)

Before the product story, the plumbing: code assistants work best when they can find the right files fast instead of stuffing an entire repository into a model prompt. Cursor says that retrieval layer, the search system behind its answers, improved agent response accuracy by 12.5% on average. (cursor.com)

Cursor described that system in a January 27 engineering post about “securely indexing large codebases.” The company said it builds a searchable index when a project opens, then keeps it current as files change. (cursor.com)

For the first pass, Cursor said it maps the repository with a Merkle tree, a hash-based directory map similar to the way Git tracks content. Each file gets a cryptographic hash, and each folder gets a hash derived from its children. (cursor.com)

That lets the editor spot exactly which branches changed instead of rescanning everything. In a workspace with 50,000 files, Cursor said filenames plus SHA-256 hashes total about 3.2 megabytes, and the tree means only changed branches need to move. (cursor.com)

After that, Cursor said it splits changed files into syntactic chunks and turns those chunks into embeddings, the numerical fingerprints used for semantic search. It also caches embeddings by chunk content so unchanged code does not need to be recomputed. (cursor.com)

The newer piece is reuse across teams. Cursor said clones of the same codebase average 92% similarity across users inside an organization, so a new teammate can often inherit most of an existing index instead of rebuilding one from scratch. (cursor.com)

Cursor said that shared-index approach cuts time-to-first-query from hours to seconds on the largest repositories. The company also said semantic search is unavailable until at least 80% of naive indexing work is finished, which makes startup time a product problem, not just an infrastructure problem. (cursor.com)

That matters in the market Cursor now occupies.
TechCrunch reported on April 17 that Anysphere, Cursor’s parent, was nearing a funding round of at least $2 billion at a $50 billion valuation, after reaching $2 billion in annualized revenue in February. (techcrunch.com)

The same report said Cursor is trying to rely less on outside model providers as Anthropic’s Claude Code and OpenAI’s Codex push into the same category. That leaves room for a different argument about defensibility: not owning the best base model, but owning the fastest, safest way to assemble the right context around one. (techcrunch.com)
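
For readers who want to see the Merkle-tree idea in code, here is a minimal Python sketch of the change detection described above. It is an illustration under stated assumptions, not Cursor's implementation: the names `Node`, `build_tree` and `changed_paths` are hypothetical, the repository is modeled as an in-memory mapping of paths to bytes, and the way a folder hash combines child names and digests is just one plausible choice.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One file or folder, identified by a SHA-256 digest (hypothetical type)."""
    digest: str
    children: Dict[str, "Node"] = field(default_factory=dict)  # empty for files


def build_tree(files: Dict[str, bytes]) -> Node:
    """Build a Merkle tree from a mapping like 'src/a.py' -> file bytes."""
    direct: Dict[str, bytes] = {}
    nested: Dict[str, Dict[str, bytes]] = {}
    for path, content in files.items():
        head, _, rest = path.partition("/")
        if rest:
            nested.setdefault(head, {})[rest] = content  # belongs to a subfolder
        else:
            direct[head] = content  # a file directly in this folder

    # Each file gets a hash of its content.
    children = {name: Node(hashlib.sha256(data).hexdigest())
                for name, data in direct.items()}
    for name, sub in nested.items():
        children[name] = build_tree(sub)

    # Each folder gets a hash derived from its children's names and hashes.
    # (Assumed encoding; any stable, unambiguous encoding would do.)
    h = hashlib.sha256()
    for name in sorted(children):
        h.update(name.encode() + b"\0" + children[name].digest.encode() + b"\n")
    return Node(h.hexdigest(), children)


def changed_paths(old: Optional[Node], new: Node, prefix: str = "") -> List[str]:
    """List files in `new` that are added or modified relative to `old`.

    Deleted files are not reported in this sketch.
    """
    if old is not None and old.digest == new.digest:
        return []  # identical subtree: no need to descend
    if not new.children:
        return [prefix] if prefix else []  # a changed or newly added file
    out: List[str] = []
    for name in sorted(new.children):
        old_child = old.children.get(name) if old is not None else None
        path = f"{prefix}/{name}" if prefix else name
        out.extend(changed_paths(old_child, new.children[name], path))
    return out


before = build_tree({"src/a.py": b"print(1)", "src/b.py": b"print(2)",
                     "docs/readme.md": b"hi"})
after = build_tree({"src/a.py": b"print(1)", "src/b.py": b"print(3)",
                    "docs/readme.md": b"hi", "src/util/c.py": b"x"})
print(changed_paths(before, after))  # ['src/b.py', 'src/util/c.py']
```

In this toy example the `docs` folder hash is unchanged, so the comparison never looks inside it. That is the property the article describes: only the branches that changed need to be examined or sent.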
