AWS: keyword search competes

An AWS researcher posted that careful keyword search, paired with lightweight agentic tools such as rga and pdfgrep, can achieve question-answering (QA) performance on par with retrieval-augmented generation (RAG) without a vector database. (x.com) The claim suggests simpler retrieval pipelines can still be competitive for QA, at least on some datasets. (x.com)

A new Amazon Web Services paper says a language-model agent using plain keyword search can reach more than 90% of traditional retrieval-augmented generation performance without a standing vector database. (amazon.science) The paper, listed by Amazon Science for AAAI 2026, is titled “Keyword search is all you need” and names eight Amazon Web Services authors: Shreyas Subramanian, Adewale Akinfaderin, Yanyan Zhang, Ishan Singh, Chris Pecora, Mani Khanuja, Sandeep Singh, and Maira Ladeira Tanke. (amazon.science)

Retrieval-augmented generation is the standard pattern in which a model looks up outside documents before answering. Vector databases are the usual filing system for it because they store numerical representations of text (embeddings) for similarity search. AWS’s own March 2026 prescriptive guidance says vector databases are becoming increasingly important for generative artificial intelligence applications. (docs.aws.amazon.com)

The new paper tests a different setup: instead of precomputing embeddings and querying a vector index, the agent gets basic keyword tools and searches the documents directly at question time. The abstract says the study compared the retrieval and answer quality of tool-augmented large language model agents against retrieval-augmented generation systems. (assets.amazon.science)

The tools cited in the discussion are lightweight command-line search programs, not new database infrastructure. ripgrep-all, known as rga, searches inside PDFs, Office files, archives, and other formats, while pdfgrep is a grep-style utility built to search PDF text. (github.com, pdfgrep.org)

Amazon Science’s abstract says the keyword-search agent attained “over 90%” of the performance metrics of traditional retrieval-augmented generation. The paper PDF says the system stayed above 88% average attainment across three metrics while generally scoring slightly below the retrieval-augmented generation baseline. (amazon.science, assets.amazon.science)

That claim cuts against the default architecture many cloud vendors, including Amazon Web Services, have spent the past year promoting for enterprise question answering. Amazon OpenSearch Service markets its vector database as a way to search billions of vectors in milliseconds and to combine vector embeddings with text keywords in one request. (aws.amazon.com)

The paper does not say vector search is obsolete. Its abstract frames the result as a question-answering comparison using basic keyword tools, and says the simpler approach is especially useful when knowledge bases change frequently. (amazon.science)

That leaves a narrower takeaway than the paper's title suggests: for some document question-answering workloads, careful search plus lightweight tools may get close enough that teams can skip the extra vector layer. Amazon Web Services is now making both cases at once, selling vector systems for scale while publishing evidence that simpler retrieval can still compete. (amazon.science, aws.amazon.com, docs.aws.amazon.com)
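
The question-time keyword search the paper describes, where an agent scans documents when a question arrives instead of querying a precomputed vector index, can be illustrated with a toy sketch. It is not the paper's agent or tooling: the function name `keyword_search`, the in-memory `docs` dictionary, the short-word filter, and the overlap-count ranking are all illustrative assumptions standing in for tools like rga or pdfgrep, which scan files on disk. The point is structural: nothing is embedded or indexed ahead of time, so an edited document is searchable on the very next call.

```python
import re


def keyword_search(query: str, docs: dict[str, str], top_k: int = 3) -> list[tuple[str, int, str]]:
    """Return up to top_k (doc_name, line_no, line) hits, most keyword overlap first.

    Nothing is embedded or indexed ahead of time: every call scans the raw
    text in `docs`, the way a grep-style tool scans files on disk.
    """
    # Keywords from the question; words of 3 characters or fewer are dropped
    # as a crude stand-in for a stopword list (an illustrative assumption).
    terms = {t for t in re.findall(r"\w+", query.lower()) if len(t) > 3}
    hits = []
    for name, text in docs.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            # Count how many distinct query keywords appear on this line.
            overlap = len(terms & set(re.findall(r"\w+", line.lower())))
            if overlap:
                hits.append((overlap, name, line_no, line.strip()))
    # Highest overlap first; ties broken by document name, then line number.
    hits.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return [(name, line_no, line) for _, name, line_no, line in hits[:top_k]]


if __name__ == "__main__":
    # Hypothetical two-document knowledge base, for illustration only.
    docs = {
        "policy.txt": "Refunds are issued within 14 days.\nShipping is free over $50.",
        "faq.txt": "How long do refunds take?\nContact support for refund status.",
    }
    question = "How many days until refunds are issued?"
    print(keyword_search(question, docs))

    # An edit is searchable immediately; there is no embedding step to rerun.
    docs["faq.txt"] = "Refunds now take 30 days."
    print(keyword_search(question, docs))
```

In an agent loop, hits like these would be handed to the model as context, and the model could rephrase the query and search again if the first pass came up short. The trade-off is visible even in this toy: there is no index to keep in sync, but matching is literal, so the line mentioning "refund" (singular) is missed for a query about "refunds", a gap that semantic vector search is designed to close.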
