PageIndex skips vector DBs
PageIndex replaces vector DBs with a hierarchical tree index inspired by AlphaGo and reports 98.7% accuracy on FinanceBench versus Perplexity’s 45%, claiming more traceable retrieval and stronger multi‑step reasoning over enterprise documents such as financial reports. The approach is being touted as an open‑source, production‑scale alternative to vector stores. (x.com)
VectifyAI’s PageIndex repository shows heavy community traction: the GitHub project lists roughly 23.5k stars and hundreds of commits, with active updates in the last week. (github.com) The codebase ingests documents into a nested "Global Index" that represents chapters, sections, pages and line pointers, then exposes a two‑phase workflow: structured tree generation, followed by an LLM‑driven tree search that classifies and navigates nodes. (pageindex.ai) A Colab cookbook and example agents demonstrate the pattern end‑to‑end, and the repo includes a run_pageindex script with multi‑provider LLM support (via LiteLLM) and examples that use OpenAI Agents for agentic retrieval. (colab.research.google.com) The Mafin2.5 evaluation repo publishes its benchmark code and human‑annotation protocol, showing model‑agnostic evaluation across base LLMs and noting comparable evaluation runs on ChatGPT 4o and DeepSeek V3. (github.com) Product pages for the stack list integrations built for finance workflows (ingested SEC filings, earnings‑call transcripts and live tickers covering the Russell 3000 and Nasdaq), as well as API, MCP and self‑hosted deployment options for enterprise use. (pageindex.ai) Operational tradeoffs are already visible: community threads request latency baselines for tree generation, and several posts document mitigation strategies, including a community optimization that combined a meta‑index pre‑filter, parallel tree search and in‑memory caching to reduce query latency by ~2.6×. (github.com)
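As a rough illustration of the two‑phase pattern described above (build a tree over the document, then search it top‑down), here is a minimal Python sketch. It is not PageIndex's code or API: `Node`, `keyword_relevance`, `tree_search`, `describe` and the sample report are hypothetical names invented for this example, and a simple keyword‑overlap scorer stands in for the LLM relevance judgment a real system would make at each node.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One entry in a hierarchical index (hypothetical schema): a chapter or section."""
    title: str
    summary: str
    start_page: int
    end_page: int
    children: List["Node"] = field(default_factory=list)


def keyword_relevance(query: str, node: Node) -> float:
    """Stand-in for an LLM judgment: fraction of query terms found in title + summary."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    text = f"{node.title} {node.summary}".lower()
    return sum(1 for term in terms if term in text) / len(terms)


def tree_search(
    root: Node,
    query: str,
    score: Callable[[str, Node], float] = keyword_relevance,
    beam: int = 1,
) -> List[List[Node]]:
    """Walk down from the root, following up to `beam` relevant children per level.

    Returns root-to-node paths. A path ends at a leaf, or at an inner node
    when none of its children score above zero.
    """
    results: List[List[Node]] = []

    def visit(node: Node, path: List[Node]) -> None:
        path = path + [node]
        scored = [(score(query, child), child) for child in node.children]
        relevant = sorted(
            (pair for pair in scored if pair[0] > 0),
            key=lambda pair: pair[0],
            reverse=True,
        )
        if not relevant:
            results.append(path)
            return
        for _, child in relevant[:beam]:
            visit(child, path)

    visit(root, [])
    return results


def describe(path: List[Node]) -> str:
    """Render a path as a traceable citation: section trail plus page range."""
    last = path[-1]
    trail = " > ".join(node.title for node in path)
    return f"{trail} (pp. {last.start_page}-{last.end_page})"


def build_example_tree() -> Node:
    """A made-up annual report, used only to exercise the search."""
    return Node("Annual Report 2023", "Full-year report", 1, 120, [
        Node("Business Overview", "Company strategy and markets", 1, 30),
        Node("Financial Statements", "Income statement, balance sheet and cash flow", 31, 90, [
            Node("Income Statement", "Revenue, cost of sales, net income", 31, 45),
            Node("Balance Sheet", "Assets, liabilities, equity", 46, 60),
        ]),
        Node("Risk Factors", "Market, credit and operational risk", 91, 120),
    ])


if __name__ == "__main__":
    for found in tree_search(build_example_tree(), "revenue income statement"):
        print(describe(found))
```

Because each result is a path of section titles ending in a page range, every retrieved passage can be traced back to its place in the document. In a real deployment the `score` argument would presumably wrap an LLM call, and a wider `beam` would trade extra model calls for recall.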