Graph-RAG Framework Launches for Complex Queries

Published by The Daily Scout

What happened

A new open-source Graph-RAG framework has been launched on GitHub to enhance multi-hop reasoning in retrieval systems. The approach uses graph structures to connect disparate pieces of information, enabling more complex and context-aware query answering than traditional vector search.

Why it matters

Graph-RAG: Under the Hood of the New Open-Source Framework

The new Graph-RAG framework, introduced by a team at Microsoft Research led by Jonathan Larson, changes how retrieval systems handle complex queries by building a knowledge graph from unstructured text. The graph supports multi-hop reasoning, connecting disparate pieces of information in a way that traditional vector search struggles with, which leads to more comprehensive and contextually aware answers.

Unlike standard RAG, which retrieves isolated text chunks based on semantic similarity, Graph-RAG first uses an LLM to extract entities and their relationships from documents and assembles them into a structured knowledge graph. When a query comes in, the system traverses this graph to find interconnected data, giving the LLM a richer, more structured context from which to generate a response. The method has shown significant performance improvements in benchmarks, with some studies reporting a 3x increase in accuracy on complex, multi-hop questions.

From a cost perspective, implementing the open-source Graph-RAG framework involves a trade-off. The initial construction of the knowledge graph can be computationally expensive, with some estimates suggesting that about 75% of the total indexing cost comes from the LLM calls needed for entity and relationship extraction. Once the graph is built, however, per-query costs can be lower than with traditional RAG, which may require processing large volumes of text for each query. The total cost of ownership for a "build" approach using the open-source framework also needs to factor in engineering effort for implementation, hosting of the graph database, and ongoing maintenance.

The launch of an open-source Graph-RAG framework introduces a new dynamic into the enterprise search market. Competitors like Glean also leverage a knowledge graph, but with an added "personal graph" that tailors results based on an individual's role and interactions within the company. Glean's pricing is reported to start at over $50 per user per month, with the total cost of ownership potentially being two to six times the initial license fee when factoring in AI add-ons and infrastructure overhead.

Other players in the enterprise search space take different architectural approaches. Hebbia, which targets the finance and legal sectors, uses a multi-agent AI system that breaks complex user questions into smaller sub-tasks and assigns them to different agents for resolution. Its pricing is not public but is reportedly high, involving a lengthy sales process and long-term contracts. Cohere, another major competitor, focuses on providing a "Rerank" model as part of its offering. This model takes the initial search results from a retrieval system, vector-based or otherwise, and re-orders them to improve relevance before they are fed to a generative model. Cohere offers a more transparent, token-based pricing model for its various models, which makes costs more predictable based on usage.

Whether an enterprise builds on an open-source framework like Graph-RAG or buys a solution from a vendor like Glean, Hebbia, or Cohere will depend on several factors. A "build" approach offers greater customizability and control but requires a significant upfront investment in engineering resources and infrastructure. A "buy" approach provides a managed solution with support but comes with licensing costs and potential vendor lock-in. The availability of a robust open-source option now gives engineering teams a credible path to developing sophisticated, in-house enterprise search capabilities.
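To make the extract-then-traverse idea concrete, here is a minimal Python sketch. It is not the framework's actual code or API: the triples, entity names, and the functions build_graph and multi_hop_context are all hypothetical, and the hard-coded triples stand in for what an LLM extraction pass might produce.

```python
from collections import defaultdict, deque

# Hypothetical (subject, relation, object) triples standing in for the output
# of an LLM extraction pass; the real framework's format may differ.
TRIPLES = [
    ("Acme Corp", "acquired", "Widget Labs"),
    ("Widget Labs", "founded_by", "Dana Reyes"),
    ("Dana Reyes", "previously_at", "Globex"),
]


def build_graph(triples):
    """Index triples as an adjacency list, storing each edge in both directions."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((f"inverse_{rel}", subj))
    return graph


def multi_hop_context(graph, start, max_hops=2):
    """Breadth-first walk that collects the facts reachable within max_hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    facts = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, neighbor in graph.get(node, []):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            facts.append((node, rel, neighbor))
            queue.append((neighbor, depth + 1))
    return facts


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for fact in multi_hop_context(graph, "Acme Corp", max_hops=2):
        print(fact)
```

A question such as "who founded the company Acme Corp acquired?" needs two hops: the walk first reaches Widget Labs, then Dana Reyes. In a full pipeline the collected facts would be serialized into the prompt as context; a vector search over isolated chunks would have to retrieve both facts independently to answer the same question.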

Key numbers

  • Graph-RAG has shown significant performance improvements in benchmarks, with some studies reporting a 3x increase in accuracy on complex, multi-hop questions.
  • The initial construction of the knowledge graph can be computationally expensive, with some estimates suggesting that about 75% of the total indexing cost comes from the LLM calls needed for entity and relationship extraction.
  • Glean's pricing is reported to start at over $50 per user per month, with the total cost of ownership potentially being two to six times the initial license fee when factoring in AI add-ons and infrastructure overhead.
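The two cost figures above lend themselves to a quick back-of-the-envelope calculation. The Python sketch below only applies the reported ratios; the function names, the 200-seat example, and the 1,000-unit indexing budget are illustrative assumptions, and the $50 figure is a reported starting price, so the results are a floor rather than a quote.

```python
def extraction_share(total_indexing_cost, share=0.75):
    """Portion of indexing cost attributed to LLM extraction calls (reported estimate: about 75%)."""
    return total_indexing_cost * share


def license_tco_range(users, per_user_month=50.0, months=12, low=2.0, high=6.0):
    """Return (base license fee, low TCO, high TCO) using the reported 2x to 6x multiplier."""
    base = users * per_user_month * months
    return base, base * low, base * high


if __name__ == "__main__":
    # Hypothetical inputs: a 1,000-unit indexing budget and a 200-seat deployment.
    print(extraction_share(1000.0))
    print(license_tco_range(200))
```

For a hypothetical 200-seat deployment, the base license comes to 200 × $50 × 12 = $120,000 per year, which puts the reported two-to-six-times range at $240,000 to $720,000.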

What happens next

  • Once the graph is built, per-query costs can be lower than with traditional RAG, which may require processing large volumes of text for each query.
  • The total cost of ownership for a "build" approach using the open-source framework will also need to factor in engineering effort for implementation, hosting of the graph database, and ongoing maintenance.
  • The launch of an open-source Graph-RAG framework introduces a new dynamic into the enterprise search market.
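One way to reason about the upfront-versus-per-query trade-off is a simple break-even estimate. The Python sketch below is an illustration only: the function name and every number passed to it are hypothetical, and it ignores hosting, maintenance, and engineering costs, which the article notes must also be counted.

```python
import math


def break_even_queries(index_cost, graph_per_query, baseline_per_query):
    """Queries needed before per-query savings repay the one-time graph indexing cost.

    All arguments share one currency unit. Returns None when the graph
    approach is not cheaper per query, since it then never breaks even.
    """
    saving = baseline_per_query - graph_per_query
    if saving <= 0:
        return None
    return math.ceil(index_cost / saving)


if __name__ == "__main__":
    # Hypothetical figures, not taken from any benchmark or price list.
    print(break_even_queries(index_cost=500.0, graph_per_query=0.25, baseline_per_query=0.75))
```

If the estimated query volume over the life of the index is well below the break-even count, the indexing cost is unlikely to pay for itself on query savings alone.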
