Introduction
In the rapidly evolving landscape of artificial intelligence and natural language processing, Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm that combines the strengths of retrieval-based and generative models. By leveraging vast repositories of information, RAG systems can generate highly relevant and contextually rich responses. However, as the complexity and volume of data continue to grow, the need for more sophisticated techniques becomes apparent. This is where knowledge graphs come into play.
Knowledge graphs provide a structured and interconnected representation of information, capturing relationships between entities in a way that mirrors human understanding. By integrating knowledge graphs into RAG systems, we can enhance their ability to reason, infer, and generate more accurate and insightful content. This synergy not only improves the quality of generated responses but also opens up new avenues for applications across various domains.
In this blog post, we will explore Hybrid GraphRAG, an innovative approach that combines the strengths of knowledge graphs with traditional vector-based retrieval methods to enhance Retrieval-Augmented Generation (RAG) systems. This hybrid architecture leverages structured information alongside retrieved text to provide more accurate and contextually rich responses.
Hybrid GraphRAG integrates two powerful techniques:
- VectorRAG: The traditional approach that uses vector databases for similarity-based text retrieval.
- GraphRAG: A method that leverages knowledge graphs to capture complex relationships between entities.
By combining these techniques, Hybrid GraphRAG addresses key challenges that each approach faces on its own:
- Answering questions that require understanding complex relationships between different pieces of information
- Providing responses that necessitate a global context, drawing from the entire dataset
Let’s walk through a step-by-step implementation using the following technology stack:
- Neo4j Aura: Utilizes Neo4j for structured data retrieval, enabling the creation of a comprehensive knowledge graph.
- LangChain Integration: Orchestrates the interaction between components, including the traditional naive RAG pipeline, and combines the retrieval strategies into a single workflow.
- Ollama: Integrates Ollama for on-device language model inference, ensuring privacy and reducing latency during response generation.
- Gradio: Provides a user-friendly web interface for model interaction, making it accessible for users to engage with the system effortlessly.
Check the following for the full implementation:
https://github.com/ShahedSabab/Hybrid-GraphRAG
Hybrid GraphRAG Architecture
The Hybrid GraphRAG architecture combines the strengths of traditional vector-based retrieval with the structured capabilities of knowledge graphs, enhancing the Retrieval-Augmented Generation (RAG) process. This architecture follows a two-step approach: indexing and retrieval.
During the indexing phase, documents are split into smaller passages to facilitate efficient retrieval. A retriever model then creates embeddings for these passages, which are stored in a vectorstore such as Chroma, Neo4j, FAISS, or Pinecone. In addition, a specialized large language model (LLM) is employed to convert unstructured text into a knowledge graph by identifying entities (nodes) and the relationships between them. This graph is then stored in a graph database like Neo4j.
In the retrieval phase, when a user submits a query, it is processed through two retrieval paths. The vector retriever model searches for relevant information within the vectorstore based on textual similarity. In parallel, the query is processed by a graph retriever that searches for relevant structured knowledge within the knowledge graph. The results from both retrieval processes are combined and provided as context to an LLM, which generates a final response that is both contextually accurate and enriched with structured insights.
Implementation steps summary:
1. Build the Graph Retriever Model
- Knowledge graph creation: Use a specialized LLM to extract entities and relationships from text
- Graph database setup: Store the knowledge graph in a graph database (e.g., Neo4j AuraDB)
- Implement graph querying functionality
2. Build the Vector Retriever Model
- Embedding generation: Create embeddings for document chunks using an embedding model
- Vectorstore setup: Store embeddings in a vector database (e.g., Chroma, Neo4j, FAISS, Pinecone)
- Implement similarity search functionality
3. Combine them into a Hybrid Model
- Merge results from both vector and graph retrievers
- Create a context aggregation mechanism
- Integrate with a language model for final response generation
4. UI Setup
- Design and implement a user interface (e.g., using Gradio)
- Create input fields for user queries
- Display results, including relevant passages and graph information
1. Build the Graph Retriever Model:
The first step in building a GraphRAG system is signing up for Neo4j, a platform that offers comprehensive features for storing and interacting with knowledge graphs. Neo4j’s AuraDB, a fully managed graph database service, provides an ideal environment for developing graph-powered applications thanks to its scalability, performance, and ease of use. To start with AuraDB for free, use the following link:
https://neo4j.com/product/auradb/
After signing up, you will be directed to set up an instance:
1. Select “New Instance” to begin the creation process.
2. Choose the type of instance you want to create:
- For a free instance, select “Create Free instance”
- For other plans, select the appropriate option (Professional, Business Critical, or Virtual Dedicated Cloud)
3. Click “Create” to initiate the instance creation.
4. Copy and securely store the provided Username, Generated password, and Connection URI.
5. Once the setup completes, the instance is ready to use.
The next step is to connect to AuraDB from Jupyter. To do this, create a .env file and paste the Neo4j credentials.
.env
NEO4J_URI=neo4j+s://fb***.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=******************************
Use the following code to initiate the connection from a Jupyter notebook.
import os
from dotenv import load_dotenv
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Neo4jVector
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.vectorstores.neo4j_vector import remove_lucene_chars
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from neo4j import GraphDatabase
from langchain_experimental.llms.ollama_functions import OllamaFunctions
import gradio as gr
# Load environment variables from .env file
load_dotenv(override=True)
# Access the variables
neo4j_uri = os.getenv('NEO4J_URI')
neo4j_username = os.getenv('NEO4J_USERNAME')
neo4j_password = os.getenv('NEO4J_PASSWORD')
# Neo4j connection
graph = Neo4jGraph()
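To confirm the connection works, you can run a quick sanity check (a minimal sketch; the Cypher query simply counts whatever nodes are currently in the database):
# Quick sanity check: count the nodes currently stored in AuraDB
print(graph.query("MATCH (n) RETURN count(n) AS node_count"))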
Next, set up an LLM. An Ollama-hosted gemma2:9b model is used for this example, but it can be substituted with any model and platform of your choice.
llm = OllamaFunctions(model="gemma2:9b", temperature=0, format="json")
The next step is to build a text splitter. This step takes a document from input/dummy_text.txt and divides it into passages/chunks. There are several different text splitters available in LangChain [1].
loader = TextLoader(file_path="input/dummy_text.txt")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=24)
documents = text_splitter.split_documents(documents=docs)
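Before moving on, you can optionally inspect the chunks (a quick check; the number of chunks depends on your input document):
# Inspect how many chunks were produced and preview the first one
print(f"Number of chunks: {len(documents)}")
print(documents[0].page_content)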
Now, we will use an LLMGraphTransformer [2] to convert the documents into a knowledge graph (nodes and relationships) and push it to Neo4j AuraDB.
# Initialize the LLMGraphTransformer
llm_transformer = LLMGraphTransformer(llm=llm)
# Convert the document to a graph
graph_documents = llm_transformer.convert_to_graph_documents(documents)
# Use the add_graph_documents method to push the data
graph.add_graph_documents(
graph_documents=graph_documents, # Your graph_document goes here
include_source=True, # Set to True if you want to include the source document
baseEntityLabel=True # Set to True to add a base label to all entities
)
print("Graph data has been successfully pushed to Neo4j.")
After completing this step, we can check the knowledge graph from the Neo4j interface.
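Alternatively, the graph can be inspected programmatically. The following is a minimal sketch; it assumes the __Entity__ base label that add_graph_documents adds when baseEntityLabel=True:
# List a few extracted entities and count the relationships between them
print(graph.query("MATCH (n:__Entity__) RETURN n.id AS entity LIMIT 10"))
print(graph.query("MATCH (:__Entity__)-[r]->(:__Entity__) RETURN count(r) AS relationship_count"))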
Now, we will set up a graph retriever. The first step is to determine whether the user’s query mentions an entity (e.g., a person or an organization). The graph_retriever function then takes each detected entity, queries the Neo4j graph database for matching nodes, and explores their neighborhoods to gather contextual information. The results are formatted as structured strings that illustrate the relationships, providing a clear view of each entity’s context within the knowledge graph for effective response generation.
class Entities(BaseModel):
    """Identifying information about entities."""
    names: list[str] = Field(
        ...,
        description=(
            "All the person, organization, or business entities that "
            "appear in the text"
        ),
    )

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are extracting organization and person entities from the text.",
        ),
        (
            "human",
            "Use the given format to extract information from the following "
            "input: {question}",
        ),
    ]
)

# Chain the prompt with the LLM, constraining its output to the Entities schema
entity_chain = prompt | llm.with_structured_output(Entities)
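You can quickly test the entity extraction on its own (output will vary by model, but it should resemble the shape shown in the comment):
# Example: extract entities from a sample question
print(entity_chain.invoke("Who is Hinton and where does he work?"))
# Roughly: Entities(names=['Hinton', ...])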
def generate_full_text_query(input: str) -> str:
    """Build a Lucene full-text query: fuzzy-match (~2) each word, joined with AND."""
    words = [el for el in remove_lucene_chars(input).split() if el]
    if not words:
        return ""
    full_text_query = " AND ".join([f"{word}~2" for word in words])
    print(f"Generated Query: {full_text_query}")
    return full_text_query.strip()
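Note that the graph_retriever below queries a full-text index named fulltext_entity_id. If it does not already exist in your AuraDB instance, you can create it with something like the following (this assumes entities carry the __Entity__ base label added when baseEntityLabel=True):
# Create the full-text index used by graph_retriever (no-op if it already exists)
graph.query(
    "CREATE FULLTEXT INDEX fulltext_entity_id IF NOT EXISTS "
    "FOR (n:__Entity__) ON EACH [n.id]"
)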
# Full-text index query
def graph_retriever(question: str) -> str:
"""
Collects the neighborhood of entities mentioned in the question
"""
result = ""
# Detect entities through the entity chain and pass them to the graph query
entities = entity_chain.invoke(question)
for entity in entities.names:
response = graph.query(
"""
CALL db.index.fulltext.queryNodes('fulltext_entity_id', $query, {limit:2})
YIELD node, score
CALL {
WITH node
MATCH (node)-[r:!MENTIONS]->(neighbor)
RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
UNION ALL
WITH node
MATCH (node)<-[r:!MENTIONS]-(neighbor)
RETURN neighbor.id + ' - ' + type(r) + ' -> ' + node.id AS output
}
RETURN output LIMIT 50
""",
{"query": entity},
)
result += "\n".join([el['output'] for el in response])
return result
print(graph_retriever("Who is Hinton?"))
# Hinton - STUNNED_BY -> Large Language Models
# Hinton - AT -> Kitchen Table
# Hinton - BELIEVES -> Risks
# Geoffrey Hinton - WORKS_FOR -> Google
# Geoffrey Hinton - SHARED_AWARD -> Yann Lecun
# Geoffrey Hinton - SHARED_AWARD -> Yoshua Bengio
# Geoffrey Hinton - RECEIVED -> Turing Award
# Geoffrey Hinton - LIVES_IN -> North London
# Geoffrey Hinton - SHARED_NOBLE_PRIZE -> John J. Hopfield
2. Build the Vector Retriever Model
To implement a vector retriever model, the process begins by utilizing the same chunked documents from previous steps. These documents are sent to an embedding model (e.g., nomic-embed-text [3]), which generates numerical representations, or embeddings, for each text segment. These embeddings capture the semantic meaning of the text and are crucial for similarity searches. Once generated, the embeddings are stored in a vector database for efficient retrieval. In this setup, Neo4j is used as the vector database, leveraging its capabilities to handle both text and vector data within a graph structure.
ollama_embeddings = OllamaEmbeddings(model="nomic-embed-text")
# Store vector embeddings in Neo4j
db = Neo4jVector.from_documents(
documents,
ollama_embeddings,
url=neo4j_uri,
username=neo4j_username,
password=neo4j_password
)
vector_index = Neo4jVector.from_existing_graph(
ollama_embeddings,
search_type="hybrid",
node_label="Document",
text_node_properties=["text"],
embedding_node_property="embedding"
)
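Before wrapping this in a helper function, you can sanity-check the index with a direct similarity search (a minimal example; results depend on your data):
# Direct similarity search against the Neo4j vector index
print(vector_index.similarity_search("Who is Hinton?", k=1))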
def vector_retriever(question, top_k=1):
    vector_ret = vector_index.as_retriever(search_kwargs={"k": top_k})
    return [el.page_content for el in vector_ret.invoke(question)][:top_k]
print(vector_retriever("who is Hinton?"))
['\ntext: Widely regarded as the “godfather of AI,” Hinton shared the Noble prize with John J. Hopfield \n'
'for foundational discoveries and inventions that enable machine learning with artificial neural networks.']
3. Combine them into a Hybrid Model
The final step in building a hybrid retrieval system is to combine the graph and vector retrievers. This is done using the hybrid_retriever function, which takes a user’s question and retrieves relevant information from both methods. It merges the results into a single string, clearly labeling the graph data and vector data. A prompt template guides the language model (LLM) on how to use this combined context to answer the question. Using LangChain, a processing chain is created that gathers this context, applies the prompt, sends it to the LLM, and formats the output.
The invoke_chain function runs this process for any user query and returns the LLM’s response along with the data used from both retrievers. This integration helps the LLM provide more accurate and detailed answers by utilizing both structured (graph) and unstructured (vector) information.
def hybrid_retriever(question: str):
graph_data = graph_retriever(question)
vector_data = vector_retriever(question)
final_data = f"""
GRAPH DATA:
{graph_data}
VECTOR DATA:
{"#Document ".join(vector_data)}
"""
return {
"final_data": final_data,
"graph_data": graph_data,
"vector_data": vector_data
}
template = """Answer the question based only on the following context:
{context}
Question: {question}
Only generate your response from the context. Do not make anything up.
Add as much information as needed to generate a coherent and informative response
based on the context.
Answer:"""
prompt = ChatPromptTemplate.from_template(template)
chain = (
{
"context": lambda x: hybrid_retriever(x)["final_data"],
"question": RunnablePassthrough(),
}
| prompt
| llm
| StrOutputParser()
)
def invoke_chain(query):
retriever_output = hybrid_retriever(query)
response = chain.invoke(query)
return {
"response": response,
"graph_data": retriever_output["graph_data"],
"vector_data": retriever_output["vector_data"]
}
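With everything wired up, you can try the full pipeline directly from the notebook before adding a UI (a minimal usage example; the actual answer depends on your data and model):
# Run the full hybrid pipeline for a sample question
result = invoke_chain("Who is Hinton?")
print(result["response"])
print(result["graph_data"])
print(result["vector_data"])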
4. UI Setup
To complete the hybrid chatbot implementation, the final step involves setting up a user-friendly chat interface using Gradio. This interface will allow users to interact with the chatbot easily and view relevant information from both the vector and graph retrievers, as well as the final response. Here’s how to set up the Gradio UI:
def gradio_interface(query):
result = invoke_chain(query)
return (
result["response"],
str(result["graph_data"]),
"\n".join(result["vector_data"])
)
with gr.Blocks() as demo:
gr.Markdown("# GraphRAG Query Interface")
query_input = gr.Textbox(lines=2, placeholder="Enter your query here...")
submit_btn = gr.Button("Submit")
with gr.Column():
response_output = gr.Textbox(label="Model Response")
with gr.Accordion("Graph Data", open=True):
graph_data_output = gr.Textbox(label="Graph Data")
with gr.Accordion("Vector Data", open=True):
vector_data_output = gr.Textbox(label="Vector Data")
submit_btn.click(
fn=gradio_interface,
inputs=query_input,
outputs=[response_output, graph_data_output, vector_data_output]
)
demo.launch()
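By default, demo.launch() serves the interface locally. If you want to share a temporary public link (for example, for a quick demo), Gradio supports this out of the box:
# Optional: generate a temporary public URL instead of a local-only server
demo.launch(share=True)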
Check the following for the full implementation:
https://github.com/ShahedSabab/Hybrid-GraphRAG
Continue Your Journey:
Curious about the foundational techniques that set the stage for advanced innovations like Hybrid GraphRAG? Read Part 1: Unlocking RAG’s Potential to explore how RAG transforms AI reliability and accuracy.
Connect with an Expert:
Ready to explore how RAG and Hybrid GraphRAG can be tailored to your organization’s needs? Contact us to discuss your unique challenges and goals with one of our experts.
Reference
[1] “text splitter.” Accessed: Nov. 24, 2024. [Online]. Available: https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/
[2] “LLMGraphTransformer.” Accessed: Nov. 24, 2024. [Online]. Available: https://python.langchain.com/v0.1/docs/use_cases/graph/constructing/
[3] “nomic-embed-text.” Accessed: Nov. 24, 2024. [Online]. Available: https://ollama.com/library/nomic-embed-text