RAG Using Knowledge Graph: Mastering Advanced Techniques – Part 2


[Figure: The Hybrid GraphRAG architecture, showing a knowledge graph of interconnected nodes and edges alongside a vector database of text blocks, integrated through a large language model (LLM).]

Introduction

In the rapidly evolving landscape of artificial intelligence and natural language processing, Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm that combines the strengths of retrieval-based and generative models. By leveraging vast repositories of information, RAG systems can generate highly relevant and contextually rich responses. However, as the complexity and volume of data continue to grow, the need for more sophisticated techniques becomes apparent. This is where knowledge graphs come into play. 

Knowledge graphs provide a structured and interconnected representation of information, capturing relationships between entities in a way that mirrors human understanding. By integrating knowledge graphs into RAG systems, we can enhance their ability to reason, infer, and generate more accurate and insightful content. This synergy not only improves the quality of generated responses but also opens up new avenues for applications across various domains. 

In this blog post, we will explore Hybrid GraphRAG, an innovative approach that combines the strengths of knowledge graphs with traditional vector-based retrieval methods to enhance Retrieval-Augmented Generation (RAG) systems. This hybrid architecture leverages structured information alongside retrieved text to provide more accurate and contextually rich responses. 

Hybrid GraphRAG integrates two powerful techniques: 

  1. VectorRAG: The traditional approach that uses vector databases for similarity-based text retrieval. 
  2. GraphRAG: A method that leverages knowledge graphs to capture complex relationships between entities. 

 

By combining these approaches, Hybrid GraphRAG addresses key challenges faced by standalone RAG systems: 

  • Answering questions that require understanding complex relationships between different pieces of information 
  • Providing responses that necessitate a global context, drawing from the entire dataset 

 

Let’s walk through a step-by-step implementation using the following technology stack: 

  • Neo4j Aura: Utilizes Neo4j for structured data retrieval, enabling the creation of a comprehensive knowledge graph. 
  • LangChain Integration: Facilitates seamless interaction between components, including traditional naive RAG methods, to enhance retrieval strategies. 
  • Ollama: Integrates Ollama for on-device language model inference, ensuring privacy and reducing latency during response generation. 
  • Gradio: Provides a user-friendly web interface for model interaction, making it accessible for users to engage with the system effortlessly.

Check the following for the full implementation: 

https://github.com/ShahedSabab/Hybrid-GraphRAG 

 

Hybrid GraphRAG Architecture

The Hybrid GraphRAG architecture combines the strengths of traditional vector-based retrieval with the structured capabilities of knowledge graphs, enhancing the Retrieval-Augmented Generation (RAG) process. This architecture follows a two-step approach: indexing and retrieval. 

During the indexing phase, documents are split into smaller passages to facilitate efficient retrieval. An embedding model then creates embeddings for these passages, which are stored in a vectorstore such as Chroma, Neo4j, FAISS, or Pinecone. In parallel, a specialized large language model (LLM) converts the unstructured text into a knowledge graph by identifying entities (nodes) and the relationships between them. This graph is then stored in a graph database such as Neo4j. 

In the retrieval phase, a user's query is processed through two retrieval paths. The vector retriever searches the vectorstore for relevant passages based on textual similarity, while the graph retriever searches the knowledge graph for relevant structured knowledge. The results from both paths are combined and provided as context to an LLM, which generates a final response that is both contextually accurate and enriched with structured insights. 
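Conceptually, the retrieval phase reduces to the sketch below. The graph_retriever and vector_retriever functions and the llm object are all built step by step in the sections that follow.

def hybrid_answer(question: str) -> str:
    # Conceptual sketch of the retrieval phase; every piece is implemented below
    graph_context = graph_retriever(question)    # structured facts from the knowledge graph
    vector_context = vector_retriever(question)  # similar passages from the vectorstore
    combined = f"GRAPH DATA:\n{graph_context}\n\nVECTOR DATA:\n{vector_context}"
    # Hand the combined context to the LLM to generate the final response
    return llm.invoke(f"Context:\n{combined}\n\nQuestion: {question}").content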
 

Implementation steps summary:  

1. Build the Graph Retriever Model 

  • Knowledge graph creation: Use a specialized LLM to extract entities and relationships from text 
  • Graph database setup: Store the knowledge graph in a graph database (e.g., Neo4j AuraDB)  
  • Implement graph querying functionality

 

2. Build the Vector Retriever Model 

  • Embedding generation: Create embeddings for document chunks using an embedding model 
  • Vectorstore setup: Store embeddings in a vector database (e.g., Chroma, Neo4j, FAISS, Pinecone) 
  • Implement similarity search functionality

 

3. Combine them into a Hybrid Model 

  • Merge results from both vector and graph retrievers 
  • Create a context aggregation mechanism 
  • Integrate with a language model for final response generation

 

4. UI Setup 

  • Design and implement a user interface (e.g., using Gradio) 
  • Create input fields for user queries 
  • Display results, including relevant passages and graph information

 

1. Build the Graph Retriever Model

For building a Graph RAG system, the first step is signing up with Neo4j, a platform that offers comprehensive features for interacting with and storing knowledge graphs. Neo4j's AuraDB, a fully managed graph database service, provides an ideal environment for developing graph-powered applications due to its scalability, performance, and ease of use. To start for free with AuraDB, use the following link: 

https://neo4j.com/product/auradb/ 

 

After signing up, you will be directed to set up an instance: 

1. Select “New Instance” to begin the creation process.

2. Choose the type of instance you want to create: 

  • For a free instance, select “Create Free instance” 
  • For other plans, select the appropriate option (Professional, Business Critical, or Virtual Dedicated Cloud) 

3. Click “Create” to initiate the instance creation.

4. Copy and securely store the provided Username, Generated password, and Connection URI. Once provisioning completes, the instance is ready to use.

The next step is to connect to AuraDB from Jupyter. To do this, create a .env file and paste in the Neo4j credentials.

.env

NEO4J_URI=neo4j+s://fb***.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=******************************

Use the following code to initiate the connection from a Jupyter notebook.

import os
from dotenv import load_dotenv
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Neo4jVector
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.vectorstores.neo4j_vector import remove_lucene_chars
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from neo4j import GraphDatabase
from langchain_experimental.llms.ollama_functions import OllamaFunctions
import gradio as gr

# Load environment variables from .env file
load_dotenv(override=True)

# Access the variables
neo4j_uri = os.getenv('NEO4J_URI')
neo4j_username = os.getenv('NEO4J_USERNAME')
neo4j_password = os.getenv('NEO4J_PASSWORD')

# Neo4j connection (Neo4jGraph picks up the NEO4J_* variables from the environment)
graph = Neo4jGraph()

Next, set up an LLM. The Ollama-hosted gemma2:9b model is used in this example, but it can be substituted with any model and platform of choice.

llm = OllamaFunctions(model="gemma2:9b", temperature=0, format="json")
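As a concrete example of that substitution (a hypothetical alternative, not used in the rest of this walkthrough), a ChatOllama-backed model could be dropped in instead:

# Hypothetical substitution: a different Ollama-hosted chat model
from langchain_ollama import ChatOllama

llm_alt = ChatOllama(model="llama3.1:8b", temperature=0)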

The next step is to build a text splitter, which takes a document from input/dummy_text.txt and divides it into passages/chunks. LangChain offers a variety of text splitters to choose from [1].

loader = TextLoader(file_path="input/dummy_text.txt")
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=24)
documents = text_splitter.split_documents(documents=docs)
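Before building the graph, it can help to sanity-check the split output:

# Inspect the chunking result: number of chunks and a preview of the first one
print(f"{len(documents)} chunks created")
print(documents[0].page_content[:200])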

Now, we will use an LLMGraphTransformer [2] to convert the documents into a knowledge graph (nodes and relationships) and push it to Neo4j AuraDB.

# Initialize the LLMGraphTransformer
llm_transformer = LLMGraphTransformer(llm=llm)

# Convert the document to a graph
graph_documents = llm_transformer.convert_to_graph_documents(documents)

# Use the add_graph_documents method to push the data
graph.add_graph_documents(
    graph_documents=graph_documents,  # Your graph_document goes here
    include_source=True,  # Set to True if you want to include the source document
    baseEntityLabel=True  # Set to True to add a base label to all entities
)

print("Graph data has been successfully pushed to Neo4j.")

After completing this step, we can check the knowledge graph from the Neo4j interface.
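As a quick sanity check from the notebook itself (a small addition beyond the Neo4j browser), you can count what was pushed:

# Optional sanity check: count nodes and relationships now stored in AuraDB
print(graph.query("MATCH (n) RETURN count(n) AS nodes"))
print(graph.query("MATCH ()-[r]->() RETURN count(r) AS relationships"))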

Now, we will set up a graph retriever. The first step is to detect whether the user's query mentions any entities (e.g., a person or organization). The graph_retriever function then takes each detected entity, queries the Neo4j graph database for matching nodes, and explores their neighborhoods to gather contextual information. The results are formatted as structured strings that spell out the relationships, providing a clear view of each entity's context within the knowledge graph for effective response generation.

class Entities(BaseModel):
    """Identifying information about entities."""

    names: list[str] = Field(
        ...,
        description=(
            "All the person, organization, or business entities that "
            "appear in the text"
        ),
    )

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are extracting organization and person entities from the text.",
        ),
        (
            "human",
            "Use the given format to extract information from the following "
            "input: {question}",
        ),
    ]
)

# Chain the prompt into the LLM with structured output so entity
# detection returns an Entities object (used below by graph_retriever)
entity_chain = prompt | llm.with_structured_output(Entities)
def generate_full_text_query(input: str) -> str:
    # Build a fuzzy Lucene query: each word tolerates up to two edits (~2)
    words = [el for el in remove_lucene_chars(input).split() if el]
    if not words:
        return ""
    full_text_query = " AND ".join([f"{word}~2" for word in words])
    print(f"Generated Query: {full_text_query}")
    return full_text_query.strip()

# Full-text index query
def graph_retriever(question: str) -> str:
    """
    Collects the neighborhood of entities mentioned in the question
    """
    result = ""
    # Detect entities through the entity chain and pass them to the graph query
    entities = entity_chain.invoke(question)

    for entity in entities.names:
        response = graph.query(
            """
            CALL db.index.fulltext.queryNodes('fulltext_entity_id', $query, {limit:2})
            YIELD node, score
            CALL {
                WITH node
                MATCH (node)-[r:!MENTIONS]->(neighbor)
                RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
                UNION ALL
                WITH node
                MATCH (node)<-[r:!MENTIONS]-(neighbor)
                RETURN neighbor.id + ' - ' + type(r) + ' -> ' + node.id AS output
            }
            RETURN output LIMIT 50
            """,
            # Turn the raw entity name into a fuzzy full-text query
            {"query": generate_full_text_query(entity)},
        )
        result += "\n".join([el['output'] for el in response]) + "\n"

    return result
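Note that graph_retriever assumes a full-text index named fulltext_entity_id already exists. Here is a one-time creation sketch, under the assumption that baseEntityLabel=True (used earlier) tagged every extracted node with the __Entity__ label:

# One-time setup (assumption): create the full-text index queried above.
# baseEntityLabel=True adds the __Entity__ label to every extracted node,
# so the index covers the `id` property on that label.
graph.query(
    "CREATE FULLTEXT INDEX fulltext_entity_id IF NOT EXISTS "
    "FOR (n:__Entity__) ON EACH [n.id]"
)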
				
print(graph_retriever("Who is Hinton?"))

# Hinton - STUNNED_BY -> Large Language Models
# Hinton - AT -> Kitchen Table
# Hinton - BELIEVES -> Risks
# Geoffrey Hinton - WORKS_FOR -> Google
# Geoffrey Hinton - SHARED_AWARD -> Yann Lecun
# Geoffrey Hinton - SHARED_AWARD -> Yoshua Bengio
# Geoffrey Hinton - RECEIVED -> Turing Award
# Geoffrey Hinton - LIVES_IN -> North London
# Geoffrey Hinton - SHARED_NOBLE_PRIZE -> John J. Hopfield

 

2. Build the Vector Retriever Model

To implement a vector retriever model, the process begins by utilizing the same chunked documents from previous steps. These documents are sent to an embedding model (e.g., nomic-embed-text [3]), which generates numerical representations, or embeddings, for each text segment. These embeddings capture the semantic meaning of the text and are crucial for similarity searches. Once generated, the embeddings are stored in a vector database for efficient retrieval. In this setup, Neo4j is used as the vector database, leveraging its capabilities to handle both text and vector data within a graph structure.

				
ollama_embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Store vector embeddings in Neo4j
db = Neo4jVector.from_documents(
    documents,
    ollama_embeddings,
    url=neo4j_uri,
    username=neo4j_username,
    password=neo4j_password
)

# Build a hybrid (vector + keyword) index over the stored Document nodes
vector_index = Neo4jVector.from_existing_graph(
    ollama_embeddings,
    search_type="hybrid",
    node_label="Document",
    text_node_properties=["text"],
    embedding_node_property="embedding"
)

def vector_retriever(question, top_k=1):
    # Pass k through search_kwargs so the retriever returns top_k documents
    vector_ret = vector_index.as_retriever(search_kwargs={"k": top_k})
    return [el.page_content for el in vector_ret.invoke(question)][:top_k]
				
print(vector_retriever("who is Hinton?"))

['\ntext: Widely regarded as the “godfather of AI,” Hinton shared the Noble prize with John J. Hopfield \n'
 'for foundational discoveries and inventions that enable machine learning with artificial neural networks.']

 

3. Combine them into a Hybrid Model

The final step in building a hybrid retrieval system is to combine the graph and vector retrievers. This is done using the hybrid_retriever function, which takes a user’s question and retrieves relevant information from both methods. It merges the results into a single string, clearly labeling the graph data and vector data. A prompt template guides the language model (LLM) on how to use this combined context to answer the question. Using LangChain, a processing chain is created that gathers this context, applies the prompt, sends it to the LLM, and formats the output.

The invoke_chain function runs this process for any user query and returns the LLM’s response along with the data used from both retrievers. This integration helps the LLM provide more accurate and detailed answers by utilizing both structured (graph) and unstructured (vector) information. 

				
def hybrid_retriever(question: str):
    graph_data = graph_retriever(question)
    vector_data = vector_retriever(question)
    final_data = f"""
    GRAPH DATA:
    {graph_data}
    VECTOR DATA:
    {"#Document ".join(vector_data)}
    """
    return {
        "final_data": final_data,
        "graph_data": graph_data,
        "vector_data": vector_data
    }

template = """Answer the question based only on the following context:
{context}
Question: {question}
Only generate your response from the context. Do not make anything up.
Add as much information as needed to generate a coherent and informative response 
based on the context.
Answer:"""
prompt = ChatPromptTemplate.from_template(template)

chain = (
    {
        "context": lambda x: hybrid_retriever(x)["final_data"],
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

def invoke_chain(query):
    retriever_output = hybrid_retriever(query)
    response = chain.invoke(query)
    return {
        "response": response,
        "graph_data": retriever_output["graph_data"],
        "vector_data": retriever_output["vector_data"]
    }
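A quick end-to-end test of the chain before wiring up the UI:

# Example usage: query the hybrid chain directly
result = invoke_chain("Who is Hinton?")
print(result["response"])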

 

4. UI Setup

To complete the hybrid chatbot implementation, the final step involves setting up a user-friendly chat interface using Gradio. This interface will allow users to interact with the chatbot easily and view relevant information from both the vector and graph retrievers, as well as the final response. Here’s how to set up the Gradio UI:

				
def gradio_interface(query):
    result = invoke_chain(query)
    return (
        result["response"],
        str(result["graph_data"]),
        "\n".join(result["vector_data"])
    )

with gr.Blocks() as demo:
    gr.Markdown("# GraphRAG Query Interface")

    query_input = gr.Textbox(lines=2, placeholder="Enter your query here...")
    submit_btn = gr.Button("Submit")

    with gr.Column():
        response_output = gr.Textbox(label="Model Response")

        with gr.Accordion("Graph Data", open=True):
            graph_data_output = gr.Textbox(label="Graph Data")

        with gr.Accordion("Vector Data", open=True):
            vector_data_output = gr.Textbox(label="Vector Data")

    submit_btn.click(
        fn=gradio_interface,
        inputs=query_input,
        outputs=[response_output, graph_data_output, vector_data_output]
    )

demo.launch()

Check the following for the full implementation: 

https://github.com/ShahedSabab/Hybrid-GraphRAG 

 
Continue Your Journey:

Curious about the foundational techniques that set the stage for advanced innovations like Hybrid GraphRAG? Read Part 1: Unlocking RAG’s Potential to explore how RAG transforms AI reliability and accuracy.

Connect with an Expert:

Ready to explore how RAG and Hybrid GraphRAG can be tailored to your organization’s needs? Contact us to discuss your unique challenges and goals with one of our experts.

 

References

[1] “text splitter.” Accessed: Nov. 24, 2024. [Online]. Available: https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/ 

[2] “LLMGraphTransformer.” Accessed: Nov. 24, 2024. [Online]. Available: https://python.langchain.com/v0.1/docs/use_cases/graph/constructing/ 

[3] “nomic-embed-text.” Accessed: Nov. 24, 2024. [Online]. Available: https://ollama.com/library/nomic-embed-text 
