# GraphRAG
GraphRAG extends traditional RAG by building knowledge graphs from documents for enhanced retrieval through entity relationships and community detection.
## Overview
GraphRAG provides:
- **Entity Extraction** - Extract entities and relationships using LLM
- **Graph Storage** - Store entities, edges, and communities in PostgreSQL
- **Community Detection** - Cluster related entities with label propagation
- **Graph Retrieval** - Local, global, and hybrid search modes
## Architecture
```
Documents
|
v
Entity Extraction (LLM)
|
v
Graph Storage (PostgreSQL + pgvector)
|
v
Community Detection (Label Propagation)
|
v
Graph Retrieval (Local/Global/Hybrid)
```
## Entity Extraction
Extract entities and relationships from text:
```elixir
alias Rag.GraphRAG.Extractor
alias Rag.Router
{:ok, router} = Router.new(providers: [:gemini])
text = "Alice works for Acme Corp in New York. Bob reports to Alice."
{:ok, result} = Extractor.extract(text, router: router)
# result.entities: [%{name: "Alice", type: :person, ...}, ...]
# result.relationships: [%{source: "Bob", target: "Alice", type: :reports_to, ...}]
```
### Entity Types
- `:person` - Individuals
- `:organization` - Companies, institutions
- `:location` - Geographic places
- `:event` - Named events
- `:concept` - Abstract ideas
- `:technology` - Technologies/tools
- `:document` - Documents/publications
### Relationship Types
- `:works_for` - Employment
- `:located_in` - Geography
- `:created_by` - Authorship
- `:part_of` - Membership
- `:related_to` - General
- `:uses` - Tool usage
- `:depends_on` - Dependencies
### Batch Extraction
```elixir
{:ok, results} = Extractor.extract_batch(documents,
router: router,
max_concurrency: 4,
timeout: 60_000
)
```
### Entity Resolution
Merge duplicate entities:
```elixir
entities = [
%{name: "New York", type: :location, ...},
%{name: "NYC", type: :location, ...}
]
{:ok, resolved} = Extractor.resolve_entities(entities, router: router)
# Returns: [%{name: "New York", aliases: ["NYC"], ...}]
```
## Graph Storage
### Database Setup
```elixir
defmodule MyApp.Repo.Migrations.CreateGraphTables do
use Ecto.Migration
def up do
# Entities (nodes)
create table(:graph_entities) do
add :type, :string, null: false
add :name, :string, null: false
add :properties, :map, default: %{}
add :embedding, :vector, size: 768
add :source_chunk_ids, {:array, :integer}, default: []
timestamps()
end
create index(:graph_entities, [:type])
create index(:graph_entities, [:name])
execute """
CREATE INDEX graph_entities_embedding_idx
ON graph_entities
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100)
"""
# Edges (relationships)
create table(:graph_edges) do
add :from_id, references(:graph_entities, on_delete: :delete_all)
add :to_id, references(:graph_entities, on_delete: :delete_all)
add :type, :string, null: false
add :weight, :float, default: 1.0
add :properties, :map, default: %{}
timestamps()
end
create index(:graph_edges, [:from_id])
create index(:graph_edges, [:to_id])
create index(:graph_edges, [:type])
# Communities (clusters)
create table(:graph_communities) do
add :level, :integer, default: 0
add :summary, :text
add :entity_ids, {:array, :integer}, default: []
timestamps()
end
create index(:graph_communities, [:level])
end
end
```
### Creating Nodes and Edges
```elixir
alias Rag.GraphStore
alias Rag.GraphStore.Pgvector
store = %Pgvector{repo: MyApp.Repo}
# Create entity
{:ok, alice} = GraphStore.create_node(store, %{
type: :person,
name: "Alice Smith",
properties: %{role: "engineer"},
embedding: [0.1, 0.2, ...],
source_chunk_ids: [1, 2, 3]
})
# Create relationship
{:ok, edge} = GraphStore.create_edge(store, %{
from_id: alice.id,
to_id: acme.id,
type: :works_for,
weight: 0.95
})
```
### Graph Traversal
```elixir
# Find neighbors
{:ok, neighbors} = GraphStore.find_neighbors(store, alice.id,
direction: :both, # :in, :out, or :both
limit: 10,
edge_type: :works_for
)
# BFS traversal
{:ok, nodes} = GraphStore.traverse(store, alice.id,
max_depth: 2,
algorithm: :bfs
)
# DFS traversal
{:ok, nodes} = GraphStore.traverse(store, alice.id,
max_depth: 3,
algorithm: :dfs
)
```
### Vector Search on Entities
```elixir
{:ok, similar} = GraphStore.vector_search(store, query_embedding,
limit: 5,
type: :person # Optional filter
)
```
## Community Detection
Detect clusters of related entities:
```elixir
alias Rag.GraphRAG.CommunityDetector
# Detect communities
{:ok, communities} = CommunityDetector.detect(store, max_iterations: 100)
# Returns: [%{id: 1, level: 0, entity_ids: [1, 2, 3], summary: nil}, ...]
# Generate summaries with LLM
{:ok, summarized} = CommunityDetector.summarize_communities(store, communities,
router: router
)
# Combined: detect and summarize
{:ok, communities} = CommunityDetector.detect_and_summarize(store,
router: router,
max_iterations: 100
)
```
### Hierarchical Communities
Build multi-level community hierarchy:
```elixir
{:ok, hierarchy} = CommunityDetector.build_hierarchy(store,
levels: 3,
max_iterations: 100
)
# Returns: [[level_0_communities], [level_1_communities], [level_2_communities]]
```
## Graph-Based Retrieval
### Creating a Graph Retriever
```elixir
alias Rag.Retriever.Graph
retriever = Graph.new(
graph_store: graph_store,
vector_store: vector_store,
mode: :hybrid,
depth: 2,
local_weight: 0.7,
global_weight: 0.3
)
```
### Search Modes
#### Local Search
Find specific, detailed information via entity expansion:
```elixir
{:ok, results} = Graph.local_search(retriever, query_embedding,
limit: 10,
depth: 2
)
```
**Process:**
1. Vector search on entity embeddings
2. BFS traversal to related entities
3. Collect source chunks from entities
4. Score by graph distance (closer = higher)
**Best for:** "What is Alice's role?", specific entity queries
#### Global Search
Find high-level context via community summaries:
```elixir
{:ok, results} = Graph.global_search(retriever, query_embedding,
limit: 10
)
```
**Process:**
1. Vector search on community summaries
2. Return community summaries as context
**Best for:** "What are the main areas of focus?", overview queries
#### Hybrid Search
Combine local and global with weighted RRF:
```elixir
{:ok, results} = Graph.hybrid_search(retriever, query_embedding,
limit: 10
)
```
**Process:**
1. Run local and global in parallel
2. Apply weighted RRF fusion
3. Return merged results
**Best for:** Complex queries needing multiple perspectives
### Using the Retriever
```elixir
alias Rag.Retriever
# With embedding
{:ok, results} = Retriever.retrieve(retriever, query_embedding, limit: 10)
# With text (requires embedding function)
{:ok, results} = Retriever.retrieve(retriever, "search query",
limit: 10,
embedding_fn: fn text ->
{:ok, [emb], _} = Router.execute(router, :embeddings, [text], [])
emb
end
)
```
## Complete Workflow
```elixir
alias Rag.Router
alias Rag.GraphStore
alias Rag.GraphStore.Pgvector
alias Rag.GraphRAG.{Extractor, CommunityDetector}
alias Rag.Retriever.Graph
# 1. Initialize
{:ok, router} = Router.new(providers: [:gemini])
store = %Pgvector{repo: MyApp.Repo}
# 2. Extract entities from documents
documents = ["doc1 text", "doc2 text", "doc3 text"]
{:ok, results} = Extractor.extract_batch(documents, router: router)
# 3. Resolve duplicates
all_entities = Enum.flat_map(results, & &1.entities)
{:ok, resolved} = Extractor.resolve_entities(all_entities, router: router)
# 4. Generate embeddings
entity_texts = Enum.map(resolved, &"#{&1.name}: #{&1.description}")
{:ok, embeddings, _} = Router.execute(router, :embeddings, entity_texts, [])
# 5. Store entities with embeddings
entity_ids = for {entity, embedding} <- Enum.zip(resolved, embeddings) do
{:ok, node} = GraphStore.create_node(store, %{
type: entity.type,
name: entity.name,
properties: %{description: entity.description},
embedding: embedding
})
{entity.name, node.id}
end |> Map.new()
# 6. Create relationships
all_rels = Enum.flat_map(results, & &1.relationships)
for rel <- all_rels do
from_id = entity_ids[rel.source]
to_id = entity_ids[rel.target]
if from_id && to_id do
GraphStore.create_edge(store, %{
from_id: from_id,
to_id: to_id,
type: rel.type,
weight: rel.weight
})
end
end
# 7. Detect and summarize communities
{:ok, communities} = CommunityDetector.detect_and_summarize(store,
router: router,
max_iterations: 100
)
# 8. Create retriever
retriever = Graph.new(
graph_store: store,
vector_store: vector_store,
mode: :hybrid,
depth: 2
)
# 9. Query
{:ok, [query_emb], _} = Router.execute(router, :embeddings, ["AI projects"], [])
{:ok, results} = Retriever.retrieve(retriever, query_emb, limit: 10)
```
## Choosing Search Mode
| Query Type | Mode | Example |
|------------|------|---------|
| Specific entity | `:local` | "What is Alice's role?" |
| Overview | `:global` | "What are the main themes?" |
| Complex/multi-faceted | `:hybrid` | "How do teams connect to projects?" |
## Performance Tips
1. **Batch extraction** - Use `extract_batch/2` with concurrency
2. **Limit traversal depth** - Default depth of 2 balances breadth/performance
3. **Type filtering** - Filter vector search by entity type when possible
4. **Adjust weights** - Tune local/global weights for your use case
5. **Index properly** - Ensure vector and type indexes exist
## Next Steps
- [Retrievers](retrievers.md) - Other retrieval strategies
- [Pipeline](pipelines.md) - Integrate GraphRAG in workflows
- [Agent Framework](agent_framework.md) - Use with agents