# GraphRAG

GraphRAG extends traditional RAG by building a knowledge graph from your documents, so retrieval can draw on entity relationships and detected communities as well as vector similarity.

## Overview

GraphRAG provides:
- **Entity Extraction** - Extract entities and relationships using an LLM
- **Graph Storage** - Store entities, edges, and communities in PostgreSQL
- **Community Detection** - Cluster related entities with label propagation
- **Graph Retrieval** - Local, global, and hybrid search modes

## Architecture

```
Documents
    |
    v
Entity Extraction (LLM)
    |
    v
Graph Storage (PostgreSQL + pgvector)
    |
    v
Community Detection (Label Propagation)
    |
    v
Graph Retrieval (Local/Global/Hybrid)
```

## Entity Extraction

Extract entities and relationships from text:

```elixir
alias Rag.GraphRAG.Extractor
alias Rag.Router

{:ok, router} = Router.new(providers: [:gemini])

text = "Alice works for Acme Corp in New York. Bob reports to Alice."

{:ok, result} = Extractor.extract(text, router: router)
# result.entities: [%{name: "Alice", type: :person, ...}, ...]
# result.relationships: [%{source: "Bob", target: "Alice", type: :reports_to, ...}]
```

### Entity Types

- `:person` - Individuals
- `:organization` - Companies, institutions
- `:location` - Geographic places
- `:event` - Named events
- `:concept` - Abstract ideas
- `:technology` - Technologies/tools
- `:document` - Documents/publications

### Relationship Types

- `:works_for` - Employment
- `:located_in` - Geography
- `:created_by` - Authorship
- `:part_of` - Membership
- `:related_to` - General
- `:uses` - Tool usage
- `:depends_on` - Dependencies

### Batch Extraction

```elixir
{:ok, results} = Extractor.extract_batch(documents,
  router: router,
  max_concurrency: 4,
  timeout: 60_000
)
```
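
The `max_concurrency` and `timeout` options mirror those of Elixir's `Task.async_stream/3`. The sketch below shows how bounded concurrent extraction can be written with the standard library alone. It is not the library's implementation: `BatchSketch` and `extract_fun` are hypothetical names, and `extract_fun` stands in for any single-document extraction call.

```elixir
defmodule BatchSketch do
  # Hypothetical helper: applies `extract_fun` to each document with at most
  # `max_concurrency` tasks running at once. Results keep the input order.
  def run(documents, extract_fun, opts \\ []) do
    max_concurrency = Keyword.get(opts, :max_concurrency, 4)
    timeout = Keyword.get(opts, :timeout, 60_000)

    documents
    |> Task.async_stream(extract_fun,
      max_concurrency: max_concurrency,
      timeout: timeout,
      # Kill a slow task instead of crashing the caller
      on_timeout: :kill_task
    )
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, reason} -> {:error, reason}
    end)
  end
end

# Usage with a stand-in extraction function
BatchSketch.run(["a b", "c"], fn doc -> {:ok, String.split(doc)} end, max_concurrency: 2)
```

With `on_timeout: :kill_task`, a document that exceeds the timeout yields `{:error, :timeout}` in this sketch rather than aborting the whole batch. A function that raises will still take down the caller, because the tasks are linked to it.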

### Entity Resolution

Merge duplicate entities:

```elixir
entities = [
  %{name: "New York", type: :location, ...},
  %{name: "NYC", type: :location, ...}
]

{:ok, resolved} = Extractor.resolve_entities(entities, router: router)
# Returns: [%{name: "New York", aliases: ["NYC"], ...}]
```
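
Resolving "NYC" to "New York" needs the LLM, but duplicates that differ only in case or surrounding whitespace can be merged beforehand without one. The sketch below is such a pre-pass, not what `resolve_entities/2` does internally. `ResolveSketch` and `merge_exact/1` are hypothetical names, and the only assumption about entities is that they are maps with `:name` and `:type` keys.

```elixir
defmodule ResolveSketch do
  # Groups entities by {type, normalized name}. The first spelling seen becomes
  # the canonical entity; other distinct spellings are recorded as aliases.
  def merge_exact(entities) do
    entities
    |> Enum.group_by(fn e -> {e.type, normalize(e.name)} end)
    |> Enum.map(fn {_key, [first | rest]} ->
      aliases =
        rest
        |> Enum.map(& &1.name)
        |> Enum.uniq()
        |> Enum.reject(&(&1 == first.name))

      Map.put(first, :aliases, aliases)
    end)
  end

  defp normalize(name) do
    name |> String.trim() |> String.downcase()
  end
end

# Usage
ResolveSketch.merge_exact([
  %{name: "New York", type: :location},
  %{name: "new york", type: :location},
  %{name: "Alice", type: :person}
])
```

Running a pass like this first reduces the number of entities sent to the LLM-based resolver.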

## Graph Storage

### Database Setup

```elixir
defmodule MyApp.Repo.Migrations.CreateGraphTables do
  use Ecto.Migration

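  def up do
    # pgvector must be enabled before the :vector column type can be used
    execute "CREATE EXTENSION IF NOT EXISTS vector"

    # Entities (nodes)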
    create table(:graph_entities) do
      add :type, :string, null: false
      add :name, :string, null: false
      add :properties, :map, default: %{}
      add :embedding, :vector, size: 768
      add :source_chunk_ids, {:array, :integer}, default: []
      timestamps()
    end

    create index(:graph_entities, [:type])
    create index(:graph_entities, [:name])

    execute """
    CREATE INDEX graph_entities_embedding_idx
    ON graph_entities
    USING ivfflat (embedding vector_l2_ops)
    WITH (lists = 100)
    """

    # Edges (relationships)
    create table(:graph_edges) do
      add :from_id, references(:graph_entities, on_delete: :delete_all)
      add :to_id, references(:graph_entities, on_delete: :delete_all)
      add :type, :string, null: false
      add :weight, :float, default: 1.0
      add :properties, :map, default: %{}
      timestamps()
    end

    create index(:graph_edges, [:from_id])
    create index(:graph_edges, [:to_id])
    create index(:graph_edges, [:type])

    # Communities (clusters)
    create table(:graph_communities) do
      add :level, :integer, default: 0
      add :summary, :text
      add :entity_ids, {:array, :integer}, default: []
      timestamps()
    end

    create index(:graph_communities, [:level])
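  end

  def down do
    # Drop edges before entities because edges reference graph_entities
    drop table(:graph_communities)
    drop table(:graph_edges)
    drop table(:graph_entities)
  end
end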
```

### Creating Nodes and Edges

```elixir
alias Rag.GraphStore
alias Rag.GraphStore.Pgvector

store = %Pgvector{repo: MyApp.Repo}

# Create entity
{:ok, alice} = GraphStore.create_node(store, %{
  type: :person,
  name: "Alice Smith",
  properties: %{role: "engineer"},
  embedding: [0.1, 0.2, ...],
  source_chunk_ids: [1, 2, 3]
})

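# Create a second entity to link to
{:ok, acme} = GraphStore.create_node(store, %{
  type: :organization,
  name: "Acme Corp"
})

# Create relationship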
{:ok, edge} = GraphStore.create_edge(store, %{
  from_id: alice.id,
  to_id: acme.id,
  type: :works_for,
  weight: 0.95
})
```

### Graph Traversal

```elixir
# Find neighbors
{:ok, neighbors} = GraphStore.find_neighbors(store, alice.id,
  direction: :both,  # :in, :out, or :both
  limit: 10,
  edge_type: :works_for
)

# BFS traversal
{:ok, nodes} = GraphStore.traverse(store, alice.id,
  max_depth: 2,
  algorithm: :bfs
)

# DFS traversal
{:ok, nodes} = GraphStore.traverse(store, alice.id,
  max_depth: 3,
  algorithm: :dfs
)
```
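
To make the `direction` and `edge_type` options concrete, here is an in-memory sketch of neighbor lookup over a plain list of edges. It only illustrates the semantics; the real store queries PostgreSQL. `NeighborSketch` is a hypothetical module, and the defaults shown (`:both`, limit 10) are assumptions rather than documented library defaults.

```elixir
defmodule NeighborSketch do
  # `edges` is a list of %{from_id: _, to_id: _, type: _} maps.
  def neighbors(edges, node_id, opts \\ []) do
    direction = Keyword.get(opts, :direction, :both)
    edge_type = Keyword.get(opts, :edge_type)
    limit = Keyword.get(opts, :limit, 10)

    edges
    |> Enum.filter(fn e -> is_nil(edge_type) or e.type == edge_type end)
    |> Enum.flat_map(fn e ->
      # :out follows edges leaving the node, :in follows edges arriving at it
      outgoing = if direction in [:out, :both] and e.from_id == node_id, do: [e.to_id], else: []
      incoming = if direction in [:in, :both] and e.to_id == node_id, do: [e.from_id], else: []
      outgoing ++ incoming
    end)
    |> Enum.uniq()
    |> Enum.take(limit)
  end
end

edges = [
  %{from_id: 1, to_id: 2, type: :works_for},
  %{from_id: 3, to_id: 1, type: :reports_to},
  %{from_id: 1, to_id: 4, type: :uses}
]

NeighborSketch.neighbors(edges, 1, direction: :out)
# => [2, 4]
```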

### Vector Search on Entities

```elixir
{:ok, similar} = GraphStore.vector_search(store, query_embedding,
  limit: 5,
  type: :person  # Optional filter
)
```
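
The migration above indexes embeddings with `vector_l2_ops`, which suggests that results are ranked by Euclidean (L2) distance; that reading is an assumption. The sketch below performs the same ranking in memory, with the optional type filter applied before sorting. `VectorSketch` is a hypothetical module, and the two-dimensional embeddings are toy values.

```elixir
defmodule VectorSketch do
  # Euclidean (L2) distance between two equal-length lists of numbers.
  def l2_distance(a, b) do
    a
    |> Enum.zip(b)
    |> Enum.reduce(0.0, fn {x, y}, acc -> acc + (x - y) * (x - y) end)
    |> :math.sqrt()
  end

  # Returns the `limit` entities closest to `query`, optionally filtered by type.
  def search(entities, query, opts \\ []) do
    limit = Keyword.get(opts, :limit, 5)
    type = Keyword.get(opts, :type)

    entities
    |> Enum.filter(fn e -> is_nil(type) or e.type == type end)
    |> Enum.sort_by(fn e -> l2_distance(e.embedding, query) end)
    |> Enum.take(limit)
  end
end

entities = [
  %{name: "Alice", type: :person, embedding: [1.0, 0.0]},
  %{name: "Acme", type: :organization, embedding: [0.9, 0.1]},
  %{name: "Bob", type: :person, embedding: [0.0, 1.0]}
]

VectorSketch.search(entities, [1.0, 0.0], type: :person)
# Alice first (distance 0.0), then Bob
```

Filtering by type before ranking is what makes the filter cheap: fewer candidates reach the distance computation.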

## Community Detection

Detect clusters of related entities:

```elixir
alias Rag.GraphRAG.CommunityDetector

# Detect communities
{:ok, communities} = CommunityDetector.detect(store, max_iterations: 100)
# Returns: [%{id: 1, level: 0, entity_ids: [1, 2, 3], summary: nil}, ...]

# Generate summaries with LLM
{:ok, summarized} = CommunityDetector.summarize_communities(store, communities,
  router: router
)

# Combined: detect and summarize
{:ok, communities} = CommunityDetector.detect_and_summarize(store,
  router: router,
  max_iterations: 100
)
```
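
For intuition about what label propagation does, the sketch below runs it on an in-memory adjacency map: every node starts with its own label and repeatedly adopts the most common label among its neighbors until nothing changes or `max_iterations` is reached. This is a minimal illustration, not the `CommunityDetector` implementation. `LabelPropagationSketch` is a hypothetical module, and breaking ties toward the largest label is a choice made here for determinism; many implementations break ties randomly.

```elixir
defmodule LabelPropagationSketch do
  # `adjacency` maps each node id to a list of neighbor ids (undirected graph).
  # Returns a sorted list of communities, each a sorted list of node ids.
  def detect(adjacency, opts \\ []) do
    max_iterations = Keyword.get(opts, :max_iterations, 100)
    labels = Map.new(Map.keys(adjacency), fn id -> {id, id} end)

    adjacency
    |> iterate(labels, max_iterations)
    |> Enum.group_by(fn {_id, label} -> label end, fn {id, _label} -> id end)
    |> Map.values()
    |> Enum.map(&Enum.sort/1)
    |> Enum.sort()
  end

  defp iterate(_adjacency, labels, 0), do: labels

  defp iterate(adjacency, labels, remaining) do
    # Asynchronous update: each node sees labels already changed this round.
    updated =
      adjacency
      |> Map.keys()
      |> Enum.sort()
      |> Enum.reduce(labels, fn id, acc ->
        Map.put(acc, id, dominant_label(adjacency[id], acc, acc[id]))
      end)

    if updated == labels, do: labels, else: iterate(adjacency, updated, remaining - 1)
  end

  # A node without neighbors keeps its own label.
  defp dominant_label([], _labels, current), do: current

  defp dominant_label(neighbors, labels, _current) do
    neighbors
    |> Enum.frequencies_by(&Map.fetch!(labels, &1))
    # Highest count wins; ties go to the largest label (assumption for determinism)
    |> Enum.max_by(fn {label, count} -> {count, label} end)
    |> elem(0)
  end
end

# Two triangles joined by a single edge (3-4), plus an isolated node
adjacency = %{
  1 => [2, 3], 2 => [1, 3], 3 => [1, 2, 4],
  4 => [3, 5, 6], 5 => [4, 6], 6 => [4, 5],
  7 => []
}

LabelPropagationSketch.detect(adjacency)
# => [[1, 2, 3], [4, 5, 6], [7]]
```

Label propagation is sensitive to update order and tie-breaking, so different runs or implementations can produce different communities on the same graph.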

### Hierarchical Communities

Build multi-level community hierarchy:

```elixir
{:ok, hierarchy} = CommunityDetector.build_hierarchy(store,
  levels: 3,
  max_iterations: 100
)
# Returns: [[level_0_communities], [level_1_communities], [level_2_communities]]
```

## Graph-Based Retrieval

### Creating a Graph Retriever

```elixir
alias Rag.Retriever.Graph

# graph_store and vector_store are assumed to be configured already
retriever = Graph.new(
  graph_store: graph_store,
  vector_store: vector_store,
  mode: :hybrid,
  depth: 2,
  local_weight: 0.7,
  global_weight: 0.3
)
```

### Search Modes

#### Local Search

Find specific, detailed information via entity expansion:

```elixir
{:ok, results} = Graph.local_search(retriever, query_embedding,
  limit: 10,
  depth: 2
)
```

**Process:**
1. Vector search on entity embeddings
2. BFS traversal to related entities
3. Collect source chunks from entities
4. Score by graph distance (closer = higher)

**Best for:** "What is Alice's role?", specific entity queries
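
The expansion and scoring steps can be sketched in memory as a depth-limited BFS from the entities returned by the vector search. `LocalSearchSketch` is a hypothetical module, and the scoring formula `1 / (1 + distance)` is an assumption that merely satisfies "closer = higher"; the retriever's actual scoring may differ. Collecting source chunks is left out.

```elixir
defmodule LocalSearchSketch do
  # `adjacency` maps node id => neighbor ids; `seeds` are the entity ids found
  # by the initial vector search. Returns [{node_id, score}], best first.
  def expand(adjacency, seeds, max_depth) do
    visited = Map.new(seeds, fn id -> {id, 0} end)

    seeds
    |> bfs(adjacency, visited, 1, max_depth)
    |> Enum.map(fn {id, distance} -> {id, 1.0 / (1 + distance)} end)
    |> Enum.sort_by(fn {id, score} -> {-score, id} end)
  end

  defp bfs([], _adjacency, visited, _depth, _max_depth), do: visited

  defp bfs(_frontier, _adjacency, visited, depth, max_depth) when depth > max_depth,
    do: visited

  defp bfs(frontier, adjacency, visited, depth, max_depth) do
    # Nodes reached for the first time at this depth
    next =
      frontier
      |> Enum.flat_map(&Map.get(adjacency, &1, []))
      |> Enum.uniq()
      |> Enum.reject(&Map.has_key?(visited, &1))

    visited = Enum.reduce(next, visited, &Map.put(&2, &1, depth))
    bfs(next, adjacency, visited, depth + 1, max_depth)
  end
end

# A chain 1 - 2 - 3 - 4, searched from node 1 with depth 2
adjacency = %{1 => [2], 2 => [1, 3], 3 => [2, 4], 4 => [3]}
LocalSearchSketch.expand(adjacency, [1], 2)
# Nodes 1, 2, 3 in that order; node 4 lies beyond the depth limit
```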

#### Global Search

Find high-level context via community summaries:

```elixir
{:ok, results} = Graph.global_search(retriever, query_embedding,
  limit: 10
)
```

**Process:**
1. Vector search on community summaries
2. Return community summaries as context

**Best for:** "What are the main areas of focus?", overview queries

#### Hybrid Search

Combine local and global results with weighted Reciprocal Rank Fusion (RRF):

```elixir
{:ok, results} = Graph.hybrid_search(retriever, query_embedding,
  limit: 10
)
```

**Process:**
1. Run local and global in parallel
2. Apply weighted RRF fusion
3. Return merged results

**Best for:** Complex queries needing multiple perspectives
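
Weighted RRF gives each result a score of `weight / (k + rank)` in every list it appears in and sums those scores across lists. The sketch below fuses two ranked lists this way. `RrfSketch` is a hypothetical module, `k = 60` is the conventional constant rather than a confirmed library default, and the 0.7/0.3 weights are the `local_weight` and `global_weight` values used earlier in this guide.

```elixir
defmodule RrfSketch do
  # `ranked_lists` is a list of {weight, [id, ...]} tuples, best result first.
  # Returns [{id, fused_score}] sorted by descending score.
  def fuse(ranked_lists, k \\ 60) do
    ranked_lists
    |> Enum.flat_map(fn {weight, ids} ->
      ids
      |> Enum.with_index(1)
      |> Enum.map(fn {id, rank} -> {id, weight / (k + rank)} end)
    end)
    # Sum contributions for ids that appear in more than one list
    |> Enum.reduce(%{}, fn {id, score}, acc ->
      Map.update(acc, id, score, &(&1 + score))
    end)
    |> Enum.sort_by(fn {id, score} -> {-score, id} end)
  end
end

local_results = [:a, :b, :c]
global_results = [:c, :d]

RrfSketch.fuse([{0.7, local_results}, {0.3, global_results}])
# :c ranks first because it appears in both lists
```

Because only ranks enter the formula, RRF does not require the local and global scores to be on the same scale.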

### Using the Retriever

```elixir
alias Rag.Retriever

# With embedding
{:ok, results} = Retriever.retrieve(retriever, query_embedding, limit: 10)

# With text (requires embedding function)
{:ok, results} = Retriever.retrieve(retriever, "search query",
  limit: 10,
  embedding_fn: fn text ->
    {:ok, [emb], _} = Router.execute(router, :embeddings, [text], [])
    emb
  end
)
```

## Complete Workflow

```elixir
alias Rag.Router
alias Rag.GraphStore
alias Rag.GraphStore.Pgvector
alias Rag.GraphRAG.{Extractor, CommunityDetector}
alias Rag.Retriever
alias Rag.Retriever.Graph

# 1. Initialize
{:ok, router} = Router.new(providers: [:gemini])
store = %Pgvector{repo: MyApp.Repo}

# 2. Extract entities from documents
documents = ["doc1 text", "doc2 text", "doc3 text"]
{:ok, results} = Extractor.extract_batch(documents, router: router)

# 3. Resolve duplicates
all_entities = Enum.flat_map(results, & &1.entities)
{:ok, resolved} = Extractor.resolve_entities(all_entities, router: router)

# 4. Generate embeddings
entity_texts = Enum.map(resolved, &"#{&1.name}: #{&1.description}")
{:ok, embeddings, _} = Router.execute(router, :embeddings, entity_texts, [])

# 5. Store entities with embeddings
entity_ids =
  for {entity, embedding} <- Enum.zip(resolved, embeddings), into: %{} do
    {:ok, node} =
      GraphStore.create_node(store, %{
        type: entity.type,
        name: entity.name,
        properties: %{description: entity.description},
        embedding: embedding
      })

    {entity.name, node.id}
  end

# 6. Create relationships
all_rels = Enum.flat_map(results, & &1.relationships)
for rel <- all_rels do
  from_id = entity_ids[rel.source]
  to_id = entity_ids[rel.target]

  if from_id && to_id do
    GraphStore.create_edge(store, %{
      from_id: from_id,
      to_id: to_id,
      type: rel.type,
      weight: rel.weight
    })
  end
end

# 7. Detect and summarize communities
{:ok, communities} = CommunityDetector.detect_and_summarize(store,
  router: router,
  max_iterations: 100
)

# 8. Create retriever (assumes vector_store is already configured elsewhere)
retriever = Graph.new(
  graph_store: store,
  vector_store: vector_store,
  mode: :hybrid,
  depth: 2
)

# 9. Query
{:ok, [query_emb], _} = Router.execute(router, :embeddings, ["AI projects"], [])
{:ok, results} = Retriever.retrieve(retriever, query_emb, limit: 10)
```

## Choosing Search Mode

| Query Type | Mode | Example |
|------------|------|---------|
| Specific entity | `:local` | "What is Alice's role?" |
| Overview | `:global` | "What are the main themes?" |
| Complex/multi-faceted | `:hybrid` | "How do teams connect to projects?" |

## Performance Tips

1. **Batch extraction** - Use `extract_batch/2` with concurrency
2. **Limit traversal depth** - Default depth of 2 balances breadth/performance
3. **Type filtering** - Filter vector search by entity type when possible
4. **Adjust weights** - Tune local/global weights for your use case
5. **Index properly** - Ensure vector and type indexes exist

## Next Steps

- [Retrievers](retrievers.md) - Other retrieval strategies
- [Pipeline](pipelines.md) - Integrate GraphRAG in workflows
- [Agent Framework](agent_framework.md) - Use with agents