# Ragex Graph Algorithms
This document covers the graph algorithms available in Ragex for analyzing code structure, relationships, and importance metrics.
## Table of Contents
- [Overview](#overview)
- [PageRank](#pagerank)
- [Path Finding](#path-finding)
- [Centrality Metrics](#centrality-metrics)
- [Graph Statistics](#graph-statistics)
- [Usage Examples](#usage-examples)
- [Performance Characteristics](#performance-characteristics)
## Overview
Ragex implements several graph algorithms for analyzing code relationships:
| Algorithm | Purpose | Use Cases |
|-----------|---------|-----------|
| **PageRank** | Importance scoring | Finding central/critical functions |
| **Path Finding** | Dependency chains | Understanding call flows, impact analysis |
| **Degree Centrality** | Connection metrics | Identifying highly-coupled code |
| **Graph Stats** | Overall analysis | Codebase health, complexity assessment |
All algorithms operate on the call graph built from code analysis.
## PageRank
### What It Does
PageRank measures the importance of functions and modules based on who calls them. Functions called by many other functions score higher.
### Algorithm
Based on Google's PageRank algorithm:
- Iterative computation with damping factor
- Higher scores = more important nodes
- Considers both incoming edges and importance of callers
### Usage
```elixir
alias Ragex.Graph.Algorithms
# Basic usage (with defaults)
scores = Algorithms.pagerank()
# Returns: %{
# {:function, :ModuleA, :foo, 0} => 0.234,
# {:function, :ModuleB, :bar, 1} => 0.189,
# ...
# }
```
### Options
```elixir
Algorithms.pagerank(
damping_factor: 0.85, # Probability of following edges (default: 0.85)
max_iterations: 100, # Maximum iterations (default: 100)
tolerance: 0.0001 # Convergence threshold (default: 0.0001)
)
```
### Parameters Explained
**damping_factor (0.85):**
- Probability a random walker follows an edge vs. teleporting
- Higher = more weight on graph structure
- Lower = more equal distribution
- Range: 0.0 to 1.0
- Typical values: 0.85 (Google), 0.5 (more distributed)
**max_iterations (100):**
- Maximum number of iterations before stopping
- Prevents infinite loops
- Usually converges in 10-50 iterations
- Increase for very large graphs
**tolerance (0.0001):**
- Convergence threshold
- Stops when maximum score change < tolerance
- Lower = more precise, slower
- Higher = less precise, faster
### Interpretation
**High PageRank (>0.5):**
- Critical functions called by many others
- Central to the codebase
- Changes here have wide impact
- Examples: utility functions, core APIs
**Medium PageRank (0.1-0.5):**
- Moderately important functions
- Called by several others
- Standard application logic
**Low PageRank (<0.1):**
- Leaf functions (few callers)
- Application entry points
- Specialized utilities
### Example Results
```elixir
scores = Algorithms.pagerank()
# Get top 10 most important functions
top = scores
|> Enum.sort_by(fn {_id, score} -> -score end)
|> Enum.take(10)
# Results might look like:
# [
# {{:function, :Utils, :parse_json, 1}, 0.456}, # Called everywhere
# {{:function, :Core, :process, 2}, 0.389}, # Central processor
# {{:function, :DB, :query, 1}, 0.234}, # Database access
# ...
# ]
```
### Performance
- **Time Complexity**: O(iterations × edges)
- **Space Complexity**: O(nodes)
- **Typical Runtime**: <100ms for 1,000 nodes
## Path Finding
### What It Does
Finds all paths between two nodes in the call graph. Useful for understanding how one function reaches another through calls.
### Algorithm
Depth-first search (DFS) with:
- Cycle detection (via visited set)
- Depth limiting
- **Path count limiting** (Phase 4D)
- **Early stopping** when limits reached
- **Dense graph warnings**
### Basic Usage
```elixir
alias Ragex.Graph.Algorithms
# Find paths from function A to function B
paths = Algorithms.find_paths(
{:function, :ModuleA, :foo, 0},
{:function, :ModuleC, :baz, 0}
)
# Returns: [
# [{:function, :ModuleA, :foo, 0}, {:function, :ModuleB, :bar, 0}, {:function, :ModuleC, :baz, 0}],
# [{:function, :ModuleA, :foo, 0}, {:function, :ModuleD, :qux, 0}, {:function, :ModuleC, :baz, 0}]
# ]
```
### Options
```elixir
Algorithms.find_paths(from, to,
max_depth: 10, # Maximum path length in edges (default: 10)
max_paths: 100, # Maximum paths to return (default: 100)
warn_dense: true # Emit warnings for dense graphs (default: true)
)
```
### Parameters Explained
**max_depth (10):**
- Maximum path length measured in edges (hops)
- Path with 3 nodes has 2 edges
- Prevents searching too deep
- Typical values:
- 5: Direct dependencies only
- 10: Standard (covers most practical cases)
- 20: Deep analysis (may be slow)
**max_paths (100):**
- Maximum number of paths to return
- Prevents exponential explosion in dense graphs
- Stops DFS early when reached
- Typical values:
- 10: Quick analysis
- 100: Standard (Phase 4D default)
- 1000: Exhaustive (may hang on dense graphs)
**warn_dense (true):**
- Automatically warn about dense graphs
- Checks starting node's out-degree
- Helpful for understanding performance
- Set to `false` in automated systems
### Dense Graph Warnings
The system automatically detects and warns about potentially slow operations:
**INFO Level (≥10 edges):**
```
Moderately connected node: {:function, :ModuleA, :foo, 0} has 12 outgoing edges.
Path finding may take some time.
```
**WARNING Level (≥20 edges):**
```
Dense graph detected: Node {:function, :HubModule, :central, 0} has 25 outgoing edges.
Path finding may be slow or return partial results. Consider reducing max_depth or max_paths.
```
### Examples
#### 1. Finding Direct Dependencies
```elixir
# Find immediate calls (depth 1)
paths = Algorithms.find_paths(from, to, max_depth: 1)
```
#### 2. Quick Analysis (Limited Paths)
```elixir
# Get just a few example paths
paths = Algorithms.find_paths(from, to, max_paths: 10)
```
#### 3. Deep Dependency Analysis
```elixir
# Explore deeper relationships (careful with dense graphs!)
paths = Algorithms.find_paths(from, to, max_depth: 15, max_paths: 200)
```
#### 4. Silent Operation (No Warnings)
```elixir
# For automated tools
paths = Algorithms.find_paths(from, to, warn_dense: false)
```
#### 5. Checking if Path Exists
```elixir
# Quick check
case Algorithms.find_paths(from, to, max_paths: 1) do
[] -> :no_path
[_path] -> :has_path
end
```
### Interpreting Results
**Empty List `[]`:**
- No path exists from source to target
- Functions are independent
- Target not reachable through calls
**Single Path:**
- Direct or unique dependency chain
- Clear relationship
**Multiple Paths:**
- Function is reachable through different routes
- May indicate coupling or complex dependencies
- Consider refactoring if count is very high
**Truncated Results (= max_paths):**
- Many more paths likely exist
- Dense graph structure
- Consider:
- Reducing max_depth
- Refactoring to reduce coupling
- Increasing max_paths if needed
### Performance
| Scenario | Complexity | Typical Time |
|----------|------------|--------------|
| Sparse graph | O(V + E) | <10ms |
| Moderate graph | O(V × D) | 10-100ms |
| Dense graph (no limit) | O(V^D) | Hang risk! |
| Dense graph (with limit) | O(max_paths × D) | <200ms |
**V** = vertices (nodes), **E** = edges, **D** = max_depth
### Known Limitations
1. **Non-deterministic order**: Path order may vary between runs (DFS traversal order)
2. **No truncation indicator**: Can't tell if results are complete or truncated
3. **Path quality**: All paths treated equally (no shortest-path preference)
4. **Memory usage**: Storing many long paths can consume significant memory
## Centrality Metrics
### What It Does
Computes connection-based metrics for all nodes in the graph.
### Metrics
**In-Degree:**
- Number of incoming edges (callers)
- Functions with high in-degree are called by many others
- Indicates reusability or coupling
**Out-Degree:**
- Number of outgoing edges (callees)
- Functions with high out-degree call many others
- Indicates complexity or coordination role
**Total Degree:**
- Sum of in-degree and out-degree
- Overall connectivity metric
### Usage
```elixir
centrality = Algorithms.degree_centrality()
# Returns: %{
# {:function, :ModuleA, :foo, 0} => %{
# in_degree: 0, # Not called by anyone
# out_degree: 5, # Calls 5 other functions
# total_degree: 5
# },
# {:function, :Utils, :helper, 1} => %{
# in_degree: 12, # Called by 12 functions
# out_degree: 2, # Calls 2 functions
# total_degree: 14
# },
# ...
# }
```
### Interpretation
#### High In-Degree (>10)
- **Utility functions**: Reused across codebase
- **Core APIs**: Central to application logic
- **Potential issues**: Single point of failure, change impact
#### High Out-Degree (>10)
- **Coordinator functions**: Orchestrate multiple operations
- **Complex logic**: May need refactoring
- **Dense nodes**: Path finding may be slow
#### High Total Degree (>20)
- **Hub nodes**: Central to the graph
- **Critical code**: Changes affect many paths
- **Refactoring target**: Consider splitting
### Example Analysis
```elixir
centrality = Algorithms.degree_centrality()
# Find functions called by many (utilities)
utilities = centrality
|> Enum.filter(fn {_id, metrics} -> metrics.in_degree > 10 end)
|> Enum.sort_by(fn {_id, metrics} -> -metrics.in_degree end)
# Find complex coordinators
coordinators = centrality
|> Enum.filter(fn {_id, metrics} -> metrics.out_degree > 15 end)
|> Enum.sort_by(fn {_id, metrics} -> -metrics.out_degree end)
```
## Graph Statistics
### What It Does
Provides comprehensive overview of the entire codebase graph structure.
### Usage
```elixir
stats = Algorithms.graph_stats()
# Returns: %{
# node_count: 1234,
# node_counts_by_type: %{
# function: 1000,
# module: 100,
# call: 3000
# },
# edge_count: 3000,
# average_degree: 4.86,
# density: 0.0024,
# top_nodes: [
# {{:function, :Utils, :parse, 1}, 0.234},
# {{:function, :Core, :run, 0}, 0.189},
# ...
# ]
# }
```
### Metrics Explained
**node_count:**
- Total number of nodes (modules, functions, calls)
- Larger = bigger codebase
**node_counts_by_type:**
- Breakdown by entity type
- Useful for understanding code composition
**edge_count:**
- Total number of call relationships
- Indicates coupling level
**average_degree:**
- Average connections per node
- Higher = more coupled
- Typical values:
- 2-4: Well-structured, modular
- 5-10: Moderate coupling
- >10: High coupling, consider refactoring
**density:**
- Ratio of actual to possible edges
- Range: 0.0 (no edges) to 1.0 (fully connected)
- Typical values:
- <0.01: Sparse, well-structured
- 0.01-0.05: Moderate
- >0.05: Dense, potential issues
**top_nodes:**
- Top 10 functions by PageRank
- Most important/central functions
### Example Interpretation
```elixir
stats = Algorithms.graph_stats()
IO.puts("Codebase Analysis:")
IO.puts(" Total functions: #{stats.node_counts_by_type[:function]}")
IO.puts(" Total calls: #{stats.edge_count}")
IO.puts(" Coupling level: #{stats.average_degree}")
# Health check
cond do
stats.average_degree < 4 ->
IO.puts("✓ Well-structured codebase")
stats.average_degree < 8 ->
IO.puts("⚠ Moderate coupling, consider modularization")
true ->
IO.puts("⚠ High coupling, refactoring recommended")
end
```
## Usage Examples
### 1. Find Critical Functions
```elixir
# Combine PageRank and centrality
scores = Algorithms.pagerank()
centrality = Algorithms.degree_centrality()
critical_functions = scores
|> Enum.filter(fn {id, score} ->
metrics = Map.get(centrality, id, %{in_degree: 0})
score > 0.2 and metrics.in_degree > 10
end)
|> Enum.sort_by(fn {_id, score} -> -score end)
IO.puts("Critical functions that need tests:")
for {id, score} <- critical_functions do
IO.puts(" #{inspect(id)} - score: #{Float.round(score, 3)}")
end
```
### 2. Impact Analysis
```elixir
# Find all functions affected by a change
changed_function = {:function, :Core, :process, 2}
# Get reverse dependencies (who calls this?)
centrality = Algorithms.degree_centrality()
{:ok, metrics} = Map.fetch(centrality, changed_function)
IO.puts("Direct impact: #{metrics.in_degree} callers")
# Check transitivity by finding paths to entry points
entry_points = find_entry_points() # Your function
for entry <- entry_points do
paths = Algorithms.find_paths(changed_function, entry, max_paths: 10)
unless Enum.empty?(paths) do
IO.puts(" Affects: #{inspect(entry)} (#{length(paths)} paths)")
end
end
```
### 3. Detect Code Smells
```elixir
centrality = Algorithms.degree_centrality()
# Find "God functions" (too many responsibilities)
god_functions = centrality
|> Enum.filter(fn {_id, m} -> m.out_degree > 20 end)
# Find "hub functions" (too many dependencies)
hub_functions = centrality
|> Enum.filter(fn {_id, m} -> m.in_degree > 30 end)
# Find isolated modules
isolated = centrality
|> Enum.filter(fn {_id, m} -> m.total_degree == 0 end)
IO.puts("Code health report:")
IO.puts(" God functions: #{length(god_functions)}")
IO.puts(" Hub functions: #{length(hub_functions)}")
IO.puts(" Isolated entities: #{length(isolated)}")
```
### 4. Dependency Chain Visualization
```elixir
# Find all paths and visualize
from = {:function, :API, :handle_request, 1}
to = {:function, :DB, :query, 2}
paths = Algorithms.find_paths(from, to, max_depth: 5)
IO.puts("Request flows from API to Database:")
for path <- paths do
IO.puts("\n " <> Enum.map_join(path, " -> ", &format_node/1))
end
defp format_node({:function, module, name, arity}) do
"#{module}.#{name}/#{arity}"
end
```
### 5. Codebase Evolution Tracking
```elixir
# Compare stats over time
stats_before = Algorithms.graph_stats()
# ... make changes ...
stats_after = Algorithms.graph_stats()
IO.puts("Changes:")
IO.puts(" Functions: #{stats_before.node_counts_by_type[:function]} → #{stats_after.node_counts_by_type[:function]}")
IO.puts(" Coupling: #{stats_before.average_degree} → #{stats_after.average_degree}")
IO.puts(" Density: #{stats_before.density} → #{stats_after.density}")
if stats_after.average_degree < stats_before.average_degree do
IO.puts("✓ Coupling reduced - good refactoring!")
else
IO.puts("⚠ Coupling increased - review changes")
end
```
## Performance Characteristics
### Computational Complexity
| Algorithm | Time Complexity | Space Complexity | Typical Runtime (1K nodes) |
|-----------|----------------|------------------|----------------------------|
| PageRank | O(I × E) | O(V) | 50-100ms |
| Path Finding (limited) | O(max_paths × D) | O(max_paths × D) | 10-200ms |
| Path Finding (unlimited) | O(V^D) | O(V^D) | Risk of hang! |
| Degree Centrality | O(V + E) | O(V) | <10ms |
| Graph Stats | O(V + E + I × E) | O(V) | 100-150ms |
**Notation:**
- V = vertices (nodes)
- E = edges
- I = PageRank iterations
- D = max_depth
### Memory Usage
| Operation | Memory Footprint | Notes |
|-----------|------------------|-------|
| PageRank | ~100 bytes/node | Score storage |
| Path Finding | ~1KB/path | Path list storage |
| Centrality | ~200 bytes/node | Three metrics per node |
| Graph Stats | ~500 bytes | Aggregated results |
### Optimization Tips
**For Large Graphs (>10K nodes):**
1. **Use limited path finding:**
```elixir
paths = Algorithms.find_paths(from, to, max_paths: 50, max_depth: 8)
```
2. **Cache PageRank results:**
```elixir
scores = Algorithms.pagerank()
# Store in ETS or process state for reuse
```
3. **Filter before computing:**
```elixir
# Only compute for functions (not all nodes)
functions = Store.list_nodes(:function)
# Then run algorithms on subset
```
4. **Use lower precision:**
```elixir
Algorithms.pagerank(tolerance: 0.001) # Faster convergence
```
**For Dense Graphs (high degree):**
1. **Always use max_paths limit:**
```elixir
paths = Algorithms.find_paths(from, to, max_paths: 100) # Never unlimited!
```
2. **Reduce depth:**
```elixir
paths = Algorithms.find_paths(from, to, max_depth: 5)
```
3. **Disable warnings in loops:**
```elixir
for target <- targets do
Algorithms.find_paths(from, target, warn_dense: false)
end
```
## Best Practices
### 1. Always Use Limits
❌ **Don't:**
```elixir
# Dangerous on dense graphs!
paths = Algorithms.find_paths(from, to, max_paths: 10_000)
```
✅ **Do:**
```elixir
# Safe default limits
paths = Algorithms.find_paths(from, to) # Uses max_paths: 100
```
### 2. Check Graph Health First
```elixir
stats = Algorithms.graph_stats()
if stats.average_degree > 15 do
IO.puts("Warning: Dense graph detected. Using conservative limits.")
opts = [max_paths: 50, max_depth: 5]
else
opts = [] # Use defaults
end
paths = Algorithms.find_paths(from, to, opts)
```
### 3. Interpret Results in Context
```elixir
# Don't just look at numbers - understand the domain
scores = Algorithms.pagerank()
for {{:function, module, name, _arity}, score} <- scores do
if score > 0.3 and module != :Utils do
# High score but not a utility - may indicate tight coupling
IO.puts("Review: #{module}.#{name} has high PageRank but isn't a utility")
end
end
```
### 4. Combine Multiple Metrics
```elixir
# Use multiple algorithms for better insights
scores = Algorithms.pagerank()
centrality = Algorithms.degree_centrality()
for {{:function, _m, _n, _a} = id, score} <- scores do
metrics = Map.get(centrality, id)
# High PageRank + Low in-degree = Entry point
if score > 0.2 and metrics.in_degree < 2 do
IO.puts("Entry point: #{inspect(id)}")
end
# High PageRank + High in-degree = Critical utility
if score > 0.3 and metrics.in_degree > 10 do
IO.puts("Critical utility: #{inspect(id)}")
end
end
```
## References
- [PageRank Algorithm](https://en.wikipedia.org/wiki/PageRank)
- [Graph Centrality Metrics](https://en.wikipedia.org/wiki/Centrality)
- [Depth-First Search](https://en.wikipedia.org/wiki/Depth-first_search)
- [Code Coupling Metrics](https://en.wikipedia.org/wiki/Coupling_(computer_programming))
---
**Related Documentation:**
- [README.md](README.md) - Project overview
- [PERSISTENCE.md](PERSISTENCE.md) - Caching and persistence
- [CONFIGURATION.md](CONFIGURATION.md) - Model and cache configuration