# Reach
Program dependence graph for Elixir and Erlang.
Reach builds a graph that captures **what depends on what** in your code.
Given any expression, you can ask: what affects it? What does it affect?
Can these two statements be safely reordered? Does user input reach this
database call without sanitization?
## Use cases
### Security: taint analysis
Does user input reach a dangerous sink without sanitization?
```elixir
graph = Reach.file_to_graph!("lib/my_app_web/controllers/user_controller.ex")
Reach.taint_analysis(graph,
sources: [type: :call, function: :params],
sinks: [type: :call, module: System, function: :cmd, arity: 2],
sanitizers: [type: :call, function: :sanitize]
)
#=> [%{source: ..., sink: ..., path: [...], sanitized: false}]
```
### Code quality: dead code detection
```elixir
graph = Reach.file_to_graph!("lib/accounts.ex")
for node <- Reach.dead_code(graph) do
IO.warn("Dead code at #{inspect(node.source_span)}: #{node.type}")
end
```
### Refactoring: safe reordering
```elixir
graph = Reach.file_to_graph!("lib/pipeline.ex")
for block <- Reach.nodes(graph, type: :block),
{a, b} <- pairs(block.children) do
if Reach.independent?(graph, a.id, b.id) do
IO.puts("Statements at lines #{a.source_span.start_line} and " <>
"#{b.source_span.start_line} can be safely reordered")
end
end
```
### Clone detection: reordering-equivalent code
Two blocks with the same statements in different order are clones if
the statements are independent. `canonical_order` sorts independent
siblings by structural hash so both orderings produce the same sequence:
```elixir
# For ExDNA integration
order = Reach.canonical_order(graph, block_node_id)
hash = :erlang.phash2(Enum.map(order, fn {_, node} -> node end))
```
### OTP: GenServer state flow analysis
```elixir
graph = Reach.file_to_graph!("lib/my_server.ex")
edges = Reach.edges(graph)
# Which callbacks read state?
state_reads = Enum.filter(edges, &(&1.label == :state_read))
# Does state flow between callbacks?
state_passes = Enum.filter(edges, &(&1.label == :state_pass))
# ETS write-then-read dependencies
ets_deps = Enum.filter(edges, &match?({:ets_dep, _table}, &1.label))
```
### Concurrency: crash propagation
```elixir
graph = Reach.file_to_graph!("lib/my_supervisor.ex")
edges = Reach.edges(graph)
# Which monitors connect to which :DOWN handlers?
Enum.filter(edges, &(&1.label == :monitor_down))
# Does this process trap exits? Which handler receives them?
Enum.filter(edges, &(&1.label == :trap_exit))
# Task.async → Task.await data flow
Enum.filter(edges, &(&1.label == :task_result))
# Supervisor child startup ordering
Enum.filter(edges, &(&1.label == :startup_order))
```
## Installation
```elixir
def deps do
[{:reach, "~> 1.0"}]
end
```
## Building a graph
```elixir
# From Elixir source
{:ok, graph} = Reach.string_to_graph("def foo(x), do: x + 1")
{:ok, graph} = Reach.file_to_graph("lib/my_module.ex")
# From Erlang source
{:ok, graph} = Reach.string_to_graph(source, language: :erlang)
{:ok, graph} = Reach.file_to_graph("src/my_module.erl") # auto-detected
# From pre-parsed AST (for Credo/ExDNA integration)
{:ok, ast} = Code.string_to_quoted(source)
{:ok, graph} = Reach.ast_to_graph(ast)
# From compiled BEAM bytecode (sees macro-expanded code)
{:ok, graph} = Reach.module_to_graph(MyApp.Accounts)
{:ok, graph} = Reach.compiled_to_graph(source)
# Bang variants
graph = Reach.file_to_graph!("lib/my_module.ex")
```
The BEAM frontend captures code invisible to source-level analysis —
`use GenServer` callbacks, macro-expanded `try/rescue`, generated functions:
```elixir
# Source: only sees init/1 and handle_call/3
Reach.string_to_graph(genserver_source)
# BEAM: also sees child_spec/1, terminate/2, handle_info/2
Reach.compiled_to_graph(genserver_source)
```
## Multi-file project analysis
```elixir
# Analyze a whole Mix project
project = Reach.Project.from_mix_project()
# Or specific paths
project = Reach.Project.from_glob("lib/**/*.ex")
# Cross-module taint analysis
Reach.Project.taint_analysis(project,
sources: [type: :call, function: :params],
sinks: [type: :call, module: System, function: :cmd]
)
# Dependency summaries for external modules
summaries = Reach.Project.summarize_dependency(Ecto.Adapters.SQL)
project = Reach.Project.from_mix_project(summaries: summaries)
```
## API reference
### Nodes
```elixir
Reach.nodes(graph) # all nodes
Reach.nodes(graph, type: :call) # by type
Reach.nodes(graph, type: :call, module: Enum) # by module
Reach.nodes(graph, type: :call, module: Repo, function: :insert, arity: 1)
node = Reach.node(graph, node_id)
node.type #=> :call
node.meta #=> %{module: Repo, function: :insert, arity: 1}
node.source_span #=> %{file: "lib/accounts.ex", start_line: 12, ...}
```
### Slicing
```elixir
Reach.backward_slice(graph, node_id) # what affects this?
Reach.forward_slice(graph, node_id) # what does this affect?
Reach.chop(graph, source_id, sink_id) # how does A influence B?
```
### Data flow and independence
```elixir
Reach.data_flows?(graph, source_id, sink_id)
Reach.depends?(graph, id_a, id_b)
Reach.independent?(graph, id_a, id_b)
Reach.has_dependents?(graph, node_id)
Reach.passes_through?(graph, source_id, sink_id, &predicate/1)
```
### Effects
```elixir
Reach.pure?(node) #=> true
Reach.classify_effect(node) #=> :pure | :io | :read | :write | :send | :receive | :exception | :unknown
```
Covers `Enum`, `Map`, `List`, `String`, `Keyword`, `Tuple`, `Integer`,
`Float`, `Atom`, `MapSet`, `Range`, `Regex`, `URI`, `Path`, `Base`, and
Erlang equivalents. `Enum.each` correctly classified as impure.
### Dependencies
```elixir
Reach.control_deps(graph, node_id) #=> [{controller_id, label}, ...]
Reach.data_deps(graph, node_id) #=> [{source_id, :variable_name}, ...]
Reach.edges(graph) # all dependence edges
```
### Interprocedural
```elixir
Reach.function_graph(graph, {MyModule, :my_function, 2})
Reach.context_sensitive_slice(graph, node_id) # Horwitz-Reps-Binkley
Reach.call_graph(graph) # {module, function, arity} vertices
```
### Taint analysis
```elixir
# Keyword filters (same format as nodes/2)
Reach.taint_analysis(graph,
sources: [type: :call, function: :params],
sinks: [type: :call, module: Repo, function: :query],
sanitizers: [type: :call, function: :sanitize]
)
# Predicates also work
Reach.taint_analysis(graph,
sources: &(&1.meta[:function] in [:params, :body_params]),
sinks: [type: :call, module: System, function: :cmd]
)
#=> [%{source: node, sink: node, path: [node_id, ...], sanitized: boolean}]
```
### Dead code
```elixir
Reach.dead_code(graph)
#=> [%Reach.IR.Node{type: :call, meta: %{function: :upcase}}, ...]
```
### Canonical ordering
```elixir
Reach.canonical_order(graph, block_id)
#=> [{node_id, %Reach.IR.Node{}}, ...] sorted so independent
# siblings have deterministic order regardless of source order
```
### Neighbors
```elixir
Reach.neighbors(graph, node_id) # all direct neighbors
Reach.neighbors(graph, node_id, :state_read) # only state_read edges
Reach.neighbors(graph, node_id, :data) # matches {:data, _} labels
```
### Raw graph access
```elixir
raw = Reach.to_graph(graph) # returns the underlying libgraph Graph.t()
Graph.vertices(raw) # use any libgraph function
Graph.get_shortest_path(raw, id_a, id_b)
```
### Export
```elixir
{:ok, dot} = Reach.to_dot(graph)
File.write!("graph.dot", dot)
# dot -Tpng graph.dot -o graph.png
```
## Edge types
| Label | Source | Meaning |
|-------|--------|---------|
| `{:data, var}` | DDG | Data flows through variable `var` |
| `:containment` | DDG | Parent expression depends on child sub-expression |
| `{:control, label}` | CDG | Execution controlled by branch condition |
| `:call` | SDG | Call site to callee function |
| `:parameter_in` | SDG | Argument flows to formal parameter |
| `:parameter_out` | SDG | Return value flows back to caller |
| `:summary` | SDG | Shortcut: parameter flows to return value |
| `:state_read` | OTP | GenServer callback reads state parameter |
| `:state_pass` | OTP | State flows between consecutive callbacks |
| `{:ets_dep, table}` | OTP | ETS write → read on same table |
| `{:pdict_dep, key}` | OTP | Process dictionary put → get on same key |
| `:message_order` | OTP | Sequential sends to same pid |
| `:monitor_down` | Concurrency | `Process.monitor` → `handle_info({:DOWN, ...})` |
| `:trap_exit` | Concurrency | `Process.flag(:trap_exit)` → `handle_info({:EXIT, ...})` |
| `:link_exit` | Concurrency | `spawn_link` / `Process.link` → `:EXIT` handler |
| `:task_result` | Concurrency | `Task.async` → `Task.await` data flow |
| `:startup_order` | Concurrency | Supervisor child A starts before child B |
| `{:message_content, tag}` | OTP | `send(pid, {tag, data})` payload → handler pattern vars |
| `:call_reply` | OTP | `{:reply, value, state}` → `GenServer.call` return |
| `:match_binding` | DDG | Match RHS flows to LHS definition var |
| `:higher_order` | SDG | Argument flows through higher-order function to result |
## Architecture
```mermaid
graph TD
Source["Source (.ex / .erl / .beam)"] --> Frontend
Frontend["Frontend → IR Nodes<br/><i>Elixir AST · Erlang abstract forms · BEAM bytecode</i>"] --> CFG
CFG["Control Flow Graph<br/><i>entry/exit · branching · exceptions</i>"] --> Dom
Dom["Dominators<br/><i>post-dominator tree · dominance frontier</i>"] --> CD
CD["Control Dependence<br/><i>Ferrante et al. 1987</i>"] --> PDG
CFG --> DD
DD["Data Dependence<br/><i>def-use chains · containment edges</i>"] --> PDG
PDG["Program Dependence Graph<br/><i>control ∪ data</i>"] --> SDG
SDG["System Dependence Graph + OTP + Concurrency<br/><i>call/summary edges · GenServer · ETS · monitors · tasks</i>"] --> Query
Query["Query / Effects<br/><i>slicing · independence · taint · purity</i>"]
```
## Performance
Benchmarked on real projects (single-threaded, Apple M1 Pro):
| Project | Files | Functions | IR Nodes | Time | ms/file |
|---------|-------|-----------|----------|------|---------|
| ex_slop | 26 | 109 | 5,186 | 36ms | 1.4 |
| ex_dna | 32 | 208 | 10,649 | 87ms | 2.7 |
| Livebook | 72 | 589 | 22,156 | 160ms | 2.2 |
| Oban | 64 | 606 | 31,413 | 195ms | 3.0 |
| Keila | 190 | 1,074 | 49,354 | 282ms | 1.5 |
| Phoenix | 74 | 1,090 | 48,292 | 333ms | 4.5 |
| Absinthe | 282 | 1,411 | 68,416 | 375ms | 1.3 |
740 files, zero crashes.
## References
- Ferrante, Ottenstein, Warren — *The Program Dependence Graph and Its Use in
Optimization* (1987)
- Horwitz, Reps, Binkley — *Interprocedural Slicing Using Dependence Graphs*
(1990)
- Silva, Tamarit, Tomás — *System Dependence Graphs for Erlang Programs* (2012)
- Cooper, Harvey, Kennedy — *A Simple, Fast Dominance Algorithm* (2001)
## License
[MIT](LICENSE)