# Grove
**Conflict-free Replicated AST (Abstract Syntax Tree) data types for Elixir.**
Grove is a specialized CRDT library for building collaborative code editors, structured document editors, and any application requiring real-time synchronization of tree-structured data across distributed nodes.
## Features
- **Operation-based Tree CRDTs** — Efficient synchronization with minimal bandwidth
- **Eg-walker Optimization** — 10x faster for sequential editing (the common case)
- **Move Operations** — Refactor and reorganize AST nodes without creating cycles
- **Undo/Redo Support** — First-class support for editor integration
- **Phoenix LiveView Integration** — Real-time collaboration in web applications
- **Distributed Erlang Support** — Native clustering with BEAM distribution
- **Tombstone Garbage Collection** — Automatic cleanup of deleted nodes
- **Plain Data Conversion** — Store JSON in your database, use CRDTs only for sync
## Architecture Overview
```mermaid
flowchart TB
subgraph Database
JSON[(Plain JSON/AST)]
end
subgraph "Grove Session"
SM[Session Manager]
Tree[CRDT Tree]
end
subgraph "LiveView Replicas"
LV1[LiveView A]
LV2[LiveView B]
LV3[LiveView C]
end
JSON -->|from_data| Tree
Tree -->|to_data| JSON
LV1 <-->|ops| SM
LV2 <-->|ops| SM
LV3 <-->|ops| SM
SM --- Tree
```
## Installation
Add `grove` to your list of dependencies in `mix.exs`:
```elixir
def deps do
[
{:grove, "~> 0.1.0"}
]
end
```
## Quick Start
```elixir
# Create a new AST CRDT with a replica ID
tree = Grove.new(replica_id: "node_1")
# Insert nodes
tree = tree
|> Grove.insert_node(:root, :function, %{name: "hello", arity: 1})
|> Grove.insert_node("node_1_1", :param, %{name: "name"}, parent: :root)
|> Grove.insert_node("node_1_2", :body, %{}, parent: :root)
# Move a node (safe, prevents cycles)
tree = Grove.move_node(tree, "node_1_1", new_parent: "node_1_2")
# Generate operations for sync
{tree, ops} = Grove.flush_operations(tree)
# On another replica, apply received operations
other_tree = Grove.apply_operations(other_tree, ops)
# Both trees converge to the same state
Grove.to_ast(tree) == Grove.to_ast(other_tree) # => true
```
## Core Concepts
### Plain Data Conversion
Grove separates CRDT state from application data. Store plain JSON/maps in your database and only use Grove for real-time sync:
```elixir
# Load plain data from database, convert to CRDT tree
plain_ast = MyRepo.get!(Document, id).data
tree = Grove.from_data(plain_ast, replica_id: socket.id)
# After editing, convert back to plain data for storage
plain_ast = Grove.to_data(tree)
MyRepo.update!(document, %{data: plain_ast})
# Round-trip guarantee
assert Grove.to_data(Grove.from_data(data, replica_id: "a")) == data
```
### Replica IDs
Every node in a Grove cluster needs a unique replica ID. This is used to generate globally unique node identifiers and resolve concurrent operations deterministically.
```elixir
tree = Grove.new(replica_id: System.get_env("NODE_ID") || UUID.uuid4())
```
### Node Structure
Each node in the tree has:
| Field | Description |
|-------|-------------|
| `id` | Unique identifier (replica_id + lamport_timestamp) |
| `type` | AST node type (atom) |
| `value` | Node content (map) — uses LWW-Register semantics |
| `parent_id` | Parent node reference |
| `children` | Ordered list of child node IDs |
| `deleted?` | Tombstone flag for deleted nodes |
### Operations
Grove uses operation-based CRDTs where each modification generates an operation that can be sent to other replicas:
```elixir
# Operations are automatically generated
{tree, ops} = Grove.flush_operations(tree)
# ops might contain:
[
%Grove.Op.Insert{id: "node_1_3", type: :expr, value: %{}, parent: "node_1_2", position: 0},
%Grove.Op.Move{id: "node_1_1", new_parent: "node_1_2", position: 1},
%Grove.Op.Update{id: "node_1_2", value: %{line: 5}}
]
```
### Batch Operations
Group multiple operations into a single atomic unit for undo and broadcast:
```elixir
# Adding a composite field (address) as one undoable action
tree = Grove.batch(tree, fn t ->
t
|> Grove.insert_node("addr", :group, %{label: "Address"}, parent: "form")
|> Grove.insert_node("street", :text, %{label: "Street"}, parent: "addr")
|> Grove.insert_node("city", :text, %{label: "City"}, parent: "addr")
|> Grove.insert_node("zip", :text, %{label: "ZIP"}, parent: "addr")
end)
# Single undo reverts all 4 inserts
{tree, ops} = Grove.undo(tree)
```
### Operation Metadata
Attach context to operations for audit trails and version history:
```elixir
tree = Grove.update_node(tree, node_id, %{label: "New Label"},
meta: %{
user_id: current_user.id,
user_name: current_user.name,
timestamp: DateTime.utc_now()
}
)
# Get operation history with metadata
Grove.history(tree)
|> Enum.map(fn op -> "#{op.meta.user_name} edited #{op.node_id}" end)
```
## Query API
Find and traverse nodes in the tree:
```elixir
# Find by ID
Grove.get_node(tree, "field_123")
# Find by type
Grove.find_nodes(tree, type: :dropdown)
# Find by attribute
Grove.find_nodes(tree, where: [required: true])
# Get path to node (for breadcrumbs)
Grove.path_to(tree, "field_123")
# => ["form_root", "step_1", "field_123"]
# Traversal
Grove.parent(tree, "field_123")
Grove.children(tree, "field_123")
Grove.siblings(tree, "field_123")
Grove.ancestors(tree, "field_123")
Grove.descendants(tree, "field_123")
```
## Move Operation Semantics
The move operation is the most complex part of tree CRDTs. Grove implements the algorithm from ["A highly-available move operation for replicated trees"](https://martin.kleppmann.com/papers/move-op.pdf) (Kleppmann et al., 2021) to handle concurrent moves safely.
### Cycle Prevention
When two replicas concurrently move nodes in conflicting ways, Grove prevents cycles:
```mermaid
graph LR
subgraph "Before"
A1[A] --> B1[B] --> C1[C]
end
subgraph "Concurrent Moves"
R1[Replica 1: move B under C]
R2[Replica 2: move C under B]
end
subgraph "After Sync"
A2[A] --> B2[B] --> C2[C]
end
style R1 fill:#ffcccc
style R2 fill:#ccccff
```
```elixir
# Replica A: move X under Y
tree_a = Grove.move_node(tree_a, "X", new_parent: "Y")
# Replica B: move Y under X (concurrent)
tree_b = Grove.move_node(tree_b, "Y", new_parent: "X")
# After sync, one move "wins" deterministically based on Lamport timestamps
# No cycles are ever created
```
### Move vs Delete Conflicts
When one replica moves a node while another deletes its destination:
```elixir
# Replica A: move X under Y
# Replica B: delete Y (concurrent)
# Result: X is moved to Y's former parent (or root if Y was a root child)
```
## Undo/Redo
Grove provides undo/redo support based on ["A Generic Undo Support for State-Based CRDTs"](https://drops.dagstuhl.de/opus/volltexte/2020/11800/pdf/LIPIcs-OPODIS-2019-14.pdf) (Yu et al., 2019).
```elixir
# Enable undo tracking
tree = Grove.new(replica_id: "node_1", undo: true)
# Make some changes
tree = tree
|> Grove.insert_node("n1", :expr, %{value: 1}, parent: :root)
|> Grove.insert_node("n2", :expr, %{value: 2}, parent: :root)
|> Grove.delete_node("n1")
# Undo the last operation
{tree, ops} = Grove.undo(tree)
# "n1" is restored, and ops are generated to sync the undo
# Redo
{tree, ops} = Grove.redo(tree)
```
### Selective Undo
Undo a specific operation rather than just the most recent:
```elixir
# Get operation history
history = Grove.history(tree)
# Undo a specific operation by ID
{tree, ops} = Grove.undo(tree, operation_id: "op_12345")
```
## Phoenix LiveView Integration
Grove provides first-class LiveView integration with automatic subscription and operation handling.
```mermaid
sequenceDiagram
participant User A
participant LiveView A
participant Session Manager
participant LiveView B
participant User B
User A->>LiveView A: Edit node
LiveView A->>Session Manager: apply_op(doc_id, op)
Session Manager->>LiveView B: {:grove_ops, ops}
LiveView B->>User B: Re-render
Session Manager->>LiveView A: {:grove_ops, ops}
LiveView A->>User A: Re-render
```
### Using Grove.LiveView Hooks
```elixir
defmodule MyAppWeb.EditorLive do
use MyAppWeb, :live_view
on_mount {Grove.LiveView, :subscribe}
def mount(%{"id" => id}, _session, socket) do
# Load plain data and convert to CRDT tree
data = MyRepo.get!(Document, id).data
tree = Grove.from_data(data, replica_id: socket.id)
{:ok, assign(socket, doc_id: id, tree: tree)}
end
def handle_event("insert_node", params, socket) do
tree = Grove.insert_node(
socket.assigns.tree,
params["id"],
String.to_atom(params["type"]),
params["value"],
parent: params["parent_id"]
)
# Flush and broadcast
{tree, ops} = Grove.flush_operations(tree)
Grove.Session.apply_op(socket.assigns.doc_id, ops)
{:noreply, assign(socket, tree: tree)}
end
def handle_info({:grove_ops, ops}, socket) do
tree = Grove.apply_operations(socket.assigns.tree, ops)
{:noreply, assign(socket, tree: tree)}
end
end
```
### Session Management
Grove manages document sessions with automatic lifecycle handling:
```elixir
# Sessions are started automatically when first replica joins
Grove.Session.join(doc_id, self())
# Get current tree state
tree = Grove.Session.get_state(doc_id)
# Get session info
Grove.Session.info(doc_id)
# => %{
# replicas: 3,
# last_activity: ~U[2024-01-15 10:30:00Z],
# pending_ops: 0
# }
# Leave session (automatic on process exit)
Grove.Session.leave(doc_id, self())
```
Session lifecycle:
- **Grace period**: Sessions stay alive for 5 minutes after last replica leaves
- **Periodic persistence**: Auto-saves every 30 seconds when dirty
- **Operation buffering**: Debounces rapid updates
### Presence Tracking
Built-in cursor and selection tracking for collaborative UIs:
```elixir
# Track what each user is focused on
tree = Grove.set_focus(tree, "node_123")
tree = Grove.set_selection(tree, ["node_123", "node_124"])
# Get all replica presence (with auto-assigned colors)
Grove.get_presence(tree)
# => %{
# "replica_a" => %{focus: "node_1", selection: [], user: "Alice", color: "#4CAF50"},
# "replica_b" => %{focus: "node_3", selection: ["node_3", "node_4"], user: "Bob", color: "#2196F3"}
# }
```
```mermaid
graph TD
subgraph "Collaborative View"
A[Form Root]
B[Step 1 - Alice editing]
C[Step 2 - Bob editing]
D[Field 1]
E[Field 2 - Selected by Bob]
end
A --> B
A --> C
B --> D
C --> E
style B fill:#4CAF50,color:white
style C fill:#2196F3,color:white
style E stroke:#2196F3,stroke-width:3px
```
### LiveView Hooks for Editor Integration
```javascript
// assets/js/hooks/grove_editor.js
export const GroveEditor = {
mounted() {
this.editor = createEditor(this.el);
this.editor.on("change", (delta) => {
this.pushEvent("editor_change", delta);
});
this.handleEvent("apply_ops", (ops) => {
this.editor.applyOps(ops);
});
}
}
```
## Distributed Erlang Integration
Grove works natively with Erlang distribution for clustering:
```elixir
defmodule MyApp.DocumentCluster do
use GenServer
def start_link(document_id) do
GenServer.start_link(__MODULE__, document_id, name: via(document_id))
end
def init(document_id) do
# Join the cluster for this document
:pg.join(:grove_documents, document_id, self())
tree = Grove.new(replica_id: node_id())
{:ok, %{document_id: document_id, tree: tree}}
end
def handle_cast({:ops, ops, from_node}, state) do
tree = Grove.apply_operations(state.tree, ops)
{:noreply, %{state | tree: tree}}
end
def handle_call({:mutate, fun}, _from, state) do
tree = fun.(state.tree)
{tree, ops} = Grove.flush_operations(tree)
# Broadcast to all replicas
broadcast_ops(state.document_id, ops)
{:reply, :ok, %{state | tree: tree}}
end
defp broadcast_ops(document_id, ops) do
for pid <- :pg.get_members(:grove_documents, document_id), pid != self() do
GenServer.cast(pid, {:ops, ops, Node.self()})
end
end
end
```
## AST Conversion
Convert between Grove trees and Elixir AST:
```elixir
# Parse Elixir code into a Grove tree
{:ok, ast} = Code.string_to_quoted("def hello(name), do: name")
tree = Grove.from_elixir_ast(ast, replica_id: "node_1")
# Convert back to Elixir AST
elixir_ast = Grove.to_elixir_ast(tree)
# Pretty print
Grove.to_string(tree)
# => "def hello(name), do: name"
```
### Custom AST Schemas
Define custom node types with field-level CRDT semantics:
```elixir
defmodule MyApp.FormSchema do
use Grove.Schema
node :form do
field :title, :lww_register # Last-writer-wins (default)
field :description, :lww_register
children :steps
end
node :dropdown do
field :label, :lww_register
field :options, :rga_list # Ordered list - concurrent inserts merge
field :tags, :or_set # Set - adds/removes merge
end
node :text_field do
field :label, :lww_register
field :placeholder, :lww_register
field :validators, :or_set # Set of validation rules
end
end
# Use with Grove
tree = Grove.new(replica_id: "node_1", schema: MyApp.FormSchema)
```
**Field CRDT Types:**
| Type | Merge Behavior | Use Case |
|------|----------------|----------|
| `:lww_register` | Last writer wins | Simple values (default) |
| `:mv_register` | Keep all concurrent values | Expose conflicts to user |
| `:or_set` | Union of adds, intersection of removes | Tags, categories |
| `:rga_list` | Interleave concurrent inserts | Ordered options |
## Performance
Grove is optimized for real-time collaborative editing:
| Operation | Time Complexity | Space Complexity |
|-----------|-----------------|------------------|
| Insert | O(1) amortized | O(1) |
| Delete | O(1) | O(1) (tombstone) |
| Move | O(log n) | O(1) |
| Apply Op | O(1) amortized | O(1) |
| Find Node | O(1) | — |
| Traverse | O(n) | O(depth) |
### Benchmarks
```
Date: 2026-01-03
Elixir: 1.18.4, OTP: 25
Scenario ips Average Memory
Sequential (fast-path) 617 1.62 ms 1.43 MB
Collaborative (mixed) 144 6.94 ms 5.82 MB
High-concurrency (slow-path) 63 15.76 ms 21.84 MB
Local Operations ips Average
get_node 5.57 M 179 ns
put_node 1.13 M 888 ns
update_node 0.65 M 1537 ns
```
**Key insight**: Eg-walker's fast-path is 10x faster than slow-path for the common case (sequential editing).
See [benchmarks guide](guides/benchmarks.md) for detailed analysis.
### Eg-walker Optimization
Grove implements the [Eg-walker algorithm](https://arxiv.org/abs/2409.14252) (EuroSys 2025 Best Paper) which optimizes for the common case: sequential editing.
- **Fast-path**: When edits are sequential (one user or turn-taking), operations apply directly with near-zero CRDT overhead
- **Slow-path**: When true concurrency occurs, full CRDT merge ensures correctness
Most collaborative editing is sequential (~85%), so this optimization provides significant real-world performance benefits.
## Garbage Collection
Tombstones (markers for deleted nodes) are garbage collected when all replicas have observed the deletion:
```elixir
# Configure GC
tree = Grove.new(
replica_id: "node_1",
gc: [
# Minimum age before tombstone can be collected
min_tombstone_age: :timer.hours(24),
# Run GC every N operations
gc_interval: 1000
]
)
# Manual GC with known replica states
tree = Grove.gc(tree, observed_by: ["node_1", "node_2", "node_3"])
```
## Persistence
Grove trees can be serialized for storage:
```elixir
# Serialize to binary (efficient)
binary = Grove.serialize(tree)
tree = Grove.deserialize(binary)
# Serialize to JSON (interoperable)
json = Grove.to_json(tree)
tree = Grove.from_json(json)
# Ecto integration
defmodule MyApp.Document do
use Ecto.Schema
schema "documents" do
field :tree, Grove.Ecto.Type
timestamps()
end
end
```
## Testing
Grove provides comprehensive testing utilities for verifying CRDT correctness:
```elixir
defmodule MyApp.EditorTest do
use ExUnit.Case
import Grove.Testing
test "concurrent edits converge" do
# Create isolated replicas from initial data
[tree_a, tree_b] = create_replicas(initial_data, count: 2)
# Simulate concurrent edits
tree_a = Grove.insert_node(tree_a, "n1", :expr, %{v: 1}, parent: :root)
tree_b = Grove.insert_node(tree_b, "n2", :expr, %{v: 2}, parent: :root)
# Sync all replicas
[tree_a, tree_b] = sync_all([tree_a, tree_b])
# Verify convergence
assert_convergent([tree_a, tree_b])
assert_no_cycles(tree_a)
end
test "network partition and heal" do
[a, b, c, d] = create_replicas(data, count: 4)
# Simulate partition: {a, b} and {c, d} can't communicate
{group1, group2} = partition([a, b, c, d], groups: [[0, 1], [2, 3]])
# Make edits in each partition
group1 = update_all(group1, fn t -> Grove.insert_node(t, "x", :text, %{}, parent: :root) end)
group2 = update_all(group2, fn t -> Grove.insert_node(t, "y", :text, %{}, parent: :root) end)
# Heal partition
all_trees = heal(group1, group2)
# All should converge
assert_convergent(all_trees)
end
test "fuzz test with random operations" do
[tree_a, tree_b, tree_c] = create_replicas(data, count: 3)
# Generate and apply random operations
trees = [tree_a, tree_b, tree_c]
|> Enum.map(fn t -> apply_random_operations(t, count: 100) end)
|> sync_all()
assert_convergent(trees)
Enum.each(trees, &assert_no_cycles/1)
end
end
```
### Testing Utilities Reference
| Function | Description |
|----------|-------------|
| `create_replicas/2` | Create N isolated replicas from initial data |
| `sync/2` | Sync two replicas bidirectionally |
| `sync_all/1` | Sync all replicas in a list |
| `assert_convergent/1` | Assert all trees have same value |
| `assert_no_cycles/1` | Assert tree has no cycles |
| `partition/2` | Simulate network partition |
| `heal/2` | Heal partition, sync all |
| `random_operations/2` | Generate random ops for fuzzing |
## Research Background
Grove is based on peer-reviewed research in distributed systems and CRDTs:
### Core Papers
- [A highly-available move operation for replicated trees](https://martin.kleppmann.com/papers/move-op.pdf) — Kleppmann et al., 2021
- [Eg-walker: Better, Faster, Smaller](https://arxiv.org/abs/2409.14252) — Gentle & Kleppmann, EuroSys 2025 (Best Paper)
- [COAST: A Conflict-free Replicated Abstract Syntax Tree](http://soft.vub.ac.be/Publications/2022/vub-tr-soft-22-17.pdf) — Munsters et al., 2022
- [A coordination-free, convergent, and safe replicated tree](https://arxiv.org/abs/2103.04828) — Nair et al., 2021
### Foundational CRDT Research
- [A comprehensive study of Convergent and Commutative Replicated Data Types](http://hal.inria.fr/inria-00555588/) — Shapiro et al., 2011
- [Pure Operation-Based Replicated Data Types](https://arxiv.org/abs/1710.04469) — Baquero et al., 2017
- [Interleaving Anomalies in Collaborative Text Editors](https://martin.kleppmann.com/papers/interleaving-papoc19.pdf) — Kleppmann et al., 2019
### Undo/Redo
- [A Generic Undo Support for State-Based CRDTs](https://drops.dagstuhl.de/opus/volltexte/2020/11800/pdf/LIPIcs-OPODIS-2019-14.pdf) — Yu et al., 2019
- [Undo and Redo Support for Replicated Registers](https://arxiv.org/abs/2404.11308) — Stewen & Kleppmann, 2024
See the [architecture guide](guides/architecture.md) for implementation details.
## Comparison with Other Libraries
| Feature | Grove | Automerge | Yjs | Riak DT |
|---------|-------|-----------|-----|---------|
| Language | Elixir | Rust/JS | JS | Erlang |
| Tree CRDTs | ✅ First-class | ✅ JSON | ✅ XML | ❌ |
| Move operation | ✅ | ✅ | ✅ | ❌ |
| Eg-walker optimization | ✅ | ❌ | ❌ | ❌ |
| Undo/Redo | ✅ | ✅ | ✅ | ❌ |
| BEAM native | ✅ | ❌ | ❌ | ✅ |
| Phoenix integration | ✅ | ❌ | ❌ | ❌ |
## Contributing
Contributions are welcome!
```bash
# Clone the repository
git clone https://gitlab.com/tristanperalta/grove.git
cd grove
# Install dependencies
mix deps.get
# Run tests
mix test
# Run benchmarks
mix bench
```
## License
Grove is released under the MIT License. See [LICENSE](LICENSE) for details.
---
Built with ❤️ for the Elixir community.