# Grove
**Conflict-free Replicated Tree CRDTs for Elixir.**
Grove is a CRDT library for building collaborative editors, structured-document editors, and any application that needs real-time synchronization of tree-structured data (including ASTs) across distributed nodes.
## Features
- **Operation-based Tree CRDT** — efficient synchronization with minimal bandwidth
- **Eg-walker optimization** — near-zero CRDT overhead for sequential editing (the common case)
- **Move operations** — reorganize nodes without ever creating cycles
- **Undo/redo** — built-in local undo/redo stacks for editor integration
- **Schema DSL** — per-field CRDT types (LWW, MV, OR-Set, RGA list, counter) with migrations
- **Primitive CRDTs** — counters, registers, sets, maps, and an RGA sequence, all with delta-state support
- **Phoenix LiveView integration** — real-time collaboration in web applications
- **Distributed Erlang support** — native `:pg`-based clustering on the BEAM
- **Plain-data conversion** — keep plain maps/JSON in your database, use CRDTs only for sync
## Architecture Overview
```mermaid
flowchart TB
subgraph Database
JSON[(Plain JSON/AST)]
end
subgraph "Grove Session"
SM[Session Manager]
Tree[CRDT Tree]
end
subgraph "LiveView Replicas"
LV1[LiveView A]
LV2[LiveView B]
LV3[LiveView C]
end
JSON -->|from_data| Tree
Tree -->|to_data| JSON
LV1 <-->|ops| SM
LV2 <-->|ops| SM
LV3 <-->|ops| SM
SM --- Tree
```
## Installation
Add `grove` to your list of dependencies in `mix.exs`:
```elixir
def deps do
[
{:grove, "~> 0.2.0"}
]
end
```
Grove requires Elixir `~> 1.20` and is tested against Erlang/OTP 29.
## Quick Start
```elixir
alias Grove.{Tree, Node}
# Create a tree for a replica (the replica id makes node ids globally unique)
tree = Tree.new("node_1")
# Build nodes; a node's parent is set with :parent_id
root = Node.new("root", :function, attrs: %{name: "hello", arity: 1})
param = Node.new("param_1", :param, parent_id: "root", attrs: %{name: "name"})
body = Node.new("body_1", :body, parent_id: "root")
# Insert them
tree =
tree
|> Tree.put_node(root)
|> Tree.put_node(param)
|> Tree.put_node(body)
# Move a node under a new parent (cycle-safe; returns {:error, reason} if invalid)
tree = Tree.move_node(tree, "param_1", "body_1")
# Query
Tree.get_node(tree, "param_1")
Tree.children(tree, "body_1") # => ["param_1"]
```
## Core Concepts
### Plain Data Conversion
Grove separates CRDT state from application data. Store plain maps/JSON in your database and use Grove only for real-time sync:
```elixir
# Load plain data and convert to a CRDT tree
plain = %{
id: "form_1",
type: :form,
attrs: %{title: "Survey"},
children: [
%{id: "field_1", type: :text, attrs: %{label: "Name"}}
]
}
tree = Grove.from_data(plain, replica_id: "node_a")
# After editing, convert back to plain data for storage
data = Grove.to_data(tree)
# Round-trip guarantee when all nodes have explicit ids
^plain = Grove.to_data(Grove.from_data(plain, replica_id: "a"))
```
### Replica IDs
Every node in a Grove cluster needs a unique replica id. It is used to generate globally unique node identifiers and to resolve concurrent operations deterministically.
```elixir
tree = Tree.new(System.get_env("NODE_ID") || "node_1")
```
### Node Structure
Each node in the tree has:
| Field | Description |
|-------|-------------|
| `id` | Unique identifier (string) |
| `type` | Node type (atom) |
| `attrs` | Node attributes (map) |
| `parent_id` | Parent node reference |
| `children` | Ordered list of child node ids |
| `deleted?` | Tombstone flag |
### Operations & Sync
Grove is operation-based: each mutation records an operation and tracks it in the tree's `pending_ops`. To synchronize, you broadcast those operations to other replicas and apply incoming events with `Grove.Tree.apply_remote/2`. The [Phoenix LiveView integration](#phoenix-liveview-integration) wires this up for you (broadcast on change, apply on receive), so most applications never touch the low-level event plumbing directly.
### Batch Operations
Group multiple operations into a single atomic unit for undo and broadcast:
```elixir
# Add a composite "address" group as one undoable action.
# Tree.batch/3 returns {updated_tree, delta}; the delta is what you broadcast.
{tree, _delta} =
Tree.batch(tree, fn t ->
addr = Node.new("addr", :group, parent_id: "form_1", attrs: %{label: "Address"})
street = Node.new("street", :text, parent_id: "addr", attrs: %{label: "Street"})
city = Node.new("city", :text, parent_id: "addr", attrs: %{label: "City"})
t
|> Tree.put_node(addr)
|> Tree.put_node(street)
|> Tree.put_node(city)
end)
# A single undo reverts the whole batch
{:ok, tree} = Tree.undo(tree)
```
### Operation Metadata
Attach context to operations for audit trails and version history:
```elixir
tree =
Tree.update_node(
tree,
"field_1",
fn node -> Node.update_attrs(node, %{label: "Full name"}) end,
meta: %{user_id: "u1", user_name: "Alice"}
)
# Read history and pull metadata back out
tree
|> Tree.history(where: [user_id: "u1"])
|> Enum.map(fn entry ->
meta = Tree.operation_meta(entry.operation)
"#{meta[:user_name]} edited #{Enum.join(entry.node_ids, ", ")}"
end)
```
## Query API
Find and traverse nodes in the tree:
```elixir
# Find by id
Tree.get_node(tree, "field_1")
# Find by type
Tree.find_nodes(tree, type: :text)
# Find by attribute
Tree.find_nodes(tree, where: [required: true])
# Combine filters
Tree.find_nodes(tree, type: :text, where: [required: true])
# Path to a node (for breadcrumbs)
Tree.path_to(tree, "field_1") # => ["form_1", "addr", "field_1"]
# Traversal
Tree.root(tree)
Tree.parent(tree, "field_1")
Tree.children(tree, "addr")
Tree.siblings(tree, "field_1")
Tree.ancestors(tree, "field_1")
```
## Move Operation Semantics
The move operation is the most complex part of tree CRDTs. Grove implements the algorithm from ["A highly-available move operation for replicated trees"](https://martin.kleppmann.com/papers/move-op.pdf) (Kleppmann et al., 2021) to handle concurrent moves safely.
### Cycle Prevention
When two replicas concurrently move nodes in conflicting ways, Grove guarantees no cycle is ever created:
```elixir
# Replica A: move X under Y
tree_a = Tree.move_node(tree_a, "X", "Y")
# Replica B: move Y under X (concurrent)
tree_b = Tree.move_node(tree_b, "Y", "X")
# After sync, one move "wins" deterministically (by Lamport/HLC timestamp);
# the losing move is discarded so no cycle forms.
```
## Undo / Redo
Grove maintains per-tree undo/redo stacks. Every mutation pushes its inverse onto the undo stack; a new operation clears the redo stack.
```elixir
{:ok, tree} = Tree.undo(tree) # or {:error, :empty_stack}
{:ok, tree} = Tree.redo(tree) # or {:error, :empty_stack}
Tree.can_undo?(tree) # => boolean
Tree.can_redo?(tree) # => boolean
```
Undo/redo is informed by ["A Generic Undo Support for State-Based CRDTs"](https://drops.dagstuhl.de/opus/volltexte/2020/11800/pdf/LIPIcs-OPODIS-2019-14.pdf) (Yu et al., 2019).
## Schema DSL
Define node types with per-field CRDT semantics. Each field can use a different CRDT type, and schemas support versioned migrations.
```elixir
defmodule MyApp.FormSchema do
use Grove.Schema
version 1
node :dropdown do
field :label, :lww_register, default: "Untitled"
field :options, :rga_list, default: [] # ordered list, concurrent inserts merge
field :tags, :or_set, default: MapSet.new() # set, adds/removes merge
end
node :text_field do
field :label, :lww_register, default: ""
field :placeholder, :lww_register, default: ""
end
end
# Create a node whose attrs are backed by per-field CRDTs
node = MyApp.FormSchema.new_node(:dropdown, "node_1", %{label: "Country"})
# Update a single field (CRDT-aware)
node = MyApp.FormSchema.update_field(node, :label, "Region", "node_1")
# Introspection
MyApp.FormSchema.node_types() # => [:dropdown, :text_field]
MyApp.FormSchema.fields(:dropdown) # => field definitions
```
Schema migrations:
```elixir
defmodule MyApp.FormSchemaV2 do
use Grove.Schema
version 2
node :dropdown do
field :label, :lww_register
field :required, :lww_register, default: false # new in v2
end
migrate 1 do
add_field :dropdown, :required, :lww_register, default: false
end
end
```
**Field CRDT types:**
| Type | Merge Behavior | Use Case |
|------|----------------|----------|
| `:lww_register` | Last writer wins | Simple values (default) |
| `:mv_register` | Keep all concurrent values | Expose conflicts to the user |
| `:or_set` | Add-wins set | Tags, categories |
| `:rga_list` | Interleave concurrent inserts | Ordered options |
| `:counter` | Sum of increments | Counts, votes |
## Phoenix LiveView Integration
Grove provides first-class LiveView integration: `mount_tree/3` subscribes the socket to a document topic, `apply_and_broadcast/2` applies a local change and broadcasts the delta, and `apply_remote/2` applies an incoming delta.
```elixir
defmodule MyAppWeb.EditorLive do
use MyAppWeb, :live_view
def mount(%{"id" => doc_id}, _session, socket) do
socket =
Grove.LiveView.mount_tree(socket, doc_id,
replica_id: socket.id,
pubsub: MyApp.PubSub
)
{:ok, socket}
end
def handle_event("update_field", %{"node_id" => node_id, "value" => value}, socket) do
socket =
Grove.LiveView.apply_and_broadcast(socket, fn tree ->
updated =
Grove.Tree.update_node(tree, node_id, fn node ->
Grove.Node.update_attrs(node, %{value: value})
end)
delta = {:update_attrs, node_id, %{value: value}}
{updated, delta}
end)
{:noreply, socket}
end
# Incoming changes from other replicas
def handle_info({:grove_delta, delta, from_pid}, socket) when from_pid != self() do
{:noreply, Grove.LiveView.apply_remote(socket, delta)}
end
end
```
The current tree is available with `Grove.LiveView.get_tree/1`, and local-only updates can be made with `Grove.LiveView.update_tree/2`.
### Presence Tracking
When the optional `phoenix_pubsub`/`phoenix_live_view` dependencies are present, Grove tracks per-replica cursors, selections, and focus (backed by `Phoenix.Tracker`). The helpers operate on the socket:
```elixir
socket = Grove.set_cursor(socket, "node_123", 4)
socket = Grove.set_selection(socket, "node_123", "node_124")
socket = Grove.set_focus(socket, "node_123")
Grove.get_presences(socket) # all replicas' presence (with auto-assigned colors)
Grove.get_my_presence(socket) # this replica's presence
```
### Session Management
`Grove.Session` runs a document as a supervised process that owns the tree, fans out changes to subscribers, and persists periodically:
```elixir
{:ok, session} = Grove.Session.get_or_start("doc_123", replica_id: "node_1")
Grove.Session.join(session, self())
# Apply a mutation to the shared tree
Grove.Session.mutate(session, fn tree ->
Grove.Tree.put_node(tree, Grove.Node.new("n1", :text))
end)
Grove.Session.get_state(session) # current Grove.Tree
Grove.Session.info(session) # %{replicas: ..., last_activity: ..., ...}
Grove.Session.leave(session, self())
```
Session lifecycle:
- **Grace period** — the session stays alive for a short window after the last replica leaves
- **Periodic persistence** — auto-saves on a timer when the document is dirty
## Distributed Erlang Integration
Grove starts a `:pg`-based membership process (`Grove.Cluster.Membership`) in its supervision tree, and an optional `Grove.Replication.Server` provides periodic delta sync between replicas over `:pg` or Phoenix PubSub. This lets Grove cluster natively across BEAM nodes without any external coordination service.
## Persistence
Grove ships pluggable storage adapters and a document-storage strategy that combines snapshots with an event log:
```elixir
# Snapshot the current tree and restore it later.
# create_snapshot/1 succeeds at a "critical" version, returning {:ok, snapshot}.
{:ok, snapshot} = Grove.Tree.create_snapshot(tree)
tree = Grove.Tree.from_snapshot(snapshot, "node_1")
```
- `Grove.Storage.ETS` — in-memory storage (read/write concurrency)
- `Grove.Storage.DETS` — disk-backed storage with auto-save and recovery
- `Grove.Tree.DocumentStorage` — snapshot + event-log persistence with `save/2`, `load/2`, and `compact/2`
Serialization uses a compact columnar encoding (RLE + zlib) for tree events (`Grove.Tree.EventSerializer`).
## Performance
Grove is optimized for real-time collaborative editing:
| Operation | Time Complexity | Space Complexity |
|-----------|-----------------|------------------|
| Insert | O(1) amortized | O(1) |
| Move | O(log n) | O(1) |
| Apply Op | O(1) amortized | O(1) |
| Find Node | O(1) | — |
| Traverse | O(n) | O(depth) |
### Benchmarks
```
Date: 2026-06-14
Elixir: 1.20.1, OTP: 29
Scenario ips Average Memory
Sequential (fast-path) 918 1.09 ms 1.36 MB
Collaborative (mixed) 781 1.28 ms 1.68 MB
High-concurrency (slow-path) 834 1.20 ms 1.51 MB
Local Operations ips Average
get_node 9.59 M 104 ns
put_node 2.08 M 481 ns
update_node 1.11 M 900 ns
```
> **Note**: The collaborative and high-concurrency scenarios generate their workload with an unseeded `Enum.shuffle`, so those rows are not reproducible run-to-run — treat them as indicative, not exact. The fast-path/local/scalability figures are deterministic.
**Key insight**: For sequential editing (the common case), Eg-walker applies operations with near-zero CRDT overhead. The fast-path/slow-path gap grows with the degree of genuine concurrency in the workload.
See the [benchmarks guide](guides/benchmarks.md) for detailed analysis.
### Eg-walker Optimization
Grove implements the [Eg-walker algorithm](https://arxiv.org/abs/2409.14252) (EuroSys 2025 Best Paper) which optimizes for the common case: sequential editing.
- **Fast-path**: when edits are sequential (one user or turn-taking), operations apply directly with near-zero CRDT overhead
- **Slow-path**: when true concurrency occurs, a full CRDT merge ensures correctness
Most collaborative editing is sequential, so this optimization provides significant real-world performance benefits.
## Testing
`Grove.Testing` provides utilities for verifying CRDT convergence:
```elixir
defmodule MyApp.CrdtTest do
use ExUnit.Case
import Grove.Testing
test "concurrent edits converge" do
# Create N isolated replicas of a CRDT module
[a, b] = create_replicas(Grove.Set.ORSet, 2)
# Make concurrent edits on each replica
a = Grove.Set.ORSet.add(a, "x")
b = Grove.Set.ORSet.add(b, "y")
# Sync all replicas and assert convergence
[a, b] = sync_all([a, b])
assert_convergent([a, b])
end
end
```
| Function | Description |
|----------|-------------|
| `create_replicas/3` | Create N isolated replicas of a CRDT module |
| `sync/2` | Sync two replicas bidirectionally |
| `sync_all/1` | Sync all replicas in a list |
| `assert_convergent/1` | Assert all replicas have the same value |
| `replica_values/1` | Materialize each replica's value |
| `mutate_replica/3` | Apply a function to one replica by index |
| `random_operations/3` | Generate random operations for fuzzing |
## Roadmap
The following are planned but **not yet implemented**. See [guides/roadmap.md](guides/roadmap.md) for the full list.
- A public node-delete operation and **tombstone garbage collection** for tree nodes (today, structural GC exists only for OR-Set/OR-Map metadata via `Grove.GC.Structural`)
- **Selective undo** of a specific past operation (only stack-based undo/redo exists today)
- **Elixir AST conversion** helpers (`from_elixir_ast` / `to_elixir_ast`)
- **JSON serialization** and an **Ecto type** for storing trees
- **Gossip / anti-entropy** replication and version-vector full-state sync for new replicas
## Research Background
Grove is based on peer-reviewed research in distributed systems and CRDTs:
### Core Papers
- [A highly-available move operation for replicated trees](https://martin.kleppmann.com/papers/move-op.pdf) — Kleppmann et al., 2021
- [Eg-walker: Better, Faster, Smaller](https://arxiv.org/abs/2409.14252) — Gentle & Kleppmann, EuroSys 2025 (Best Paper)
- [COAST: A Conflict-free Replicated Abstract Syntax Tree](http://soft.vub.ac.be/Publications/2022/vub-tr-soft-22-17.pdf) — Munsters et al., 2022
- [A coordination-free, convergent, and safe replicated tree](https://arxiv.org/abs/2103.04828) — Nair et al., 2021
### Foundational CRDT Research
- [A comprehensive study of Convergent and Commutative Replicated Data Types](http://hal.inria.fr/inria-00555588/) — Shapiro et al., 2011
- [Pure Operation-Based Replicated Data Types](https://arxiv.org/abs/1710.04469) — Baquero et al., 2017
- [Interleaving Anomalies in Collaborative Text Editors](https://martin.kleppmann.com/papers/interleaving-papoc19.pdf) — Kleppmann et al., 2019
### Undo/Redo
- [A Generic Undo Support for State-Based CRDTs](https://drops.dagstuhl.de/opus/volltexte/2020/11800/pdf/LIPIcs-OPODIS-2019-14.pdf) — Yu et al., 2019
- [Undo and Redo Support for Replicated Registers](https://arxiv.org/abs/2404.11308) — Stewen & Kleppmann, 2024
See the [architecture guide](guides/architecture.md) for implementation details.
## Comparison with Other Libraries
| Feature | Grove | Automerge | Yjs | Loro |
|---------|-------|-----------|-----|------|
| Language | Elixir | Rust/JS | JS | Rust/JS |
| Tree CRDT | ✅ First-class | ✅ JSON | ✅ XML | ✅ Movable tree |
| Move operation | ✅ | ❌ (research only) | ❌ (removed) | ✅ |
| Eg-walker optimization | ✅ | ❌ | ❌ | ✅ (event-graph replay) |
| Undo/redo | ✅ local | ✅ local | ✅ | ✅ |
| Rich text | ❌ | ✅ | ✅ | ✅ |
| BEAM native | ✅ | ❌ | ❌ | ❌ |
| Phoenix integration | ✅ | ❌ | ❌ | ❌ |
## Contributing
Contributions are welcome!
```bash
# Clone the repository
git clone https://gitlab.com/tristanperalta/grove.git
cd grove
# Install dependencies
mix deps.get
# Run tests
mix test
# Run benchmarks
mix run benchmarks/eg_walker_bench.exs
```
## License
Grove is released under the MIT License. See [LICENSE](LICENSE) for details.