README.md

# Grove

**Conflict-free Replicated AST (Abstract Syntax Tree) data types for Elixir.**

Grove is a specialized CRDT library for building collaborative code editors, structured document editors, and any application requiring real-time synchronization of tree-structured data across distributed nodes.

## Features

- **Operation-based Tree CRDTs** — Efficient synchronization with minimal bandwidth
- **Eg-walker Optimization** — 10x faster for sequential editing (the common case)
- **Move Operations** — Refactor and reorganize AST nodes without creating cycles
- **Undo/Redo Support** — First-class support for editor integration
- **Phoenix LiveView Integration** — Real-time collaboration in web applications
- **Distributed Erlang Support** — Native clustering with BEAM distribution
- **Tombstone Garbage Collection** — Automatic cleanup of deleted nodes
- **Plain Data Conversion** — Store JSON in your database, use CRDTs only for sync

## Architecture Overview

```mermaid
flowchart TB
    subgraph Database
        JSON[(Plain JSON/AST)]
    end

    subgraph "Grove Session"
        SM[Session Manager]
        Tree[CRDT Tree]
    end

    subgraph "LiveView Replicas"
        LV1[LiveView A]
        LV2[LiveView B]
        LV3[LiveView C]
    end

    JSON -->|from_data| Tree
    Tree -->|to_data| JSON

    LV1 <-->|ops| SM
    LV2 <-->|ops| SM
    LV3 <-->|ops| SM

    SM --- Tree
```

## Installation

Add `grove` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:grove, "~> 0.1.0"}
  ]
end
```

## Quick Start

```elixir
# Create a new AST CRDT with a replica ID
tree = Grove.new(replica_id: "node_1")

# Insert nodes
tree = tree
  |> Grove.insert_node(:root, :function, %{name: "hello", arity: 1})
  |> Grove.insert_node("node_1_1", :param, %{name: "name"}, parent: :root)
  |> Grove.insert_node("node_1_2", :body, %{}, parent: :root)

# Move a node (safe, prevents cycles)
tree = Grove.move_node(tree, "node_1_1", new_parent: "node_1_2")

# Generate operations for sync
{tree, ops} = Grove.flush_operations(tree)

# On another replica, apply received operations
other_tree = Grove.apply_operations(other_tree, ops)

# Both trees converge to the same state
Grove.to_ast(tree) == Grove.to_ast(other_tree)  # => true
```

## Core Concepts

### Plain Data Conversion

Grove separates CRDT state from application data. Store plain JSON/maps in your database and only use Grove for real-time sync:

```elixir
# Load plain data from database, convert to CRDT tree
plain_ast = MyRepo.get!(Document, id).data
tree = Grove.from_data(plain_ast, replica_id: socket.id)

# After editing, convert back to plain data for storage
plain_ast = Grove.to_data(tree)
MyRepo.update!(document, %{data: plain_ast})

# Round-trip guarantee
assert Grove.to_data(Grove.from_data(data, replica_id: "a")) == data
```

### Replica IDs

Every node in a Grove cluster needs a unique replica ID. This is used to generate globally unique node identifiers and resolve concurrent operations deterministically.

```elixir
tree = Grove.new(replica_id: System.get_env("NODE_ID") || UUID.uuid4())
```

### Node Structure

Each node in the tree has:

| Field | Description |
|-------|-------------|
| `id` | Unique identifier (replica_id + lamport_timestamp) |
| `type` | AST node type (atom) |
| `value` | Node content (map) — uses LWW-Register semantics |
| `parent_id` | Parent node reference |
| `children` | Ordered list of child node IDs |
| `deleted?` | Tombstone flag for deleted nodes |

### Operations

Grove uses operation-based CRDTs where each modification generates an operation that can be sent to other replicas:

```elixir
# Operations are automatically generated
{tree, ops} = Grove.flush_operations(tree)

# ops might contain:
[
  %Grove.Op.Insert{id: "node_1_3", type: :expr, value: %{}, parent: "node_1_2", position: 0},
  %Grove.Op.Move{id: "node_1_1", new_parent: "node_1_2", position: 1},
  %Grove.Op.Update{id: "node_1_2", value: %{line: 5}}
]
```

### Batch Operations

Group multiple operations into a single atomic unit for undo and broadcast:

```elixir
# Adding a composite field (address) as one undoable action
tree = Grove.batch(tree, fn t ->
  t
  |> Grove.insert_node("addr", :group, %{label: "Address"}, parent: "form")
  |> Grove.insert_node("street", :text, %{label: "Street"}, parent: "addr")
  |> Grove.insert_node("city", :text, %{label: "City"}, parent: "addr")
  |> Grove.insert_node("zip", :text, %{label: "ZIP"}, parent: "addr")
end)

# Single undo reverts all 4 inserts
{tree, ops} = Grove.undo(tree)
```

### Operation Metadata

Attach context to operations for audit trails and version history:

```elixir
tree = Grove.update_node(tree, node_id, %{label: "New Label"},
  meta: %{
    user_id: current_user.id,
    user_name: current_user.name,
    timestamp: DateTime.utc_now()
  }
)

# Get operation history with metadata
Grove.history(tree)
|> Enum.map(fn op -> "#{op.meta.user_name} edited #{op.node_id}" end)
```

## Query API

Find and traverse nodes in the tree:

```elixir
# Find by ID
Grove.get_node(tree, "field_123")

# Find by type
Grove.find_nodes(tree, type: :dropdown)

# Find by attribute
Grove.find_nodes(tree, where: [required: true])

# Get path to node (for breadcrumbs)
Grove.path_to(tree, "field_123")
# => ["form_root", "step_1", "field_123"]

# Traversal
Grove.parent(tree, "field_123")
Grove.children(tree, "field_123")
Grove.siblings(tree, "field_123")
Grove.ancestors(tree, "field_123")
Grove.descendants(tree, "field_123")
```

## Move Operation Semantics

The move operation is the most complex part of tree CRDTs. Grove implements the algorithm from ["A highly-available move operation for replicated trees"](https://martin.kleppmann.com/papers/move-op.pdf) (Kleppmann et al., 2021) to handle concurrent moves safely.

### Cycle Prevention

When two replicas concurrently move nodes in conflicting ways, Grove prevents cycles:

```mermaid
graph LR
    subgraph "Before"
        A1[A] --> B1[B] --> C1[C]
    end

    subgraph "Concurrent Moves"
        R1[Replica 1: move B under C]
        R2[Replica 2: move C under B]
    end

    subgraph "After Sync"
        A2[A] --> B2[B] --> C2[C]
    end

    style R1 fill:#ffcccc
    style R2 fill:#ccccff
```

```elixir
# Replica A: move X under Y
tree_a = Grove.move_node(tree_a, "X", new_parent: "Y")

# Replica B: move Y under X (concurrent)
tree_b = Grove.move_node(tree_b, "Y", new_parent: "X")

# After sync, one move "wins" deterministically based on Lamport timestamps
# No cycles are ever created
```

### Move vs Delete Conflicts

When one replica moves a node while another deletes its destination:

```elixir
# Replica A: move X under Y
# Replica B: delete Y (concurrent)

# Result: X is moved to Y's former parent (or root if Y was a root child)
```

## Undo/Redo

Grove provides undo/redo support based on ["A Generic Undo Support for State-Based CRDTs"](https://drops.dagstuhl.de/opus/volltexte/2020/11800/pdf/LIPIcs-OPODIS-2019-14.pdf) (Yu et al., 2019).

```elixir
# Enable undo tracking
tree = Grove.new(replica_id: "node_1", undo: true)

# Make some changes
tree = tree
  |> Grove.insert_node("n1", :expr, %{value: 1}, parent: :root)
  |> Grove.insert_node("n2", :expr, %{value: 2}, parent: :root)
  |> Grove.delete_node("n1")

# Undo the last operation
{tree, ops} = Grove.undo(tree)

# "n1" is restored, and ops are generated to sync the undo

# Redo
{tree, ops} = Grove.redo(tree)
```

### Selective Undo

Undo a specific operation rather than just the most recent:

```elixir
# Get operation history
history = Grove.history(tree)

# Undo a specific operation by ID
{tree, ops} = Grove.undo(tree, operation_id: "op_12345")
```

## Phoenix LiveView Integration

Grove provides first-class LiveView integration with automatic subscription and operation handling.

```mermaid
sequenceDiagram
    participant User A
    participant LiveView A
    participant Session Manager
    participant LiveView B
    participant User B

    User A->>LiveView A: Edit node
    LiveView A->>Session Manager: apply_op(doc_id, op)
    Session Manager->>LiveView B: {:grove_ops, ops}
    LiveView B->>User B: Re-render
    Session Manager->>LiveView A: {:grove_ops, ops}
    LiveView A->>User A: Re-render
```

### Using Grove.LiveView Hooks

```elixir
defmodule MyAppWeb.EditorLive do
  use MyAppWeb, :live_view
  on_mount {Grove.LiveView, :subscribe}

  def mount(%{"id" => id}, _session, socket) do
    # Load plain data and convert to CRDT tree
    data = MyRepo.get!(Document, id).data
    tree = Grove.from_data(data, replica_id: socket.id)

    {:ok, assign(socket, doc_id: id, tree: tree)}
  end

  def handle_event("insert_node", params, socket) do
    tree = Grove.insert_node(
      socket.assigns.tree,
      params["id"],
      String.to_atom(params["type"]),
      params["value"],
      parent: params["parent_id"]
    )

    # Flush and broadcast
    {tree, ops} = Grove.flush_operations(tree)
    Grove.Session.apply_op(socket.assigns.doc_id, ops)

    {:noreply, assign(socket, tree: tree)}
  end

  def handle_info({:grove_ops, ops}, socket) do
    tree = Grove.apply_operations(socket.assigns.tree, ops)
    {:noreply, assign(socket, tree: tree)}
  end
end
```

### Session Management

Grove manages document sessions with automatic lifecycle handling:

```elixir
# Sessions are started automatically when first replica joins
Grove.Session.join(doc_id, self())

# Get current tree state
tree = Grove.Session.get_state(doc_id)

# Get session info
Grove.Session.info(doc_id)
# => %{
#   replicas: 3,
#   last_activity: ~U[2024-01-15 10:30:00Z],
#   pending_ops: 0
# }

# Leave session (automatic on process exit)
Grove.Session.leave(doc_id, self())
```

Session lifecycle:
- **Grace period**: Sessions stay alive for 5 minutes after last replica leaves
- **Periodic persistence**: Auto-saves every 30 seconds when dirty
- **Operation buffering**: Debounces rapid updates

### Presence Tracking

Built-in cursor and selection tracking for collaborative UIs:

```elixir
# Track what each user is focused on
tree = Grove.set_focus(tree, "node_123")
tree = Grove.set_selection(tree, ["node_123", "node_124"])

# Get all replica presence (with auto-assigned colors)
Grove.get_presence(tree)
# => %{
#   "replica_a" => %{focus: "node_1", selection: [], user: "Alice", color: "#4CAF50"},
#   "replica_b" => %{focus: "node_3", selection: ["node_3", "node_4"], user: "Bob", color: "#2196F3"}
# }
```

```mermaid
graph TD
    subgraph "Collaborative View"
        A[Form Root]
        B[Step 1 - Alice editing]
        C[Step 2 - Bob editing]
        D[Field 1]
        E[Field 2 - Selected by Bob]
    end

    A --> B
    A --> C
    B --> D
    C --> E

    style B fill:#4CAF50,color:white
    style C fill:#2196F3,color:white
    style E stroke:#2196F3,stroke-width:3px
```

### LiveView Hooks for Editor Integration

```javascript
// assets/js/hooks/grove_editor.js
export const GroveEditor = {
  mounted() {
    this.editor = createEditor(this.el);

    this.editor.on("change", (delta) => {
      this.pushEvent("editor_change", delta);
    });

    this.handleEvent("apply_ops", (ops) => {
      this.editor.applyOps(ops);
    });
  }
}
```

## Distributed Erlang Integration

Grove works natively with Erlang distribution for clustering:

```elixir
defmodule MyApp.DocumentCluster do
  use GenServer

  def start_link(document_id) do
    GenServer.start_link(__MODULE__, document_id, name: via(document_id))
  end

  def init(document_id) do
    # Join the cluster for this document
    :pg.join(:grove_documents, document_id, self())

    tree = Grove.new(replica_id: node_id())
    {:ok, %{document_id: document_id, tree: tree}}
  end

  def handle_cast({:ops, ops, from_node}, state) do
    tree = Grove.apply_operations(state.tree, ops)
    {:noreply, %{state | tree: tree}}
  end

  def handle_call({:mutate, fun}, _from, state) do
    tree = fun.(state.tree)
    {tree, ops} = Grove.flush_operations(tree)

    # Broadcast to all replicas
    broadcast_ops(state.document_id, ops)

    {:reply, :ok, %{state | tree: tree}}
  end

  defp broadcast_ops(document_id, ops) do
    for pid <- :pg.get_members(:grove_documents, document_id), pid != self() do
      GenServer.cast(pid, {:ops, ops, Node.self()})
    end
  end
end
```

## AST Conversion

Convert between Grove trees and Elixir AST:

```elixir
# Parse Elixir code into a Grove tree
{:ok, ast} = Code.string_to_quoted("def hello(name), do: name")
tree = Grove.from_elixir_ast(ast, replica_id: "node_1")

# Convert back to Elixir AST
elixir_ast = Grove.to_elixir_ast(tree)

# Pretty print
Grove.to_string(tree)
# => "def hello(name), do: name"
```

### Custom AST Schemas

Define custom node types with field-level CRDT semantics:

```elixir
defmodule MyApp.FormSchema do
  use Grove.Schema

  node :form do
    field :title, :lww_register        # Last-writer-wins (default)
    field :description, :lww_register
    children :steps
  end

  node :dropdown do
    field :label, :lww_register
    field :options, :rga_list          # Ordered list - concurrent inserts merge
    field :tags, :or_set               # Set - adds/removes merge
  end

  node :text_field do
    field :label, :lww_register
    field :placeholder, :lww_register
    field :validators, :or_set         # Set of validation rules
  end
end

# Use with Grove
tree = Grove.new(replica_id: "node_1", schema: MyApp.FormSchema)
```

**Field CRDT Types:**

| Type | Merge Behavior | Use Case |
|------|----------------|----------|
| `:lww_register` | Last writer wins | Simple values (default) |
| `:mv_register` | Keep all concurrent values | Expose conflicts to user |
| `:or_set` | Union of adds, intersection of removes | Tags, categories |
| `:rga_list` | Interleave concurrent inserts | Ordered options |

## Performance

Grove is optimized for real-time collaborative editing:

| Operation | Time Complexity | Space Complexity |
|-----------|-----------------|------------------|
| Insert | O(1) amortized | O(1) |
| Delete | O(1) | O(1) (tombstone) |
| Move | O(log n) | O(1) |
| Apply Op | O(1) amortized | O(1) |
| Find Node | O(1) | — |
| Traverse | O(n) | O(depth) |

### Benchmarks

```
Date: 2026-01-03
Elixir: 1.18.4, OTP: 25

Scenario                          ips        Average    Memory
Sequential (fast-path)            617        1.62 ms    1.43 MB
Collaborative (mixed)             144        6.94 ms    5.82 MB
High-concurrency (slow-path)       63       15.76 ms   21.84 MB

Local Operations                  ips        Average
get_node                        5.57 M       179 ns
put_node                        1.13 M       888 ns
update_node                     0.65 M      1537 ns
```

**Key insight**: Eg-walker's fast-path is 10x faster than slow-path for the common case (sequential editing).

See [benchmarks guide](guides/benchmarks.md) for detailed analysis.

### Eg-walker Optimization

Grove implements the [Eg-walker algorithm](https://arxiv.org/abs/2409.14252) (EuroSys 2025 Best Paper) which optimizes for the common case: sequential editing.

- **Fast-path**: When edits are sequential (one user or turn-taking), operations apply directly with near-zero CRDT overhead
- **Slow-path**: When true concurrency occurs, full CRDT merge ensures correctness

Most collaborative editing is sequential (~85%), so this optimization provides significant real-world performance benefits.

## Garbage Collection

Tombstones (markers for deleted nodes) are garbage collected when all replicas have observed the deletion:

```elixir
# Configure GC
tree = Grove.new(
  replica_id: "node_1",
  gc: [
    # Minimum age before tombstone can be collected
    min_tombstone_age: :timer.hours(24),
    # Run GC every N operations
    gc_interval: 1000
  ]
)

# Manual GC with known replica states
tree = Grove.gc(tree, observed_by: ["node_1", "node_2", "node_3"])
```

## Persistence

Grove trees can be serialized for storage:

```elixir
# Serialize to binary (efficient)
binary = Grove.serialize(tree)
tree = Grove.deserialize(binary)

# Serialize to JSON (interoperable)
json = Grove.to_json(tree)
tree = Grove.from_json(json)

# Ecto integration
defmodule MyApp.Document do
  use Ecto.Schema

  schema "documents" do
    field :tree, Grove.Ecto.Type
    timestamps()
  end
end
```

## Testing

Grove provides comprehensive testing utilities for verifying CRDT correctness:

```elixir
defmodule MyApp.EditorTest do
  use ExUnit.Case
  import Grove.Testing

  test "concurrent edits converge" do
    # Create isolated replicas from initial data
    [tree_a, tree_b] = create_replicas(initial_data, count: 2)

    # Simulate concurrent edits
    tree_a = Grove.insert_node(tree_a, "n1", :expr, %{v: 1}, parent: :root)
    tree_b = Grove.insert_node(tree_b, "n2", :expr, %{v: 2}, parent: :root)

    # Sync all replicas
    [tree_a, tree_b] = sync_all([tree_a, tree_b])

    # Verify convergence
    assert_convergent([tree_a, tree_b])
    assert_no_cycles(tree_a)
  end

  test "network partition and heal" do
    [a, b, c, d] = create_replicas(data, count: 4)

    # Simulate partition: {a, b} and {c, d} can't communicate
    {group1, group2} = partition([a, b, c, d], groups: [[0, 1], [2, 3]])

    # Make edits in each partition
    group1 = update_all(group1, fn t -> Grove.insert_node(t, "x", :text, %{}, parent: :root) end)
    group2 = update_all(group2, fn t -> Grove.insert_node(t, "y", :text, %{}, parent: :root) end)

    # Heal partition
    all_trees = heal(group1, group2)

    # All should converge
    assert_convergent(all_trees)
  end

  test "fuzz test with random operations" do
    [tree_a, tree_b, tree_c] = create_replicas(data, count: 3)

    # Generate and apply random operations
    trees = [tree_a, tree_b, tree_c]
    |> Enum.map(fn t -> apply_random_operations(t, count: 100) end)
    |> sync_all()

    assert_convergent(trees)
    Enum.each(trees, &assert_no_cycles/1)
  end
end
```

### Testing Utilities Reference

| Function | Description |
|----------|-------------|
| `create_replicas/2` | Create N isolated replicas from initial data |
| `sync/2` | Sync two replicas bidirectionally |
| `sync_all/1` | Sync all replicas in a list |
| `assert_convergent/1` | Assert all trees have same value |
| `assert_no_cycles/1` | Assert tree has no cycles |
| `partition/2` | Simulate network partition |
| `heal/2` | Heal partition, sync all |
| `random_operations/2` | Generate random ops for fuzzing |

## Research Background

Grove is based on peer-reviewed research in distributed systems and CRDTs:

### Core Papers

- [A highly-available move operation for replicated trees](https://martin.kleppmann.com/papers/move-op.pdf) — Kleppmann et al., 2021
- [Eg-walker: Better, Faster, Smaller](https://arxiv.org/abs/2409.14252) — Gentle & Kleppmann, EuroSys 2025 (Best Paper)
- [COAST: A Conflict-free Replicated Abstract Syntax Tree](http://soft.vub.ac.be/Publications/2022/vub-tr-soft-22-17.pdf) — Munsters et al., 2022
- [A coordination-free, convergent, and safe replicated tree](https://arxiv.org/abs/2103.04828) — Nair et al., 2021

### Foundational CRDT Research

- [A comprehensive study of Convergent and Commutative Replicated Data Types](http://hal.inria.fr/inria-00555588/) — Shapiro et al., 2011
- [Pure Operation-Based Replicated Data Types](https://arxiv.org/abs/1710.04469) — Baquero et al., 2017
- [Interleaving Anomalies in Collaborative Text Editors](https://martin.kleppmann.com/papers/interleaving-papoc19.pdf) — Kleppmann et al., 2019

### Undo/Redo

- [A Generic Undo Support for State-Based CRDTs](https://drops.dagstuhl.de/opus/volltexte/2020/11800/pdf/LIPIcs-OPODIS-2019-14.pdf) — Yu et al., 2019
- [Undo and Redo Support for Replicated Registers](https://arxiv.org/abs/2404.11308) — Stewen & Kleppmann, 2024

See the [architecture guide](guides/architecture.md) for implementation details.

## Comparison with Other Libraries

| Feature | Grove | Automerge | Yjs | Riak DT |
|---------|-------|-----------|-----|---------|
| Language | Elixir | Rust/JS | JS | Erlang |
| Tree CRDTs | ✅ First-class | ✅ JSON | ✅ XML | ❌ |
| Move operation | ✅ | ✅ | ✅ | ❌ |
| Eg-walker optimization | ✅ | ❌ | ❌ | ❌ |
| Undo/Redo | ✅ | ✅ | ✅ | ❌ |
| BEAM native | ✅ | ❌ | ❌ | ✅ |
| Phoenix integration | ✅ | ❌ | ❌ | ❌ |

## Contributing

Contributions are welcome!

```bash
# Clone the repository
git clone https://gitlab.com/tristanperalta/grove.git
cd grove

# Install dependencies
mix deps.get

# Run tests
mix test

# Run benchmarks
mix bench
```

## License

Grove is released under the MIT License. See [LICENSE](LICENSE) for details.

---

Built with ❤️ for the Elixir community.