# Grove Roadmap
Implementation plan for the Grove CRDT library, organized into phases based on dependencies and complexity.
---
## Phase 1: Foundation
**Goal**: Establish core abstractions and implement basic CRDT types.
### 1.1 Core Abstractions
- [x] Define `Grove.CRDT` behaviour with callbacks:
- `new/1`, `merge/2`, `value/1`
- Optional: `apply_operation/2`, `delta/1`, `reset_delta/1`
- [x] Define protocols:
- `Grove.Mergeable` for polymorphic merge
- `Grove.Viewable` for extracting values
- [x] Implement `Grove.VectorClock`:
- `new/0`, `increment/2`, `merge/2`
- `compare/2` returning `:before | :after | :equal | :concurrent`
- `descends?/2` for causal ordering
### 1.2 Counter CRDTs
- [x] `Grove.GCounter` (Grow-only Counter)
- State: `%{actor => count}`
- Operations: `increment/1`, `increment/2` (with amount)
- Merge: pointwise maximum
- Value: sum of all counts
- [x] `Grove.PNCounter` (Positive-Negative Counter)
- State: `%{positive: GCounter, negative: GCounter}`
- Operations: `increment/1`, `decrement/1`
- Value: positive.value - negative.value
### 1.3 Register CRDTs
- [x] `Grove.LWWRegister` (Last-Write-Wins Register)
- State: `%{value: term, timestamp: Lamport.t, actor: actor}`
- Use Lamport timestamps (NOT wall-clock)
- Tiebreaker: actor ID comparison
- [x] `Grove.MVRegister` (Multi-Value Register)
- State: `%{values: %{{value, vclock} => true}}`
- Preserve all concurrent values
- `value/1` returns list of concurrent values
### 1.4 Basic Tests
- [x] Unit tests for each CRDT type
- [x] Property-based tests for CRDT laws:
- Commutativity: `merge(a, b) == merge(b, a)`
- Associativity: `merge(merge(a, b), c) == merge(a, merge(b, c))`
- Idempotence: `merge(a, a) == a`
---
## Phase 2: Sets and Maps
**Goal**: Implement set CRDTs with proper conflict resolution.
### 2.1 Set CRDTs
- [x] `Grove.GSet` (Grow-only Set)
- State: `MapSet.t`
- Operations: `add/2`
- Merge: union
- [x] `Grove.TwoPSet` (Two-Phase Set)
- State: `%{added: GSet, removed: GSet}`
- Operations: `add/2`, `remove/2`
- Constraint: cannot re-add after remove
- Value: added - removed
- [x] `Grove.ORSet` (Observed-Remove Set / AWSet)
- State: `%{entries: %{elem => MapSet.t(tag)}}`
- Tag: `{actor, counter}` - unique per add operation
- Add-wins semantics (concurrent add + remove → element present)
- Operations: `add/2`, `remove/2`, `member?/2`
- Merge: union of tags per element
### 2.2 Map CRDT
- [x] `Grove.ORMap` (Observed-Remove Map)
- State: `%{keys: ORSet, values: %{key => CRDT}}`
- Keys managed by OR-Set semantics
- Values can be any CRDT (nested composition)
- Operations: `put/3`, `delete/2`, `get/2`, `update/3`
- Merge: merge keys via OR-Set, recursively merge values via `Grove.Mergeable`
### 2.3 Composition Support
- [x] Implement recursive merge for nested CRDTs
- [x] Type specification for valid CRDT values in maps
- [ ] Helper macros for defining composite CRDTs
### 2.4 Tests
- [x] Property tests for set convergence
- [x] Concurrent add/remove scenarios
- [x] Nested CRDT merge correctness
---
## Phase 3: Delta-State CRDTs
**Goal**: Implement bandwidth-efficient delta synchronization.
### 3.1 Delta Infrastructure
- [x] Extend `Grove.CRDT` behaviour with delta callbacks:
- `delta/1` - extract accumulated delta
- `reset_delta/1` - clear delta after sync
- ~~`merge_delta/2`~~ - not needed; regular `merge/2` works for deltas
- [ ] Implement `Grove.DeltaCRDT` wrapper:
```elixir
defstruct [:full_state, :delta, :version]
```
### 3.2 Delta Implementations
- [x] `Grove.Counter.GCounter` - delta_buffer accumulates local increments
- [x] `Grove.Counter.PNCounter` - delegates to underlying GCounter deltas
- [x] `Grove.Set.GSet` - delta_buffer accumulates added elements
- [x] `Grove.Set.TwoPSet` - delegates to underlying GSet deltas
- [x] `Grove.Register.LWWRegister` - delta_dirty flag, delta = full state
- [x] `Grove.Register.MVRegister` - delta_buffer stores last written value
- [x] `Grove.Set.ORSet` - delta_buffer accumulates added tags
- [x] `Grove.Map.ORMap` - delta_buffer accumulates key/value changes
### 3.3 Delta Serialization
- [ ] Implement `Grove.Serialization.Delta`:
- `encode/2` - encode delta since version
- `decode/1` - decode delta
- Compression for wire transmission
### 3.4 Tests
- [x] Verify delta merge produces same result as full-state merge
- [ ] Bandwidth measurement tests
- [x] Delta accumulation across multiple operations
---
## Phase 4: Replication Layer
**Goal**: Provide GenServer-based managed replicas with automatic sync.
### 4.1 Replication Server
- [x] `Grove.Replication.Server` GenServer:
- Manages single CRDT instance
- Periodic sync via configurable interval
- Broadcast delta to all peers
- Merge incoming deltas automatically
### 4.2 Cluster Membership
> **Note**: [libcluster](https://hexdocs.pm/libcluster/readme.html) handles **node-level discovery**
> (connecting Erlang nodes via strategies like Kubernetes, DNS, multicast).
> Grove should use libcluster as an optional dependency for node discovery,
> then layer `:pg` on top for replica-level process group management.
- [x] `Grove.Cluster.Membership`:
- Integration with `:pg` (process groups) for replica tracking
- Peer discovery (join/leave/peers)
- Broadcast and random peer selection
- Auto-starts in application supervision tree
- [ ] Optional libcluster integration for node discovery
- [ ] Node up/down handling and partition detection
### 4.3 Sync Protocols
- [x] Broadcast protocol (send delta to all peers)
- [ ] `Grove.Replication.Gossip`:
- Random peer selection with fanout
- Anti-entropy protocol
- [ ] `Grove.Replication.Sync`:
- Full state sync for new replicas
- Version vector exchange
- Demand-based pull sync
### 4.4 Tests
- [x] Multi-process convergence tests
- [x] Concurrent update tests
- [ ] Multi-node tests using `:peer`
- [ ] Partition and heal scenarios
---
## Phase 5: Storage and Persistence
**Goal**: Provide durable storage backends.
### 5.1 Storage Behaviour
- [x] Define `Grove.Storage` behaviour:
- `init/1`, `save/3`, `load/2`, `delete/2`, `list_groups/1`, `close/1`
- [x] Implement `Grove.Storage.Serializer`:
- `term_to_binary` with compression
- Safe decoding with `:safe` option
### 5.2 ETS Backend
- [x] `Grove.Storage.ETS`:
- In-memory storage with read/write concurrency
- Named tables with `:public` access
- Full CRUD operations
- [ ] Delta storage with pruning (future)
- [ ] Materialized view support for fast reads (future)
### 5.3 DETS Backend
- [x] `Grove.Storage.DETS`:
- Disk-based persistence
- Auto-save at configurable intervals
- Recovery on restart
### 5.4 Server Integration
- [x] Integrate storage into `Grove.Replication.Server`:
- State restoration on startup via merge
- Periodic snapshot timer (configurable)
- Save on terminate for graceful shutdown
### 5.5 Tests
- [x] Persistence and recovery tests
- [x] Concurrent access tests
- [x] Integration tests with Replication.Server
- [ ] Large dataset tests
---
## Phase 6: Garbage Collection
**Goal**: Prevent unbounded growth of tombstones and metadata.
### 6.1 GC Infrastructure
- [x] `Grove.GC` behaviour:
- `collect/2` - run GC on CRDT
- `needs_gc?/2` - check if GC is recommended
- `gc_info/1` - get GC metadata
### 6.2 GC Strategies
- [x] `Grove.GC.Structural`:
- Remove entries with empty tag sets (ORSet)
- Remove orphaned values (ORMap)
- Safe without replica coordination (based on "Causality to Stability" paper)
- [ ] `Grove.GC.Causal` (future):
- Track replica vector clocks via gossip
- Compute causal stability (minimum observed version)
- Only GC tombstones in stable past
- [ ] `Grove.GC.TimeBased` (future):
- Configurable retention period
- Requires timestamps in tags
### 6.3 Integration
- [ ] Integrate GC into `Grove.Replication.Server` (future)
- [ ] Configurable GC intervals
- [ ] GC metrics and monitoring
### 6.4 Tests
- [x] Verify GC doesn't affect convergence
- [x] ORSet and ORMap cleanup tests
- [ ] Memory usage tests over time
- [ ] Edge cases (offline replicas, clock skew)
---
## Phase 7: Advanced Causality
**Goal**: Optimize causality tracking for production workloads.
### 7.1 Dotted Version Vectors
- [x] `Grove.DottedVersionVector`:
- Dot (single event) + context (observed events)
- event/2, sync/2, merge/2, descends?/2, dominates?/2
- Precise causality for OR-Set semantics
### 7.2 Hybrid Logical Clocks
- [x] `Grove.HybridLogicalClock`:
- Combines physical time + logical counter
- tick/1, update/2, compare/2, merge/2
- Stays close to wall-clock, handles drift
### 7.3 Optimized OR-Set
- [x] `Grove.Set.DVVSet`:
- O(replicas) metadata instead of O(operations)
- One dot per actor per element
- Same add-wins semantics as ORSet
- [ ] Benchmark against basic implementation (future)
- [ ] Memory usage comparison (future)
---
## Phase 8: Developer Experience (P0 Features)
**Goal**: Essential APIs for practical usage.
### 8.1 Plain Data Conversion (Critical)
- [x] `Grove.from_data/2` - Convert plain data to CRDT tree
- Accept `replica_id` option
- Generate node IDs if not present
- Initialize vector clocks
- [x] `Grove.to_data/1` - Convert CRDT tree to plain data
- Strip all CRDT metadata (vclocks, tombstones, deltas)
- Preserve node IDs and structure
- Round-trip guarantee: `to_data(from_data(data)) == data`
### 8.2 Node Query API
- [x] `Grove.Tree.get_node/2` - Find node by ID
- [x] `Grove.Tree.find_nodes/2` - Find by type or attributes
- `find_nodes(tree, type: :dropdown)`
- `find_nodes(tree, where: [required: true])`
- [x] `Grove.Tree.path_to/2` - Get ancestor path to node
- [x] `Grove.Tree.parent/2` - Get parent node
- [x] `Grove.Tree.siblings/2` - Get sibling nodes
- [x] `Grove.Tree.children/2` - Get child nodes
### 8.3 Batch Operations
- [x] `Grove.Tree.batch/2` - Atomic multi-operation wrapper
```elixir
{tree, delta} = Grove.Tree.batch(tree, fn t ->
t
|> Grove.Tree.put_node(Node.new("step_2", :step))
|> Grove.Tree.put_node(Node.new("field_1", :text))
end)
# Integrates with Session.mutate/2
Session.mutate(pid, fn tree ->
Grove.Tree.batch(tree, fn t ->
t |> Grove.Tree.put_node(...) |> Grove.Tree.update_node(...)
end)
end)
```
- [x] Single compound operation for undo (batch pushed as one entry to `undo_stack`)
- [x] Single broadcast for all changes (returns `{tree, {:batch, ops}}` tuple)
- [x] `Grove.Tree.reset_pending_ops/1` - Clear pending ops after broadcast
- [x] Operation tracking in `put_node/2` and `update_node/3`
### 8.4 Operation Metadata
- [x] Extend operations with `meta` field via optional trailing element
```elixir
Grove.Tree.put_node(tree, node, meta: %{user_id: "u1", user_name: "Alice"})
Grove.Tree.update_node(tree, node_id, update_fn, meta: %{reason: "fix"})
Grove.Tree.batch(tree, fn t -> ... end, meta: %{source: "import"})
```
- [x] `Grove.Tree.history/2` - Get operation history with metadata
```elixir
Tree.history(tree)
Tree.history(tree, since: timestamp, node_id: "field_1", where: [user_id: "u1"])
```
- [x] `Grove.Tree.operation_meta/1` - Extract metadata from operations
- [x] `Grove.Session.history/2` - Get history via Session API
- [x] `Grove.Tree.HistoryEntry` struct for history entries
- [x] Metadata preserved but doesn't affect merge semantics
### 8.5 Move Operations
- [x] `Grove.Tree.move_node/4` - Move node to new parent with LWW semantics
```elixir
Tree.move_node(tree, "field_1", "step_2")
Tree.move_node(tree, "field_1", "step_2", meta: %{user_id: "u1"})
Tree.move_node(tree, "field_1", nil) # Move to root
```
- [x] Cycle prevention - Prevent moving a node into its own descendants
- [x] LWW conflict resolution using HLC timestamps
- [x] `Grove.Tree.apply_remote_move/2` - Apply remote move with conflict resolution
- [x] `parent_timestamp` field on Node for tracking LWW state
- [x] Silent skip of cycle-creating moves (Kleppmann's approach for convergence)
- [x] **Full Kleppmann undo-do-redo algorithm** for guaranteed convergence:
- Operation log (`operation_log`) stores all moves in descending timestamp order
- `Grove.Tree.LogMove` module for internal operation tracking with `old_parent`
- `Grove.Tree.Move` module for move operation representation
- Undo-do-redo: when new operation arrives, undo later ops, apply new, redo
- Validity recalculation during redo (handles cycles, deleted nodes, LWW)
- Property-based tests for convergence, acyclicity, unique parents
- Concurrent scenario tests (parent-child swap, three-way cycles)
### 8.6 Undo/Redo Operations
- [x] `Grove.Tree.undo/1` - Undo last local operation
```elixir
{:ok, tree} = Tree.undo(tree)
{:error, :empty_stack} = Tree.undo(empty_tree)
```
- [x] `Grove.Tree.redo/1` - Redo last undone operation
- [x] `Grove.Tree.can_undo?/1` - Check if undo is available
- [x] `Grove.Tree.can_redo?/1` - Check if redo is available
- [x] Local undo semantics (each replica has its own undo stack)
- [x] Inverse operation generation for all operation types
- [x] New operations clear redo stack
- [x] Batch operations undo as a single unit
- [x] Move operation format extended with `old_parent_id` for undo support
### 8.7 Testing Utilities
- [x] `Grove.Testing.create_replicas/2` - Create isolated replicas
- [x] `Grove.Testing.sync/2` - Sync two replicas bidirectionally
- [x] `Grove.Testing.sync_all/1` - Sync all replicas in list
- [x] `Grove.Testing.assert_convergent/1` - Assert all converged
- [x] `Grove.Testing.random_operations/2` - Generate random ops for fuzzing
- [x] `Grove.Testing.apply_operations/2` - Apply operation list to CRDT
- [ ] Grove.Testing.partition/2 - Simulate network partition (future)
- [ ] Grove.Testing.heal/1 - Heal partition (future)
---
## Phase 9: Phoenix Integration
**Goal**: First-class Phoenix and LiveView integration.
### 9.1 LiveView Integration
- [x] `Grove.LiveView` module:
- `mount_tree/3` - Mount tree into socket, subscribe to PubSub
- `apply_and_broadcast/2` - Apply local op and broadcast delta
- `apply_remote/3` - Apply remote delta from PubSub
- `get_tree/1`, `get_replica_id/1`, `get_doc_id/1` - Accessors
- `update_tree/2` - Local-only updates without broadcast
```elixir
defmodule MyAppWeb.EditorLive do
use Phoenix.LiveView
def mount(%{"id" => id}, _session, socket) do
socket = Grove.LiveView.mount_tree(socket, id,
replica_id: socket.id,
pubsub: MyApp.PubSub
)
{:ok, socket}
end
def handle_info({:grove_delta, delta, from_pid}, socket) when from_pid != self() do
socket = Grove.LiveView.apply_remote(socket, delta, &apply_delta/2)
{:noreply, socket}
end
end
```
### 9.2 Session/Document Lifecycle
- [x] `Grove.Session` GenServer:
- `get_or_start/2` - Start or get existing session for document
- `join/2` - Join session, returns current tree state
- `leave/2` - Leave session
- `mutate/2` - Apply tree mutation and broadcast delta
- `get_state/1` - Get current tree state
- `info/1` - Get session info (subscriber count, tree size)
- [x] Registry + DynamicSupervisor for session management
- [x] Grace period before termination (configurable)
- [x] Periodic persistence (configurable interval)
- [x] Subscriber crash detection via Process.link
### 9.3 Presence Integration
- [x] `Grove.Presence` built on Phoenix.Tracker:
- Track cursor/focus position per replica
- Track selection state
- Automatic color assignment
- [x] `Grove.set_focus/2` - Track focus
- [x] `Grove.set_selection/4` - Track selection
- [x] `Grove.get_presences/1` - Get all replica presence
- [x] `Grove.LiveView.Presence` on_mount hook for auto-tracking
### 9.4 Phoenix.PubSub Integration
- [x] `Grove.PubSub` module with topic-based namespacing
- [x] Delta broadcast via PubSub (alternative to :pg)
- [x] Configurable PubSub server in `Replication.Server`
---
## Phase 10: Performance Features (P2)
**Goal**: Optimizations for production workloads.
### 10.1 Debounced Operations ✅
- [x] `Grove.buffer_update/3` - Buffer updates locally with immediate tree application
- [x] `Grove.flush_buffer/1` - Flush buffered deltas as `{:batch, [deltas]}`
- [x] Per-socket buffer via `:grove_update_buffer` assign
- [x] Configurable debounce via `debounce: ms` option (default 100ms)
- [x] Auto-flush on `apply_and_broadcast/2` to maintain operation order
- [x] Receiver-side batch handling in `apply_remote/2`
### 10.2 Schema DSL ✅
- [x] Schema DSL for per-field CRDT types via compile-time macros:
```elixir
defmodule MyApp.FormSchema do
use Grove.Schema
version 1
node :dropdown do
field :label, :lww_register, default: "Untitled"
field :options, :rga_list, default: []
field :tags, :or_set, default: MapSet.new()
end
end
```
- [x] `Grove.Schema.Types` - Type registry for CRDT dispatch
- Supported types: `:lww_register`, `:mv_register`, `:or_set`, `:rga_list`, `:counter`
- `init/3`, `extract/1`, `update/4` for type-specific operations
- [x] `Grove.Schema` - Compile-time macro DSL
- `node/2` macro for defining node types with fields
- `field/2,3` macro for field definitions with CRDT types
- Generated functions: `new_node/3`, `read_node/1`, `update_field/4`
- [x] Schema versioning with migration DSL
- `version/1`, `migrate/2`, `add_field/4`, `remove_field/2,3`, `transform_field/3`
- [x] `Grove.Node.attrs` accepts `Grove.Map.ORMap` for schema-based nodes
- [x] Schema-agnostic merge (preserves CRDT convergence properties)
---
## Phase 11: Sequences and Trees (Future)
**Goal**: Support ordered collections and hierarchical data.
### 11.1 Research
- [x] Evaluate RGA vs Logoot vs LSEQ (chose RGA for fixed-size IDs and simplicity)
- [x] Study tree move semantics (implemented in Phase 8.5)
- [x] Review existing implementations (Yjs, Automerge) - informed RGA design
### 11.2 Sequence CRDT
- [x] `Grove.Sequence.RGA` (Replicated Growable Array):
- Ordered list with insert/delete
- Unique position identifiers `{replica_id, counter}`
- Tombstone-based deletion
- Delta-state support
- HLC timestamps for concurrent insert ordering
- Full property-based testing (commutativity, associativity, idempotence)
### 11.3 Tree CRDT
- [x] `Grove.Tree` - Hierarchical structure (implemented in Phase 8)
- [x] Move operation support with LWW conflict resolution (Phase 8.5)
- [x] Cycle detection and prevention (Phase 8.5)
- [x] Position-based sibling ordering (Fugue-inspired) for concurrent moves (Phase 11.3)
- [x] Undo/redo support for move operations (Phase 8.6)
---
## Phase 12: Eg-walker Integration (Future)
**Goal**: Optimize for sequential editing using pure operation-based CRDT approach.
> **Design Document**: `guides/future/eg-walker-integration.md`
> **Paper**: [Eg-walker: Better, Faster, Smaller](https://arxiv.org/abs/2409.14252) (EuroSys 2025 Best Paper)
### 12.0 Prerequisites (Foundation)
- [x] `Grove.Tree.Event` - Event structure with parent references (DAG edges)
- [x] `Grove.Tree.Version` - Frontier-based version tracking
- [x] Operation deduplication via event IDs
- [x] Migrate `HistoryEntry` → `Event` with parent refs
### 12.1 Critical Version Detection
- [x] Implement `Grove.Tree.Version.critical?/1` (frontier.size == 1)
- [x] Fast-path for sequential operations (skip transient merge state rebuild)
- [x] Auto-prune seen_events at critical versions (with threshold)
- [x] Track `parent_was_critical` in Events for future optimization
### 12.2 Event Graph
- [x] `Grove.Tree.EventGraph` module
- `events_since/2`, `find_common_ancestor/3`
- `topological_sort/2`, `concurrent?/3`
- `ancestors/2`, `descendants/2`
- [x] DAG traversal and querying (BFS-based)
- [x] Critical version fast paths in graph operations
### 12.3 Partial Replay
- [x] `diff/3` - Find events to retreat/advance between versions
- [x] `find_last_critical/1` - Find last critical event for replay optimization
- [x] `events_between/3` - Linear forward movement helper
### 12.4 Retreat/Advance for Trees
State rebuilding approach (recommended by CRDT expert over inverse operations):
- [x] `navigate_to_version/2` - Navigate to specific version via state rebuilding
- [x] `state_at_event/2` - Navigate to state at specific event
- [x] `rebuild_tree_state/2` - Core algorithm replaying events in topological order
- [x] Transient views (history immutable, returns computed state)
- [x] Reuse `do_apply_remote/2` for operation application
- [x] Handles all operation types (put_node, update_node, move_node, batch)
### 12.5 Storage & Persistence
- [x] Event graph serialization (columnar format) - `Grove.Tree.EventSerializer`
- [x] Snapshot + event log storage strategy - `Grove.Tree.DocumentStorage`
- [x] Event compaction at critical versions - `DocumentStorage.compact/2`
- [x] Fast document loading (snapshot + event replay)
### 12.6 Optimization
- [x] Columnar serialization with RLE - `EventSerializer` with RLE for event IDs
- [x] Benchmarking with Benchee - `benchmarks/eg_walker_bench.exs`
- [x] Persistent event_index for O(1) lookups (+15% collaborative throughput)
- [ ] B-tree index for O(log n) history insertion (future)
- [ ] Memory profiling (steady state vs peak)
### 12.7 Tests
- [x] Critical version detection tests
- [x] Event graph DAG operations
- [x] Partial replay correctness
- [x] Retreat/advance for all operation types
- [x] Convergence with Eg-walker approach (property-based tests added)
- [ ] Performance benchmarks vs traditional CRDT
---
## Testing Strategy (Ongoing)
### Unit Tests
- Each CRDT type
- Each protocol implementation
- Edge cases and error handling
### Property-Based Tests
- CRDT algebraic laws (commutative, associative, idempotent)
- Convergence under arbitrary operation sequences
- Delta equivalence to full-state merge
### Integration Tests
- Multi-process replication
- Storage persistence and recovery
- GC correctness
### Distributed Tests
- Multi-node using `:peer` module
- Network partition scenarios
- Clock skew simulation
---
## Dependencies
```elixir
# mix.exs
defp deps do
[
# Testing
{:stream_data, "~> 0.6", only: [:test, :dev]},
# Optional: Phoenix integration
{:phoenix_pubsub, "~> 2.1", optional: true},
{:phoenix_live_view, "~> 0.20", optional: true}
]
end
```
---
## Milestones
| Milestone | Phases | Description |
|-----------|--------|-------------|
| **v0.1.0** | 1-2 | Core CRDTs (counters, registers, sets, maps) |
| **v0.2.0** | 3 | Delta-state support |
| **v0.3.0** | 4-5 | Replication and storage |
| **v0.4.0** | 6-7 | GC and advanced causality |
| **v0.5.0** | 8 | Developer experience (data conversion, query API, batch ops) |
| **v1.0.0** | 9 | Phoenix/LiveView integration |
| **v1.1.0** | 10 | Performance features (debouncing, schema types) |
| **v2.0.0** | 11 | Sequences and trees |
| **v3.0.0** | 12 | Eg-walker integration (pure operation-based CRDT) |