# Core Concepts
This guide explains the ideas behind Grove. You don't need to understand all of this to use Grove, but it helps when things get complex.
## The Problem: Sync is Hard
When multiple users (or devices) edit the same data, conflicts happen:
```mermaid
sequenceDiagram
participant A as Alice
participant S as Server
participant B as Bob
A->>S: title = "Meeting Notes"
B->>S: title = "Project Plan"
Note over S: Which one wins?
```
**Traditional solutions all have drawbacks:**
| Approach | Problem |
|----------|---------|
| **Last-write-wins** | Alice's change is silently lost |
| **Lock the document** | Bob has to wait, can't work offline |
| **Manual conflict resolution** | User sees scary merge dialogs |
| **Operational Transform (Google Docs)** | Complex, needs central server |
## CRDTs: A Better Way
**CRDT** stands for "Conflict-free Replicated Data Type."
The key insight: **Design your data structure so that any order of operations produces the same result.**
```mermaid
flowchart TB
subgraph "Same Result, Any Order"
A1[Operation A] --> R1[Result: XYZ]
B1[Operation B] --> R1
A2[Operation B] --> R2[Result: XYZ]
B2[Operation A] --> R2
end
```
This is called **commutativity** — A + B = B + A.
### How Grove Achieves This
1. **Unique IDs everywhere** — Every node, every operation has a globally unique ID
2. **Timestamps for ordering** — When two operations conflict, timestamps decide
3. **Merge by design** — Data structures are designed to merge, not conflict
## Grove's Data Model
### Trees and Nodes
Grove stores data as a tree of nodes:
```mermaid
graph TD
R[Root] --> A[Form]
A --> B[Step 1]
A --> C[Step 2]
B --> D[Field: Name]
B --> E[Field: Email]
C --> F[Field: Address]
```
Each node has:
| Field | Description |
|-------|-------------|
| `id` | Unique identifier |
| `type` | What kind of node (atom) |
| `attrs` | Node data (map) |
| `parent_id` | Who is my parent? |
| `children` | Ordered list of child IDs |
### Operations
Every change creates an **operation**:
```elixir
# These create operations internally
Grove.Tree.put_node(tree, node) # => {:put_node, id, node}
Grove.Tree.update_node(tree, id, f) # => {:update_node, id, changes}
Grove.Tree.delete_node(tree, id) # => {:delete_node, id}
Grove.Tree.move_node(tree, id, p) # => {:move_node, id, new_parent}
```
Operations are wrapped in **events** with metadata:
```elixir
%Grove.Tree.Event{
id: {"laptop", 42}, # {replica_id, sequence}
operation: {:put_node, ...}, # The actual change
parents: MapSet.new([...]), # Causal dependencies
timestamp: 1704067200000 # When it happened
}
```
### Replica IDs
Every device/user needs a unique **replica ID**:
```mermaid
flowchart LR
subgraph "Replica IDs"
L[Laptop: abc123]
P[Phone: def456]
T[Tablet: ghi789]
end
L --> S[Same Document]
P --> S
T --> S
```
Replica IDs ensure:
- Node IDs are globally unique (`{replica_id}_{sequence}`)
- Operations can be attributed to their source
- Conflicts can be resolved deterministically
## Eg-walker: Why Grove is Fast
Grove implements **Eg-walker**, an optimization from recent research (EuroSys 2025).
### The Key Insight
Most collaborative editing is **sequential** — one person typing, then another. True concurrency (two people typing at the exact same moment) is rare.
```mermaid
pie title Typical Editing Patterns
"Sequential (fast-path)" : 85
"Concurrent (slow-path)" : 15
```
### Fast-Path vs Slow-Path
```mermaid
flowchart TD
E[Incoming Event] --> C{Is history linear?}
C -->|Yes| F[Fast-path: Just apply it]
C -->|No| S[Slow-path: Merge with CRDT rules]
F --> R[Result]
S --> R
```
**Fast-path** (sequential operations):
- Direct application
- No merge overhead
- O(1) complexity
**Slow-path** (concurrent operations):
- Build merge state
- Apply CRDT rules
- Still automatic, just slower
### Performance Impact
| Scenario | Throughput | Memory |
|----------|------------|--------|
| Sequential (fast-path) | 617 ops/sec | 1.4 MB |
| Concurrent (slow-path) | 63 ops/sec | 21.8 MB |
The fast-path is **10x faster** and uses **15x less memory**.
## Key Guarantees
Grove provides these guarantees:
### 1. Eventual Consistency
All replicas that have seen the same operations will have the same state.
```mermaid
flowchart LR
subgraph "Eventually..."
A[Replica A] -.->|sync| B[Replica B]
B -.->|sync| C[Replica C]
C -.->|sync| A
end
A --> S[Same State]
B --> S
C --> S
```
### 2. No Conflicts
Operations always merge. There's no "conflict" state that requires user intervention.
### 3. Causality Preserved
If operation B happened after seeing operation A, then B will always be applied after A.
### 4. Intent Preserved
Operations do what they intended:
- Adds stay added (unless explicitly deleted)
- Deletes stay deleted
- Moves go to intended parent (unless it would create a cycle)
## Move Operations
Moving nodes in a distributed tree is the hardest problem. Grove implements Kleppmann's algorithm:
```mermaid
flowchart TD
subgraph "Concurrent Moves"
A1[Alice: Move X under Y]
B1[Bob: Move Y under X]
end
A1 --> M{Would create cycle?}
B1 --> M
M -->|Yes| R[One move wins by timestamp]
M -->|No| OK[Both moves apply]
```
**Guarantees:**
- No cycles ever created
- Deterministic winner (by Lamport timestamp)
- Both replicas converge to same tree
## Summary
| Concept | What It Means |
|---------|---------------|
| **CRDT** | Data structure designed to merge automatically |
| **Replica ID** | Unique identifier for each device/user |
| **Operation** | A single change (add, update, delete, move) |
| **Event** | Operation + metadata (ID, timestamp, parents) |
| **Fast-path** | Optimized path for sequential operations |
| **Slow-path** | Full merge for concurrent operations |
| **Convergence** | All replicas end up with same state |
## Next Steps
- **[Use Cases](use-cases.md)** — See real-world applications
- **[Benchmarks](benchmarks.md)** — Performance details
- **[Architecture](architecture.md)** — Implementation deep-dive