Skip to main content

guides/concepts.md

# Core Concepts

This guide explains the ideas behind Grove. You don't need to understand all of this to use Grove, but it helps when things get complex.

## The Problem: Sync is Hard

When multiple users (or devices) edit the same data, conflicts happen:

```mermaid
sequenceDiagram
    participant A as Alice
    participant S as Server
    participant B as Bob

    A->>S: title = "Meeting Notes"
    B->>S: title = "Project Plan"
    Note over S: Which one wins?
```

**Traditional solutions all have drawbacks:**

| Approach | Problem |
|----------|---------|
| **Last-write-wins** | Alice's change is silently lost |
| **Lock the document** | Bob has to wait, can't work offline |
| **Manual conflict resolution** | User sees scary merge dialogs |
| **Operational Transform (Google Docs)** | Complex, needs central server |

## CRDTs: A Better Way

**CRDT** stands for "Conflict-free Replicated Data Type."

The key insight: **Design your data structure so that any order of operations produces the same result.**

```mermaid
flowchart TB
    subgraph "Same Result, Any Order"
        A1[Operation A] --> R1[Result: XYZ]
        B1[Operation B] --> R1

        A2[Operation B] --> R2[Result: XYZ]
        B2[Operation A] --> R2
    end
```

This is called **commutativity** — A + B = B + A.

### How Grove Achieves This

1. **Unique IDs everywhere** — Every node, every operation has a globally unique ID
2. **Timestamps for ordering** — When two operations conflict, timestamps decide
3. **Merge by design** — Data structures are designed to merge, not conflict

## Grove's Data Model

### Trees and Nodes

Grove stores data as a tree of nodes:

```mermaid
graph TD
    R[Root] --> A[Form]
    A --> B[Step 1]
    A --> C[Step 2]
    B --> D[Field: Name]
    B --> E[Field: Email]
    C --> F[Field: Address]
```

Each node has:

| Field | Description |
|-------|-------------|
| `id` | Unique identifier |
| `type` | What kind of node (atom) |
| `attrs` | Node data (map) |
| `parent_id` | Who is my parent? |
| `children` | Ordered list of child IDs |

### Operations

Every change creates an **operation**:

```elixir
# These create operations internally
Grove.Tree.put_node(tree, node)     # => {:put_node, id, node}
Grove.Tree.update_node(tree, id, f) # => {:update_node, id, changes}
Grove.Tree.delete_node(tree, id)    # => {:delete_node, id}
Grove.Tree.move_node(tree, id, p)   # => {:move_node, id, new_parent}
```

Operations are wrapped in **events** with metadata:

```elixir
%Grove.Tree.Event{
  id: {"laptop", 42},           # {replica_id, sequence}
  operation: {:put_node, ...},  # The actual change
  parents: MapSet.new([...]),   # Causal dependencies
  timestamp: 1704067200000      # When it happened
}
```

### Replica IDs

Every device/user needs a unique **replica ID**:

```mermaid
flowchart LR
    subgraph "Replica IDs"
        L[Laptop: abc123]
        P[Phone: def456]
        T[Tablet: ghi789]
    end

    L --> S[Same Document]
    P --> S
    T --> S
```

Replica IDs ensure:
- Node IDs are globally unique (`{replica_id}_{sequence}`)
- Operations can be attributed to their source
- Conflicts can be resolved deterministically

## Eg-walker: Why Grove is Fast

Grove implements **Eg-walker**, an optimization from recent research (EuroSys 2025).

### The Key Insight

Most collaborative editing is **sequential** — one person typing, then another. True concurrency (two people typing at the exact same moment) is rare.

```mermaid
pie title Typical Editing Patterns
    "Sequential (fast-path)" : 85
    "Concurrent (slow-path)" : 15
```

### Fast-Path vs Slow-Path

```mermaid
flowchart TD
    E[Incoming Event] --> C{Is history linear?}
    C -->|Yes| F[Fast-path: Just apply it]
    C -->|No| S[Slow-path: Merge with CRDT rules]

    F --> R[Result]
    S --> R
```

**Fast-path** (sequential operations):
- Direct application
- No merge overhead
- O(1) complexity

**Slow-path** (concurrent operations):
- Build merge state
- Apply CRDT rules
- Still automatic, just slower

### Performance Impact

| Scenario | Throughput | Memory |
|----------|------------|--------|
| Sequential (fast-path) | 617 ops/sec | 1.4 MB |
| Concurrent (slow-path) | 63 ops/sec | 21.8 MB |

The fast-path is **10x faster** and uses **15x less memory**.

## Key Guarantees

Grove provides these guarantees:

### 1. Eventual Consistency

All replicas that have seen the same operations will have the same state.

```mermaid
flowchart LR
    subgraph "Eventually..."
        A[Replica A] -.->|sync| B[Replica B]
        B -.->|sync| C[Replica C]
        C -.->|sync| A
    end

    A --> S[Same State]
    B --> S
    C --> S
```

### 2. No Conflicts

Operations always merge. There's no "conflict" state that requires user intervention.

### 3. Causality Preserved

If operation B happened after seeing operation A, then B will always be applied after A.

### 4. Intent Preserved

Operations do what they intended:
- Adds stay added (unless explicitly deleted)
- Deletes stay deleted
- Moves go to intended parent (unless it would create a cycle)

## Move Operations

Moving nodes in a distributed tree is the hardest problem. Grove implements Kleppmann's algorithm:

```mermaid
flowchart TD
    subgraph "Concurrent Moves"
        A1[Alice: Move X under Y]
        B1[Bob: Move Y under X]
    end

    A1 --> M{Would create cycle?}
    B1 --> M
    M -->|Yes| R[One move wins by timestamp]
    M -->|No| OK[Both moves apply]
```

**Guarantees:**
- No cycles ever created
- Deterministic winner (by Lamport timestamp)
- Both replicas converge to same tree

## Summary

| Concept | What It Means |
|---------|---------------|
| **CRDT** | Data structure designed to merge automatically |
| **Replica ID** | Unique identifier for each device/user |
| **Operation** | A single change (add, update, delete, move) |
| **Event** | Operation + metadata (ID, timestamp, parents) |
| **Fast-path** | Optimized path for sequential operations |
| **Slow-path** | Full merge for concurrent operations |
| **Convergence** | All replicas end up with same state |

## Next Steps

- **[Use Cases](use-cases.md)** — See real-world applications
- **[Benchmarks](benchmarks.md)** — Performance details
- **[Architecture](architecture.md)** — Implementation deep-dive