guides/benchmarks.md

Select File
guides/benchmarks.md

# Performance Benchmarks

Grove includes a comprehensive benchmark suite for measuring Eg-walker CRDT performance.

## Running Benchmarks

```bash
# Run full benchmark suite
mix run benchmarks/eg_walker_bench.exs

# Run with production optimizations
MIX_ENV=prod mix run benchmarks/eg_walker_bench.exs
```

## Benchmark Scenarios

| Scenario | Description | Tests |
|----------|-------------|-------|
| Sequential | Single replica, 1000 ops | Fast-path optimization |
| Collaborative | 5 replicas × 200 ops, 20% reorder | Mixed fast/slow path |
| High Concurrency | 10 replicas × 100 concurrent ops | Slow-path merging |
| Large Document | 100-10k nodes + 100 ops | Scalability |
| Move-heavy | 500-1k nodes, 500-1k moves | Operation log performance |

## Current Results

```
Date: 2026-06-14
Elixir: 1.20.1
OTP: 29
```

> **Reproducibility note**: The collaborative and high-concurrency scenarios
> build their workloads with an unseeded `Enum.shuffle`, so those rows vary
> run-to-run and should be read as indicative rather than exact. In this run the
> shuffled concurrent workload happened to stay light, so the fast-path/slow-path
> gap is much smaller than the algorithm's worst case. The deterministic suites
> (sequential, local operations, scalability) are the reliable signal — they show
> a consistent ~1.5–1.8x throughput gain from the Elixir 1.18→1.20 / OTP 25→29
> upgrade with no code changes.

### Eg-walker vs Traditional CRDT

Direct comparison using the same 1000 sequential operations:

| Implementation | ips | Average | Memory |
|----------------|-----|---------|--------|
| Eg-walker (fast-path) | 956.71 | 1.05 ms | 1.36 MB |
| Traditional CRDT (slow-path) | 987.87 | 1.01 ms | 1.37 MB |

**For pure sequential put_node operations**: Only ~3% difference.

**Why?** Simple put_node operations don't have much merge overhead. The real benefit appears with:
- Concurrent operations requiring merge
- Move operations with undo-do-redo
- Long histories requiring traversal

See the high-concurrency benchmark below for the real difference.

### apply_remote Performance

| Scenario | ips | Average | Median | 99th % | Memory |
|----------|-----|---------|--------|--------|--------|
| Sequential (fast-path) | 918 | 1.09 ms | 1.05 ms | 2.17 ms | 1.36 MB |
| Collaborative (mixed) | 781 | 1.28 ms | 1.24 ms | 2.27 ms | 1.68 MB |
| High-concurrency (slow-path) | 834 | 1.20 ms | 1.14 ms | 2.40 ms | 1.51 MB |

**Key insight**: The fast-path applies sequential edits with near-zero CRDT
overhead. The fast-path/slow-path gap scales with how much genuine concurrency
the workload contains; see the reproducibility note above for why the concurrent
rows in this run stayed close to the fast-path.

**The real Eg-walker benefit**:
- Moving from sequential to genuinely concurrent workloads costs throughput and
  memory in proportion to the concurrency (the slow-path must build transient
  merge state and replay via undo-do-redo)
- This is where Eg-walker shines: sequential editing has near-zero CRDT overhead

### Local Operations

| Operation | ips | Average | Memory |
|-----------|-----|---------|--------|
| get_node | 9.59 M | 104 ns | 0 B |
| put_node | 2.08 M | 481 ns | 896 B |
| update_node | 1.11 M | 900 ns | 2960 B |

### Scalability (Document Size)

| Document Size | ips | Average | Memory |
|---------------|-----|---------|--------|
| 100 nodes + 100 ops | 9.22 K | 108 μs | 128 KB |
| 1k nodes + 100 ops | 10.20 K | 98 μs | 138 KB |
| 10k nodes + 100 ops | 9.59 K | 104 μs | 150 KB |

**Key insight**: Document size has minimal impact on operation time (O(1) node access via Map)

## Fast-path vs Slow-path

The Eg-walker implementation optimizes for sequential editing (the common case):

### Fast-path (Sequential Events)
- **When**: Events extend the frontier linearly (single user or turn-taking)
- **Performance**: 918 ips, 1.36 MB memory
- **Overhead**: Near-zero CRDT overhead
- **How it works**: Direct application, no transformation needed

### Slow-path (Concurrent Events)
- **When**: Multiple replicas with divergent branches
- **Overhead**: Full CRDT merge with undo-do-redo
- **How it works**: Builds transient merge state, replays operations
- **Cost**: Throughput and memory degrade in proportion to the concurrency in
  the workload (see the reproducibility note above — the shuffled concurrent
  benchmark is not a fixed workload, so its absolute numbers vary per run)

### Key Takeaway

For the **same sequential workload**:
- Eg-walker fast-path: 956 ips
- Traditional CRDT (forced slow-path): 987 ips
- Difference: Only ~3%

The benefit of the fast-path grows with the degree of genuine concurrency: a
purely sequential workload pays almost no CRDT overhead, while a heavily
concurrent one must build transient merge state and replay operations.

## Optimization History

### event_index Optimization (2026-01-02)

Added persistent `event_index` field to Tree struct for O(1) event lookups.

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Sequential throughput | 604 ips | 605 ips | +0.2% |
| Collaborative throughput | 122.5 ips | 140.5 ips | **+14.7%** |
| Slow-path throughput | 55.5 ips | 57.4 ips | +3.4% |
| Memory (collaborative) | 5.91 MB | 5.64 MB | -4.6% |

**Implementation**:
- `event_index` field in Tree struct (lazy construction)
- `ensure_index/1` builds index from history when needed
- `get_event/2` for O(1) lookup by event ID
- Incremental updates in `record_history/3` and `record_remote_event/2`

## Metrics Explained

- **ips**: Iterations per second (higher is better)
- **Average**: Mean time per operation
- **Median**: 50th percentile latency
- **99th %**: Tail latency (important for UI responsiveness)
- **Memory**: Total bytes allocated per benchmark run

## Future Optimizations

Remaining targets identified:
- `find_insertion_point` - O(n) list scan → binary search
- `build_children_map` in EventGraph - O(n) rebuild → cache
- Pass event_index to EventGraph functions to avoid rebuilding