# RustyJson Benchmarks
Comprehensive benchmarks comparing RustyJson vs Jason across synthetic and real-world datasets.
## Key Findings
1. **Fast across all workloads** — plain data, struct-heavy data, and decoding (including deeply nested and small payloads)
2. **Encoding plain data** shows the largest gains — 3-6x faster, 2-3x less memory
3. **Struct encoding** optimized in v0.3.3 via single-pass iodata pipeline with compile-time codegen (~2x improvement over v0.3.2)
4. **Deep-nested decode** optimized in v0.3.3 via single-entry fast path (~27% faster than v0.3.2 for 100-level nested JSON)
5. **Larger payloads = bigger advantage** — real-world 10 MB files show better results than synthetic benchmarks
6. **BEAM scheduler load dramatically reduced** — 200-28,000x fewer reductions, depending on workload
## Test Environment
| Attribute | Value |
|-----------|-------|
| OS | macOS |
| CPU | Apple M1 Pro |
| Cores | 10 |
| Memory | 16 GB |
| Elixir | 1.19.4 |
| Erlang/OTP | 28.2 |
## Real-World Benchmarks: Amazon Settlement Reports
These are production JSON files from Amazon SP-API settlement reports, representing real-world API response patterns with nested objects, arrays of transactions, and mixed data types.
### Encoding Performance (Elixir → JSON)
| File Size | RustyJson | Jason | Speed | Memory |
|-----------|-----------|-------|-------|--------|
| 10.87 MB | 24 ms | 131 ms | **5.5x faster** | **2.7x less** |
| 9.79 MB | 21 ms | 124 ms | **5.9x faster** | **2-3x less** |
| 9.38 MB | 21 ms | 104 ms | **5.0x faster** | **2-3x less** |
### Decoding Performance (JSON → Elixir)
| File Size | RustyJson | Jason | Speed | Memory |
|-----------|-----------|-------|-------|--------|
| 10.87 MB | 61 ms | 152 ms | **2.5x faster** | similar |
| 9.79 MB | 55 ms | 134 ms | **2.4x faster** | similar |
| 9.38 MB | 50 ms | 119 ms | **2.4x faster** | similar |
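These numbers can be sanity-checked at an IEx shell with `:timer.tc/1`; a minimal sketch (the filename is hypothetical, standing in for a real settlement report):

```elixir
# Rough single-shot timings; for stable numbers use the benchmark scripts below.
json = File.read!("settlement_report.json") # hypothetical path
{decode_us, data} = :timer.tc(fn -> RustyJson.decode!(json) end)
{encode_us, _out} = :timer.tc(fn -> RustyJson.encode!(data) end)
IO.puts("decode: #{decode_us / 1000} ms, encode: #{encode_us / 1000} ms")
```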
### BEAM Reductions (Scheduler Load)
| File Size | RustyJson | Jason | Reduction |
|-----------|-----------|-------|-----------|
| 10.87 MB encode | 404 | 11,570,847 | **28,641x fewer** |
This is the most dramatic difference: RustyJson offloads virtually all work to native code.
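Reduction counts like these can be reproduced with `:erlang.process_info/2`; a minimal sketch, assuming the decoded report is bound to `data`:

```elixir
# Delta of the calling process's reduction count across one encode.
{:reductions, r0} = :erlang.process_info(self(), :reductions)
_json = RustyJson.encode!(data)
{:reductions, r1} = :erlang.process_info(self(), :reductions)
IO.puts("reductions: #{r1 - r0}")
```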
## Synthetic Benchmarks: nativejson-benchmark
Using standard datasets from [nativejson-benchmark](https://github.com/miloyip/nativejson-benchmark):
| Dataset | Size | Description |
|---------|------|-------------|
| canada.json | 2.1 MB | Geographic coordinates (number-heavy) |
| citm_catalog.json | 1.6 MB | Event catalog (mixed types) |
| twitter.json | 617 KB | Social media with CJK (unicode-heavy) |
### Decode Performance (JSON → Elixir)
| Input | RustyJson (ips) | Average time |
|-------|-----------------|--------------|
| canada.json (2.1 MB) | 153 | 6.55 ms |
| citm_catalog.json (1.6 MB) | 323 | 3.09 ms |
| twitter.json (617 KB) | 430 | 2.33 ms |
| large_list (50k items, 2.3 MB) | 62 | 16.0 ms |
| deep_nested (1.1 KB, 100 levels) | 148K | 6.75 µs |
| wide_object (75 KB, 5k keys) | 1,626 | 0.61 ms |
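The hand-built synthetic inputs can be approximated like this if you want to reproduce the shapes (the exact generators are assumptions, not the benchmark's code):

```elixir
# 100 levels of array nesting around one scalar (~ deep_nested).
deep_nested = Enum.reduce(1..100, "0", fn _, acc -> "[#{acc}]" end)
# One flat object with 5,000 distinct keys (~ wide_object).
wide_object = RustyJson.encode!(Map.new(1..5_000, fn i -> {"key_#{i}", i} end))
RustyJson.decode!(deep_nested)
RustyJson.decode!(wide_object)
```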
### Roundtrip Performance (Decode + Encode)
| Input | RustyJson | Jason | Speedup |
|-------|-----------|-------|---------|
| canada.json | 14 ms | 48 ms | **3.4x faster** |
| citm_catalog.json | 6 ms | 14 ms | **2.5x faster** |
| twitter.json | 4 ms | 9 ms | **2.3x faster** |
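Each roundtrip is simply a decode piped into an encode over the raw file contents:

```elixir
# One roundtrip as timed above, using the data downloaded in "Running Benchmarks".
roundtrip = fn json -> json |> RustyJson.decode!() |> RustyJson.encode!() end
roundtrip.(File.read!("bench/data/canada.json"))
```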
### BEAM Reductions by Dataset
| Dataset | RustyJson | Jason | Ratio |
|---------|-----------|-------|-------|
| canada.json | ~3,500 | ~964,000 | **275x fewer** |
| citm_catalog.json | ~300 | ~621,000 | **2,000x fewer** |
| twitter.json | ~2,000 | ~511,000 | **260x fewer** |
## Struct Encoding Benchmarks (v0.3.3+)
Encoding data that contains Elixir structs (e.g., `@derive RustyJson.Encoder` or custom `defimpl`) follows a different path than plain maps and lists. Structs require the `RustyJson.Encoder` protocol to convert them to JSON-serializable forms.
In v0.3.3, the struct encoding pipeline was rewritten from a three-pass approach (protocol dispatch → fragment resolution → NIF serialization) to a single-pass iodata pipeline with compile-time codegen for derived structs. This closed the last remaining performance gap, making RustyJson faster across all encoding workloads.
### Struct Encoding Performance
| Workload | Speedup (v0.3.3 vs v0.3.2) |
|----------|----------------------------|
| Derived struct (5 fields) | ~2x faster |
| Derived struct (10 fields) | ~2x faster |
| Custom encoder (returning `Encode.map`) | ~2.5x faster |
| List of 1,000 derived structs | ~2x faster |
| Nested structs (3 levels deep) | ~2x faster |
Measured with protocol consolidation enabled (`MIX_ENV=prod`), which is the default for production builds.
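For reference, the "custom encoder" workload corresponds to an impl along these lines. This is a sketch that assumes the protocol callback mirrors Jason's `encode/2` shape; the struct is hypothetical, and the exact signature may differ in RustyJson:

```elixir
defmodule Money do
  defstruct [:amount, :currency]
end

# Callback name and arity are assumed to mirror Jason's protocol;
# check RustyJson's docs for the exact signature.
defimpl RustyJson.Encoder, for: Money do
  def encode(%Money{amount: amount, currency: currency}, opts) do
    RustyJson.Encode.map(%{"amount" => amount, "currency" => currency}, opts)
  end
end
```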
### How It Works
RustyJson's struct encoding produces iodata in a single pass:
1. **Derived encoders** (`@derive RustyJson.Encoder`) generate compile-time iodata templates with pre-escaped keys — no runtime `Map.from_struct`, `Map.to_list`, or key escaping.
2. **Map/List impls** detect struct-containing data and route through `Encode.map/2` / `Encode.list/2` to build iodata directly, wrapped in a `Fragment`.
3. **NIF bypass** — When the top-level result is an iodata Fragment (no pretty-print or compression), `IO.iodata_to_binary/1` is used directly, avoiding Erlang↔Rust term conversion entirely.
For plain data (no structs), encoding still uses the fast Rust NIF path unchanged.
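In practice, the derived path is the usual `@derive` attribute; the struct here is illustrative:

```elixir
defmodule Order do
  # Compile-time codegen: an iodata template with pre-escaped keys
  # is generated for this struct when it compiles.
  @derive RustyJson.Encoder
  defstruct [:id, :sku, :quantity]
end

RustyJson.encode!(%Order{id: 1, sku: "ABC-123", quantity: 2})
#=> JSON such as {"id":1,"sku":"ABC-123","quantity":2} (key order may differ)
```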
## Why Encoding Shows Bigger Gains
### iolist Encoding Pattern (Pure Elixir)
```
encode(data)
→ allocate "{" binary
→ allocate "\"key\"" binary
→ allocate ":" binary
→ allocate "\"value\"" binary
→ allocate list cells to link them
→ return iolist (many BEAM allocations)
```
### RustyJson's Encoding Pattern (NIF)
```
encode(data)
→ [Rust: walk terms, write to single buffer]
→ copy buffer to BEAM binary
→ return binary (one BEAM allocation)
```
Pure-Elixir encoders create many small BEAM allocations. RustyJson creates one.
### Why Decoding Memory is Similar
Both libraries produce identical Elixir data structures when decoding. The resulting maps, lists, and strings take the same space regardless of which library created them.
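A quick equivalence check (assuming default options on both sides):

```elixir
# With default options, both libraries produce the same Elixir terms.
json = ~s({"user": {"id": 7, "tags": ["a", "b"], "score": 1.5}})
RustyJson.decode!(json) == Jason.decode!(json)
#=> true
```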
## Why Benchee Memory Measurements Don't Work for NIFs
**Important**: Benchee's `memory_time` option gives misleading results for NIF-based libraries.
### What Benchee Reports (Incorrect)
```
| Library | Memory |
|-----------|-----------|
| RustyJson | 0.00169 MB |
| Jason | 20.27 MB |
```
This suggests 12,000x less memory, which is wrong.
### Why This Happens
Benchee measures memory from the BEAM's point of view, so it only sees allocations the VM itself tracks:
- BEAM process heap
- BEAM binary space
- ETS tables
RustyJson allocates memory in **Rust via mimalloc**, completely invisible to BEAM tracking. The 0.00169 MB is just NIF call overhead.
### How We Measure Instead
We use `:erlang.memory(:total)` delta in isolated spawned processes:
```elixir
spawn(fn ->
  # Start from a clean heap so the delta reflects only the encode work.
  :erlang.garbage_collect()
  before = :erlang.memory(:total)
  _results = for _ <- 1..10, do: RustyJson.encode!(data)
  after_mem = :erlang.memory(:total)
  IO.puts("BEAM memory per encode: #{(after_mem - before) / 10} bytes")
end)
```
This captures BEAM allocations during the operation. For total system memory (including NIF), we verified with RSS measurements that Rust adds only ~1-2 MB temporary overhead.
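One way to take the same RSS reading yourself on macOS or Linux:

```elixir
# OS-level resident set size of this BEAM node; unlike BEAM-side
# counters, RSS includes memory the Rust NIF allocates via mimalloc.
{out, 0} = System.cmd("ps", ["-o", "rss=", "-p", System.pid()])
rss_kb = out |> String.trim() |> String.to_integer()
IO.puts("RSS: #{div(rss_kb, 1024)} MB")
```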
### Actual Memory Comparison
For a 10 MB settlement report encode:
| Metric | RustyJson | Jason |
|--------|-----------|-------|
| BEAM memory | 6.7 MB | 17.9 MB |
| NIF overhead | ~1-2 MB | N/A |
| **Total** | **~8 MB** | **~18 MB** |
| **Ratio** | | **2-3x less** |
## Running Benchmarks
```bash
# 1. Download synthetic test data
mkdir -p bench/data && cd bench/data
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/canada.json
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/citm_catalog.json
curl -LO https://raw.githubusercontent.com/miloyip/nativejson-benchmark/master/data/twitter.json
cd ../..
# 2. Run memory benchmarks (no extra deps needed)
mix run bench/memory_bench.exs
# 3. (Optional) Run speed benchmarks with Benchee
# Add to mix.exs: {:benchee, "~> 1.0", only: :dev}
mix deps.get
mix run bench/stress_bench.exs
```
## Key Interning Benchmarks
The `keys: :intern` option provides significant speedups when decoding arrays of objects with repeated keys (common in API responses, database results, etc.).
### When Key Interning Helps: Homogeneous Arrays
Arrays where every object has the same keys:
```json
[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, ...]
```
| Scenario | Default | `keys: :intern` | Improvement |
|----------|---------|-----------------|-------------|
| 100 objects × 5 keys | 34.2 µs | 23.6 µs | **31% faster** |
| 100 objects × 10 keys | 67.5 µs | 44.8 µs | **34% faster** |
| 1,000 objects × 5 keys | 335 µs | 237 µs | **29% faster** |
| 1,000 objects × 10 keys | 688 µs | 463 µs | **33% faster** |
| 10,000 objects × 5 keys | 3.46 ms | 2.45 ms | **29% faster** |
| 10,000 objects × 10 keys | 6.92 ms | 4.88 ms | **29% faster** |
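A quick single-shot comparison you can run on your own data (expect noise; use Benchee for stable numbers):

```elixir
# Homogeneous array: every object has the same two keys.
json = RustyJson.encode!(for i <- 1..1_000, do: %{"id" => i, "name" => "user#{i}"})
{t_default, _} = :timer.tc(fn -> RustyJson.decode!(json) end)
{t_intern, _} = :timer.tc(fn -> RustyJson.decode!(json, keys: :intern) end)
IO.puts("default: #{t_default} µs, intern: #{t_intern} µs")
```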
### When Key Interning Hurts: Unique Keys
Single objects or heterogeneous arrays where keys aren't repeated:
| Scenario | Default | `keys: :intern` | Penalty |
|----------|---------|-----------------|---------|
| Single object, 100 keys | 5.1 µs | 13.6 µs | **2.6x slower** |
| Single object, 1,000 keys | 52 µs | 169 µs | **3.2x slower** |
| Single object, 5,000 keys | 260 µs | 831 µs | **3.2x slower** |
| Heterogeneous 100 objects | 35 µs | 96 µs | **2.7x slower** |
| Heterogeneous 500 objects | 186 µs | 475 µs | **2.5x slower** |
### Scaling: Benefit Increases with Object Count
With 5 keys per object, the benefit grows as more objects reuse the cached keys:
| Objects | Default | `keys: :intern` | Improvement |
|---------|---------|-----------------|-------------|
| 10 | 3.5 µs | 3.0 µs | 13% faster |
| 50 | 17.1 µs | 12.5 µs | 27% faster |
| 100 | 33.8 µs | 23.8 µs | 30% faster |
| 500 | 170 µs | 119 µs | 30% faster |
| 1,000 | 339 µs | 242 µs | 29% faster |
| 5,000 | 1.81 ms | 1.24 ms | 31% faster |
| 10,000 | 3.47 ms | 2.49 ms | 28% faster |
### Usage Recommendation
```elixir
# API responses, database results, bulk data
RustyJson.decode!(json, keys: :intern)
# Config files, single objects, unknown schemas
RustyJson.decode!(json) # default, no interning
```
**Rule of thumb**: Use `keys: :intern` when you know you're decoding arrays of 10+ objects with the same schema.
**Note**: Keys containing escape sequences (e.g., `"field\nname"`) are not interned because the raw JSON bytes differ from the decoded string. This is rare in practice and has negligible performance impact.
## Summary
| Operation | Speed | Memory | Reductions |
|-----------|-------|--------|------------|
| **Encode plain data (large)** | 5-6x | 2-3x less | 28,000x fewer |
| **Encode plain data (medium)** | 2-3x | 2-3x less | 200-2000x fewer |
| **Encode structs (v0.3.3+)** | ~2x improvement over v0.3.2 | similar | — |
| **Decode (large)** | 2-4.5x | similar | — |
| **Decode (deep nested, v0.3.3+)** | ~27% improvement over v0.3.2 | similar | — |
| **Decode (keys: :intern)** | +30%* | similar | — |
*For arrays of objects with repeated keys (API responses, DB results, etc.)
**Bottom line**: As of v0.3.3, RustyJson is fast across all encoding and decoding workloads, including deeply nested and small payloads. Plain data encoding shows the largest gains (5-6x, 2-3x less memory, dramatically fewer BEAM reductions). Struct encoding was rewritten in v0.3.3 with a single-pass iodata pipeline. Deep-nested decode was optimized in v0.3.3 with a single-entry fast path that avoids heap allocation for single-element objects and arrays. For decoding bulk data, enable `keys: :intern` for an additional 30% speedup.