BENCHMARKS.md

Select File
# Benchmark: ExSQL vs exqlite

How much does choosing **ExSQL** (a SQLite engine in pure Elixir) cost versus
**[`exqlite`](https://hex.pm/packages/exqlite)** (the real C SQLite, via a NIF) —
when both are driven the same way from Elixir?

## Approach

Both engines run the **same** statements, through the **same** Elixir harness,
in the **same** BEAM node, over the [sqllogictest](https://www.sqlite.org/sqllogictest/)
corpus (~620 `.test` files). The only variable is the engine, so the ratio is a
fair "what does ExSQL cost vs SQLite, from Elixir" number — no cross-process or
cross-language artifacts.

For each file the harness executes every statement/query through both engines,
discarding results (so neither pays the sqllogictest format/sort/compare cost),
and records per-file wall time, row counts, and pass/fail/timeout status into
`bench/slt_compare/results.tsv`.

`slowdown = exsql_ms / exqlite_ms` — **≤ 1.0 means ExSQL matched or beat the C
engine** on that file; above 1.0 means it was slower.

The harness lives in [`bench/slt_compare/`](https://git.sr.ht/~sbr/exsql/tree/main/item/bench/slt_compare): an isolated Mix
project depending on `{:exsql, path: "../.."}` plus `exqlite`, so the C NIF stays
out of the main project's deps.

## Results

<!-- RESULTS:START (regenerate with `python3 bench/slt_compare/summarize.py`) -->

605 corpus files, fastest-of-2 per file:

| metric | value |
|--------|-------|
| median slowdown (exsql ÷ exqlite) | **1.06×** |
| geometric mean | 1.13× |
| ExSQL ≤ exqlite (≤1.0×) | 268 (44%) |
| ≤1.4× | 430 (71%) |
| ≤2× | 560 (93%) |
| >5× | 9 (1%) |
| result mismatches | 0 |
| timeouts | 0 |

Slowest 10 files:

| slowdown | exqlite | exsql | file |
|---------:|--------:|------:|------|
| 8.8× | 8 ms | 70 ms | `index/random/1000/slt_good_3.test` |
| 8.5× | 359 ms | 3053 ms | `index/delete/10000/slt_good_0.test` |
| 8.3× | 7 ms | 58 ms | `index/random/1000/slt_good_4.test` |
| 7.4× | 10 ms | 74 ms | `index/random/1000/slt_good_1.test` |
| 6.3× | 170 ms | 1067 ms | `index/delete/1000/slt_good_0.test` |
| 6.3× | 11 ms | 69 ms | `index/random/1000/slt_good_2.test` |
| 5.6× | 167 ms | 931 ms | `index/delete/1000/slt_good_1.test` |
| 5.5× | 2669 ms | 14682 ms | `index/in/1000/slt_good_0.test` |
| 5.0× | 48 ms | 241 ms | `select1.test` |
| 5.0× | 2382 ms | 11903 ms | `index/view/10000/slt_good_0.test` |

<!-- RESULTS:END -->

**Reading the numbers.** Run from a common Elixir harness, ExSQL lands around
**parity with the C engine at the median** and is within 2× on the large majority
of files, with **no timeouts and no result mismatches**. The slowest remaining
files are unindexed full scans — which hit a measured **~4.9× floor** for any
pure-Elixir engine (the BEAM tags every integer, so a comparison is several
instructions where C does one register `cmp`) — and a few insert-heavy DML files.

The harness times each file once, so individual per-file ratios and the
median/geomean carry single-run variance (and inflate under machine load); the
**≤ 2× share** and the **zero-timeout / zero-mismatch** facts are the stable
signals. For a stable read, run on an idle machine with `--runs 2` (below).

## Reproduce

The sqllogictest corpus is an external checkout — clone it (e.g. from the SQLite
source or <https://github.com/shivam091/sqllogictest>) and point `$SLT` at its
`test/` directory.

```sh
cd bench/slt_compare && mix deps.get
SLT=/path/to/sqllogictest/test

# Full run (both engines, fastest-of-2 per file):
mix run slt_compare.exs --per-file --runs 2 --timeout-ms 30000 \
  --exclude index/between/1000/slt_good_0.test --exclude select5.test \
  --exclude select3.test --exclude select4.test \
  --output results.tsv --root "$SLT" "$SLT"

# Refresh the results table in this file:
python3 summarize.py
```

The exqlite (C SQLite) timings are a **fixed baseline** — the C engine doesn't
change between ExSQL runs. After one full run, iterate with `--reuse-exqlite`: it
reads the stored exqlite columns from `results.tsv`, re-times **only ExSQL**, and
merges. That roughly halves the wall time and pins the denominator so ratios stay
comparable across runs:

```sh
mix run slt_compare.exs --per-file --runs 2 --reuse-exqlite --timeout-ms 30000 \
  --exclude index/between/1000/slt_good_0.test --exclude select5.test \
  --exclude select3.test --exclude select4.test \
  --output results.tsv --root "$SLT" "$SLT"
python3 summarize.py
```

(The four `--exclude`d files are pathological cartesian-product / full-scan cases
that can run for tens of seconds; drop the excludes to include them. Don't pair
`--reuse-exqlite` with `--limit-files` against the real `results.tsv` — the
limited run would truncate it.)

The host app (ExitPod) renders `results.tsv` as a sortable per-file table at
`/exsql-bench`; refresh it with `python3 to_json.py` after a run.