# Performance Notes

The current priority is numerical compatibility with the Python references and
predictable edge-case behavior.

Run the Python-reference comparison benchmark with:

```bash
cd reference/python
uv run python benchmark.py --mode quick
```

The benchmark generates deterministic sample data, times the pinned Python
references, and then runs Statwise against the same data via
`reference/elixir/benchmark.exs`. Results are reported as microseconds per
operation. Each figure is the best timed trial across several full repeat
batches, which reduces scheduler noise. Use `--trials N` to override the
default trial count,
`--json-output benchmark_baseline_quick.json` to refresh the tracked baseline,
or `--baseline benchmark_baseline_quick.json --fail-ratio 2.0` to check for
regressions.
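
The reporting scheme and the regression check can be sketched as follows. This
is a minimal illustration, not the contents of `benchmark.py`: the function
names, the batch size, and the reading of `--fail-ratio` as "fail when the
current time exceeds the baseline time multiplied by the ratio" are all
assumptions.

```python
import time


def best_us_per_op(fn, ops_per_batch=1_000, trials=5):
    """Return the best (lowest) microseconds per operation over several batches."""
    best_seconds = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        for _ in range(ops_per_batch):
            fn()
        # Keep the fastest batch: scheduler interference only ever adds time.
        best_seconds = min(best_seconds, time.perf_counter() - start)
    return best_seconds / ops_per_batch * 1e6


def find_regressions(current, baseline, fail_ratio=2.0):
    """Map each regressed benchmark name to its current/baseline ratio."""
    regressions = {}
    for name, baseline_us in baseline.items():
        current_us = current.get(name)
        # Assumption: a benchmark missing from the current run is skipped.
        if current_us is not None and current_us > fail_ratio * baseline_us:
            regressions[name] = current_us / baseline_us
    return regressions
```

Taking the minimum rather than the mean is a common choice for
microbenchmarks, since noise from other processes inflates timings but never
shortens them.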

Implemented now:

- Descriptive statistics accept one-dimensional Nx tensors.
- Tensor-native descriptive reductions are available with `backend: :tensor`;
  the default tensor path still normalizes through the scalar implementation
  because it benchmarks faster on `Nx.BinaryBackend` for the current workloads.
- List-backed descriptive statistics use direct Elixir reductions to avoid
  building Nx tensors for scalar results.
- Ranking and Mann-Whitney U use ordinary Elixir control flow because tie
  grouping and exact distribution logic are easier to audit this way.
- Mann-Whitney U computes rank sums and tie correction from one sorted pass,
  and caches exact U distributions by sample-size pair.
- T-tests use scalar formulas after one-dimensional input normalization.
- T-tests reuse single-pass sample summaries instead of recomputing
  mean/variance/standard error through repeated normalization.
- Student's t quantiles stop bisection once the bracket has converged to
  double precision instead of always running a long, fixed iteration count.
- Dataframe-style test APIs extract columns first, then reuse the same raw
  sample implementations.
- Dataframe-style test APIs support `input: :tensor`, including conversion
  through `Explorer.Series.to_tensor/2` when the caller has loaded Explorer.
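
The Mann-Whitney items above (rank sums and tie correction from one sorted
pass, plus a cache keyed by sample-size pair) can be sketched as follows. The
sketch is in Python, the language of the reference harness, and is not the
Statwise implementation. The function names are hypothetical, and the tie
term is assumed to be the usual sum of `t**3 - t` over tie groups used in the
normal-approximation variance.

```python
from functools import lru_cache


def mann_whitney_u(xs, ys):
    """Return (u_x, u_y, tie_corrected_variance) from one pass over sorted data."""
    n1, n2 = len(xs), len(ys)
    total = n1 + n2
    pooled = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < total:
        j = i
        while j < total and pooled[j][0] == pooled[i][0]:
            j += 1  # extend the tie group pooled[i:j]
        group = j - i
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        from_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += average_rank * from_x
        tie_term += group**3 - group
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u_y = n1 * n2 - u_x
    variance = n1 * n2 / 12.0 * ((total + 1) - tie_term / (total * (total - 1)))
    return u_x, u_y, variance


@lru_cache(maxsize=None)
def exact_u_counts(n1, n2):
    """Arrangement counts for U = 0 .. n1*n2 without ties, cached per (n1, n2)."""
    if n1 == 0 or n2 == 0:
        return (1,)
    counts = [0] * (n1 * n2 + 1)
    # Largest observation from the first sample: it beats all n2 others.
    for u, c in enumerate(exact_u_counts(n1 - 1, n2)):
        counts[u + n2] += c
    # Largest observation from the second sample: U is unchanged.
    for u, c in enumerate(exact_u_counts(n1, n2 - 1)):
        counts[u] += c
    return tuple(counts)
```

Dividing a cumulative sum of `exact_u_counts(n1, n2)` by the sum of all counts
gives an exact tail probability for tie-free data. Repeated calls with the
same sample sizes reuse the cached tuple.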

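The two t-test items above (single-pass sample summaries and bisection that
stops at double-precision convergence) can be sketched as follows. This is a
Python illustration under stated assumptions, not the Statwise code: all names
are hypothetical, the summary uses Welford's update, and the standard normal
CDF stands in for the Student's t CDF because the Python standard library does
not provide one.

```python
import math
from typing import NamedTuple


class Summary(NamedTuple):
    n: int
    mean: float
    variance: float  # sample variance (n - 1 denominator)
    standard_error: float


def summarize(sample):
    """Welford's single-pass mean and variance; needs at least two values."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in sample:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1)
    return Summary(n, mean, variance, math.sqrt(variance / n))


def one_sample_t(summary, mu):
    """One-sample t statistic computed from a precomputed summary."""
    return (summary.mean - mu) / summary.standard_error


def welch_t(a, b):
    """Welch (unequal variance) t statistic from two precomputed summaries."""
    return (a.mean - b.mean) / math.sqrt(a.variance / a.n + b.variance / b.n)


def quantile_by_bisection(cdf, p, lo=-50.0, hi=50.0):
    """Bisect until the midpoint can no longer be separated from a bound."""
    iterations = 0
    while True:
        mid = lo + (hi - lo) / 2.0
        if mid <= lo or mid >= hi:
            # The bracket is down to adjacent doubles; more steps gain nothing.
            return mid, iterations
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
        iterations += 1


def normal_cdf(x):
    """Stand-in CDF for the demonstration (assumption, see lead-in)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

A summary computed once can feed several statistics, for example
`one_sample_t(summarize(sample), mu)`, without another pass over the data. The
bisection loop ends when the bracket cannot shrink further, so the iteration
count follows from the starting bracket width rather than from a fixed cap.
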
Before optimizing:

- Benchmark list input versus tensor input.
- Benchmark dataframe column extraction overhead separately from test
  computation.
- Identify hot paths with representative sample sizes.
- Preserve fixture compatibility before and after optimization.
- Use `Nx.Defn` only when the algorithm maps cleanly to tensor operations.
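
The fixture-compatibility item can be made mechanical with a tolerance-based
comparison run before and after each optimization. The sketch below is
hypothetical: the fixture layout (a flat mapping from statistic name to
expected value), the function name, and the tolerances are assumptions, not
the project's actual fixture format.

```python
import math


def fixture_mismatches(results, fixture, rel_tol=1e-12, abs_tol=1e-15):
    """Return (name, expected, actual) for every value outside tolerance."""
    mismatches = []
    for name, expected in fixture.items():
        actual = results.get(name)
        # A statistic missing from the results counts as a mismatch.
        if actual is None or not math.isclose(
            actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches.append((name, expected, actual))
    return mismatches
```

An empty list before the change and an empty list after it means the
optimization preserved the fixture values to within the chosen tolerance.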

Candidate future work:

- Batched descriptive statistics with `axis`.
- Batched one-sample and independent t-tests.
- Faster ranking for very large Mann-Whitney samples.
- Faster Student's t CDF approximations for t-test p-values.
- Optional EXLA benchmarks to determine when `backend: :tensor` becomes a net
  win.