# Compatibility Contract

Statwise is a statistics library for one-dimensional data. Its results are
checked against reference fixtures from NumPy, SciPy, and Statsmodels. The
public API is Elixir-native; the Python libraries are behavioral references,
not API templates.

## Shared Input Rules

- Raw samples are finite numeric lists or one-dimensional `Nx.Tensor`s.
- Integers are cast to `f64`.
- Multidimensional tensors raise `ArgumentError`.
- Infinite values (`:infinity` and `:neg_infinity` from Nx special values) raise
  `ArgumentError`.
- NaN behavior is controlled with `nan_policy`.
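
The rules above can be illustrated with a small pure-Python sketch. This is not Statwise code: `normalize_sample` is a hypothetical helper, `ValueError` stands in for Elixir's `ArgumentError`, and nested Python lists stand in for multidimensional input. NaN is deliberately let through, because `nan_policy` decides what happens to it.

```python
import math


def normalize_sample(values):
    """Validate a raw 1-D sample and cast every element to float (f64).

    Hypothetical helper mirroring the documented input rules;
    ValueError plays the role of ArgumentError.
    """
    result = []
    for value in values:
        # A nested sequence means the input is not one-dimensional.
        if isinstance(value, (list, tuple)):
            raise ValueError("expected a one-dimensional sample")
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"non-numeric value: {value!r}")
        number = float(value)  # integers are cast to f64
        if math.isinf(number):
            raise ValueError("infinite values are not allowed")
        # NaN passes through here; nan_policy handles it later.
        result.append(number)
    return result
```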

The inferential test APIs also accept dataframe-style column inputs through
`columns:` and `pairs:` options:

- A map of columns may be passed directly, with atom or string column keys.
- An `Explorer.DataFrame` may be passed when Explorer is loaded by the caller's
  application. Explorer is optional and is not a Statwise dependency.
- Extracted columns must contain raw sample values supported by Statwise.
- `nil` column values are treated as `:nan` and then handled by the selected
  `nan_policy`.
- Column extraction defaults to `input: :list`. Pass `input: :tensor` to
  convert map columns to one-dimensional `f64` tensors; for Explorer columns,
  this calls `Explorer.Series.to_tensor/2` when it is available.

For two-sample tests, `columns: [:x, :y]` returns one result. Passing
`pairs: [x: :y, before: :after]` returns a map keyed by
`{left_column, right_column}`. For one-sample t-tests, `columns: :x` returns
one result and `columns: [:x, :y]` returns a map keyed by column.
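
The column and pair semantics can be sketched in Python, assuming a plain `dict` plays the role of an Elixir map and `None` plays the role of `nil`. The names `extract_column`, `run_pairs`, and `mean_difference` are hypothetical; `mean_difference` is only a stand-in for a real two-sample test, used to show the `{left_column, right_column}` result shape.

```python
import math


def extract_column(columns, key):
    """Fetch one column from a dict, mapping None (Elixir nil) to NaN."""
    return [math.nan if v is None else float(v) for v in columns[key]]


def run_pairs(test, columns, pairs):
    """Apply a two-sample test to each (left, right) pair of column names.

    The result is keyed by (left_column, right_column), mirroring the
    documented shape of `pairs:` results.
    """
    return {
        (left, right): test(extract_column(columns, left),
                            extract_column(columns, right))
        for left, right in pairs
    }


def mean_difference(x, y):
    # Illustrative stand-in for a test; it does no NaN handling.
    return sum(x) / len(x) - sum(y) / len(y)


data = {"x": [1, 2, 3], "y": [1, 1, 1], "before": [5, 6], "after": [4, 4]}
result = run_pairs(mean_difference, data, [("x", "y"), ("before", "after")])
```

Here `result` maps `("x", "y")` to `1.0` and `("before", "after")` to `1.5`.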

Tensor-native reductions are opt-in with `backend: :tensor`. Without this
option, tensor inputs are normalized through the same scalar path as lists,
which is currently faster for many small and mid-sized operations on
`Nx.BinaryBackend`.

## NaN Policy

Supported values:

- `:raise` rejects NaN inputs. This is the default.
- `:propagate` returns a NaN statistic and p-value for inferential tests, and
  NaN values for descriptive and ranking operations where applicable.
- `:omit` removes NaNs before computing.

Paired t-tests apply `:omit` pairwise: a pair is removed when either side is
NaN. Independent t-tests and Mann-Whitney U apply `:omit` to each sample
separately.

If omission leaves too few observations, the function raises the same
insufficient-sample error it would raise for a too-small original sample.
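
A minimal Python sketch of the three policies and of pairwise omission follows. Strings stand in for the Elixir atoms, `ValueError` stands in for the library's errors, and `min_size=2` is an assumed threshold; each Statwise function applies its own minimum. The helper names are hypothetical.

```python
import math


def apply_nan_policy(sample, policy, min_size=2):
    """Return the sample to compute on, or None when the result is NaN.

    policy is "raise", "propagate", or "omit" (stand-ins for the atoms).
    """
    has_nan = any(math.isnan(v) for v in sample)
    if policy == "raise" and has_nan:
        raise ValueError("input contains NaN")
    if policy == "propagate" and has_nan:
        return None  # caller reports NaN statistic / p-value
    if policy == "omit":
        sample = [v for v in sample if not math.isnan(v)]
    # Same insufficient-sample error whether or not omission shrank the input.
    if len(sample) < min_size:
        raise ValueError("insufficient observations")
    return sample


def omit_pairwise(x, y):
    """Paired-test omission: drop a pair when either side is NaN."""
    kept = [(a, b) for a, b in zip(x, y)
            if not (math.isnan(a) or math.isnan(b))]
    return [a for a, _ in kept], [b for _, b in kept]
```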

## Descriptive Statistics

Reference: NumPy 2.3.0.

Functions:

- `Statwise.Descriptive.count/2`
- `Statwise.Descriptive.sum/2`
- `Statwise.Descriptive.mean/2`
- `Statwise.Descriptive.variance/2`
- `Statwise.Descriptive.stddev/2`
- `Statwise.Descriptive.standard_error/2`

Variance defaults to sample variance with `correction: 1`. Population variance
is available with `correction: 0`.
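
The `correction` option corresponds to NumPy's `ddof` (NumPy 2.x also accepts `correction` in `numpy.var` and `numpy.std`). The plain-Python sketch below shows the arithmetic, assuming the standard error of the mean is the standard deviation divided by the square root of the sample size; the function names only echo the Statwise ones and are not its implementation.

```python
import math


def variance(sample, correction=1):
    """Variance with a degrees-of-freedom correction (NumPy's ddof)."""
    n = len(sample)
    mean = sum(sample) / n
    squared = sum((v - mean) ** 2 for v in sample)
    return squared / (n - correction)


def stddev(sample, correction=1):
    return math.sqrt(variance(sample, correction))


def standard_error(sample, correction=1):
    # Standard error of the mean: stddev / sqrt(n).
    return stddev(sample, correction) / math.sqrt(len(sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]
population = variance(data, correction=0)  # 32 / 8
sample_var = variance(data)                # 32 / 7
```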

## T-Tests

References:

- Statsmodels 0.14.6 for independent t-tests.
- SciPy 1.16.0 for one-sample and paired t-tests.

Functions:

- `Statwise.TTest.one_sample/2`
- `Statwise.TTest.paired/2` for dataframe-style column inputs
- `Statwise.TTest.paired/3`
- `Statwise.TTest.independent/2` for dataframe-style column inputs
- `Statwise.TTest.independent/3`

Supported alternatives are `:two_sided`, `:greater`, and `:less`.

Independent tests support:

- `variance: :welch`
- `variance: :pooled`
- `null_difference: float`
- `confidence_level: float`, defaulting to `0.95`
- `effect_size: boolean`, defaulting to `false`

Confidence intervals are returned in `result.confidence_interval`.

- One-sample t-tests report intervals for the sample mean, matching SciPy's
  `TtestResult.confidence_interval`.
- Paired t-tests report intervals for the mean paired difference.
- Independent t-tests report intervals for `mean_x - mean_y`.
- One-sided alternatives use one infinite bound, represented as `:infinity` or
  `:neg_infinity`.
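
The interval shapes can be sketched in Python, assuming the caller supplies the Student's t critical value (for example from `scipy.stats.t.ppf`): the `(1 + confidence_level) / 2` quantile for two-sided intervals and the `confidence_level` quantile for one-sided ones. `mean_confidence_interval` is a hypothetical helper, and `math.inf` stands in for `:infinity` / `:neg_infinity`.

```python
import math


def mean_confidence_interval(estimate, se, t_crit, alternative="two_sided"):
    """t-based interval for a mean or mean difference.

    estimate: sample mean, mean paired difference, or mean_x - mean_y.
    t_crit: critical value supplied by the caller (see lead-in).
    """
    if alternative == "two_sided":
        return (estimate - t_crit * se, estimate + t_crit * se)
    if alternative == "greater":
        # Only a lower bound is informative; the upper bound is infinite.
        return (estimate - t_crit * se, math.inf)
    if alternative == "less":
        # Only an upper bound is informative; the lower bound is -infinite.
        return (-math.inf, estimate + t_crit * se)
    raise ValueError(f"unknown alternative: {alternative!r}")
```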

When `effect_size: true`, t-test results include:

- `cohens_d`
- `hedges_g`

One-sample and paired tests use the sample standard deviation as the
standardizer. Independent tests use the pooled standard deviation as the
standardizer for both Welch and pooled tests. Hedges' g uses the small-sample
correction `1 - 3 / (4 * df - 1)`.
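
For two independent samples, the effect sizes can be sketched as below. This is an illustration under stated assumptions, not the Statwise implementation: it uses `mean_x - mean_y` without any `null_difference` adjustment, and it assumes `df = n_x + n_y - 2` in the Hedges correction, which is the pooled-test value. How Statwise chooses `df` for Welch tests is not specified here.

```python
import math
from statistics import mean, variance


def independent_effect_sizes(x, y):
    """Cohen's d and Hedges' g with the pooled SD as the standardizer."""
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2  # assumed df for the small-sample correction
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
    cohens_d = (mean(x) - mean(y)) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * df - 1)
    return {"cohens_d": cohens_d, "hedges_g": cohens_d * correction}


sizes = independent_effect_sizes([1.0, 2.0, 3.0, 4.0, 5.0],
                                 [3.0, 4.0, 5.0, 6.0, 7.0])
```

Both samples have variance 2.5 here, so `cohens_d` is `-2 / sqrt(2.5)` and the correction factor is `28 / 31`.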

Zero-standard-error cases are handled explicitly:

- If the observed difference is zero, any of `statistic`, `p_value`, and Welch
  `df` that are undefined are returned as `:nan`.
- If the observed difference is positive with zero standard error, the
  statistic is `:infinity`.
- If the observed difference is negative with zero standard error, the
  statistic is `:neg_infinity`.
- Pooled independent t-tests keep their finite degrees of freedom in this
  case. Welch independent t-tests return `df: :nan` when both samples have zero
  variance, matching Statsmodels' degenerate-output shape.
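
The statistic rules above amount to the following Python sketch, in which `math.nan` and `math.inf` stand in for `:nan`, `:infinity`, and `:neg_infinity`. `t_statistic` is a hypothetical helper. It covers only the statistic, not the p-value or degrees of freedom.

```python
import math


def t_statistic(difference, standard_error):
    """t statistic following the documented zero-standard-error rules."""
    if standard_error == 0:
        if difference == 0:
            return math.nan  # 0 / 0 is undefined
        return math.inf if difference > 0 else -math.inf
    return difference / standard_error
```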

## Ranking

Reference: SciPy 1.16.0 `rankdata(method="average")`.

Function:

- `Statwise.Nonparametric.Rank.ranks/2`

Only average tie ranking is currently supported. Other tie methods are
intentionally deferred.
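
Average tie ranking assigns every member of a tie group the mean of the 1-based positions the group occupies in sorted order. A short Python sketch of that rule (the function name is hypothetical) gives the same result as SciPy's `rankdata([0, 2, 3, 2])`, which is `[1.0, 2.5, 4.0, 2.5]`:

```python
def average_ranks(values):
    """1-based ranks where tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        # Sorted positions start..end (0-based) are ranks start+1..end+1.
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks
```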

## Mann-Whitney U

Reference: SciPy 1.16.0 `mannwhitneyu`.

Functions:

- `Statwise.MannWhitney.test/2` for dataframe-style column inputs
- `Statwise.MannWhitney.test/3`

Supported alternatives are `:two_sided`, `:greater`, and `:less`.

Supported methods:

- `:asymptotic`
- `:exact`
- `:auto`

Like SciPy, explicit `method: :exact` does not apply a tie correction. `:auto`
uses exact p-values when there are no ties and the smaller sample has at most 8
observations; otherwise it uses the asymptotic normal approximation.
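
The `:auto` rule can be written as a small Python decision function. It assumes that ties are detected across the pooled sample, which is where they affect the ranks. `choose_method` is a hypothetical name, and strings stand in for the method atoms.

```python
def choose_method(x, y):
    """Resolve :auto to "exact" or "asymptotic" per the documented rule."""
    # Any repeated value in the pooled sample counts as a tie.
    has_ties = len(set(x) | set(y)) < len(x) + len(y)
    if not has_ties and min(len(x), len(y)) <= 8:
        return "exact"
    return "asymptotic"
```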

The returned `statistic` is `U1`, the U statistic for the first sample. `U1` and
`U2` are also available in result metadata.

Mann-Whitney U results include:

- `effect_size.common_language`, computed as `U1 / (n_x * n_y)`.
- `effect_size.rank_biserial`, computed as `2 * common_language - 1`.
- `effect_size.cliffs_delta`, an alias of `rank_biserial`.
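
The following Python sketch computes `U1`, `U2`, and the effect sizes above. It uses direct pairwise counting, which is equivalent to the rank-sum form `U1 = R1 - n_x * (n_x + 1) / 2` with average ranks: every `x > y` pair counts 1 and every tie counts 0.5. It is O(n_x * n_y) and meant only as an illustration; `mann_whitney_u` is a hypothetical name.

```python
def mann_whitney_u(x, y):
    """Return U1, U2 and the documented effect sizes for samples x and y."""
    u1 = 0.0
    for a in x:
        for b in y:
            if a > b:
                u1 += 1.0
            elif a == b:
                u1 += 0.5  # a tie between samples counts as half
    n_x, n_y = len(x), len(y)
    u2 = n_x * n_y - u1
    common_language = u1 / (n_x * n_y)
    rank_biserial = 2 * common_language - 1
    return {
        "U1": u1,
        "U2": u2,
        "common_language": common_language,
        "rank_biserial": rank_biserial,
        "cliffs_delta": rank_biserial,  # alias, as documented
    }


result = mann_whitney_u([3.0, 4.0], [1.0, 2.0, 3.0])
```

Here `U1` is `5.5`, `U2` is `0.5`, and `rank_biserial` is `5 / 6`.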

## Deferred Compatibility Areas

- Weighted tests.
- Multidimensional `axis` behavior.
- Missing-data policies beyond `nan_policy` for the current functions.
- Masked arrays.
- Permutation tests.
- Additional rank tie methods.