# Statwise Statistical Tests Gallery

## Overview

This Livebook is a runnable tour of Statwise statistical tests. It covers the
available test families, common variants, result fields, dataframe-style inputs,
visual annotations, and practical guidance for choosing a test.

## Setup

```elixir
Mix.install([
  {:statwise, path: Path.expand("..", __DIR__)},
  {:jason, "~> 1.4"},
  {:vega_lite, "~> 0.1"},
  {:kino_vega_lite, "~> 0.1"}
])
```

```elixir
alias Statwise.{MannWhitney, TTest, Visualization}
```

## Quick Selection Guide

Use a **one-sample t-test** when you have one numeric sample and want to compare
its mean to a fixed reference value.

Use a **paired t-test** when each observation has a natural before/after or
matched-pair relationship.

Use an **independent t-test** when you have two independent groups and want to
compare means. Prefer `variance: :welch` by default because it does not assume
equal variances. Use `variance: :pooled` only when equal variances are part of
the study design or a deliberate modeling assumption.

Use **Mann-Whitney U** when you have two independent groups and want a
rank-based nonparametric comparison. It is useful when the mean is not the right
summary target, the data are ordinal, or the distribution is strongly skewed, and
when samples are too small to check the normality assumption behind the t-test.
It assumes the two groups are independent, so it is not a drop-in replacement
for a paired test.

Use **rank utilities** when you want to inspect or reuse average ranks directly.

## Tutorial Data

```elixir
baseline = [9.8, 10.1, 10.4, 9.9, 10.2, 10.3]

before = [10.2, 11.5, 12.1, 13.8, 12.9, 11.7]
after_values = [9.9, 10.8, 11.2, 12.6, 12.1, 10.9]

control = [1.2, 1.9, 2.4, 2.9, 2.7, 2.2]
treatment = [2.2, 3.0, 3.4, 4.1, 4.8, 3.6]

ordinal_control = [1, 2, 2, 3, 3, 4]
ordinal_treatment = [3, 4, 4, 5, 5, 6]

df = %{
  baseline: baseline,
  before: before,
  after: after_values,
  control: control,
  treatment: treatment,
  ordinal_control: ordinal_control,
  ordinal_treatment: ordinal_treatment
}

rows =
  Enum.map(control, &%{group: :control, score: &1, site: :north}) ++
    Enum.map(treatment, &%{group: :treated, score: &1, site: :north}) ++
    Enum.map([1.0, 1.4, 1.7, 2.0], &%{group: :control, score: &1, site: :south}) ++
    Enum.map([1.8, 2.2, 2.5, 2.9], &%{group: :treated, score: &1, site: :south})
```

## Reading Test Results

All inferential tests return `%Statwise.TestResult{}`. The most commonly used
fields are:

* `test`: which test was run.
* `statistic`: the t statistic, U statistic, or other test statistic.
* `p_value`: the p-value for the configured alternative.
* `alternative`: `:two_sided`, `:greater`, or `:less`.
* `estimate`: estimated means, differences, and standard errors where available.
* `confidence_interval`: confidence interval metadata where available.
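* `effect_size`: effect sizes, either requested with `effect_size: true` (as in
  the t-test examples) or reported automatically (Mann-Whitney's rank-based
  effect sizes).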
* `n`: sample sizes used by the test.

```elixir
result = TTest.independent(control, treatment, variance: :welch, effect_size: true)

Map.take(result, [
  :test,
  :statistic,
  :p_value,
  :alternative,
  :method,
  :estimate,
  :confidence_interval,
  :effect_size,
  :n
])
```

## One-Sample T-Test

Use this when one sample should be compared with a reference mean.

```elixir
TTest.one_sample(baseline,
  mean: 10.0,
  alternative: :two_sided,
  confidence_level: 0.95,
  effect_size: true
)
```

Use `alternative: :greater` when the scientific question is specifically
whether the sample mean is greater than the reference value:

```elixir
TTest.one_sample(baseline,
  mean: 10.0,
  alternative: :greater
)
```
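
As a cross-check on the one-sample result, the statistic is
`t = (mean - mu) / (s / sqrt(n))` with `n - 1` degrees of freedom, where `s` is the
sample standard deviation. The plain-Elixir sketch below computes it without
Statwise; `OneSampleSketch` is a hypothetical module for illustration, not part of
the library. It stops at the statistic because converting `t` to a p-value needs a
Student t CDF, which the Elixir standard library does not provide.

```elixir
defmodule OneSampleSketch do
  # Arithmetic mean of a list of numbers.
  def mean(xs), do: Enum.sum(xs) / length(xs)

  # Sample variance with the n - 1 (Bessel) denominator.
  def variance(xs) do
    m = mean(xs)
    Enum.reduce(xs, 0.0, fn x, acc -> acc + (x - m) * (x - m) end) / (length(xs) - 1)
  end

  # Returns {t, degrees_of_freedom} for H0: population mean == mu.
  def t_statistic(xs, mu) do
    n = length(xs)
    standard_error = :math.sqrt(variance(xs) / n)
    {(mean(xs) - mu) / standard_error, n - 1}
  end
end

baseline = [9.8, 10.1, 10.4, 9.9, 10.2, 10.3]

# `dof` rather than `df`, so the `df` map from Tutorial Data is not shadowed.
{t, dof} = OneSampleSketch.t_statistic(baseline, 10.0)
```

For `baseline` against 10.0 this gives `t` of about 1.23 with 5 degrees of freedom.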

## Paired T-Test

Use this for before/after, matched subjects, repeated measures, or other paired
observations. The test is performed on within-pair differences.

```elixir
TTest.paired(before, after_values,
  alternative: :greater,
  confidence_level: 0.95,
  effect_size: true
)
```
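
Because the paired test works on within-pair differences, its statistic is a
one-sample t statistic of those differences against zero. A plain-Elixir sketch of
that reduction follows; `PairedSketch` is hypothetical, and it assumes differences
are taken as first minus second.

```elixir
defmodule PairedSketch do
  # Pairwise differences first - second, in input order.
  def differences(first, second) do
    Enum.zip_with(first, second, fn a, b -> a - b end)
  end

  # One-sample t statistic of the differences against 0, with n - 1 degrees of freedom.
  def t_statistic(first, second) do
    d = differences(first, second)
    n = length(d)
    m = Enum.sum(d) / n
    var = Enum.reduce(d, 0.0, fn x, acc -> acc + (x - m) * (x - m) end) / (n - 1)
    {m / :math.sqrt(var / n), n - 1}
  end
end

before = [10.2, 11.5, 12.1, 13.8, 12.9, 11.7]
after_values = [9.9, 10.8, 11.2, 12.6, 12.1, 10.9]

{paired_t, paired_dof} = PairedSketch.t_statistic(before, after_values)
```

For the tutorial data every difference is positive and `paired_t` is about 6.56,
which points in the direction tested by `alternative: :greater` above, assuming
Statwise also subtracts the second sample from the first.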

Visualize paired data as differences when the pairing is the central question:

```elixir
paired_rows =
  before
  |> Enum.zip(after_values)
  |> Enum.with_index(1)
  |> Enum.map(fn {{before_value, after_value}, subject} ->
    %{subject: subject, difference: before_value - after_value}
  end)

Visualization.point_plot(paired_rows,
  x: :subject,
  y: :difference,
  stat: :mean
)
|> Visualization.with_style(width: 520, height: 260)
|> Visualization.show()
```

## Independent T-Test

Prefer Welch's independent t-test by default:

```elixir
TTest.independent(control, treatment,
  variance: :welch,
  alternative: :two_sided,
  confidence_level: 0.95,
  effect_size: true
)
```

Use the pooled variant only when equal variances are an intentional assumption:

```elixir
TTest.independent(control, treatment,
  variance: :pooled,
  effect_size: true
)
```
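
The Welch and pooled variants differ only in how they build the standard error and
degrees of freedom. The plain-Elixir sketch below computes both with textbook
formulas; `TwoSampleSketch` is a hypothetical module, not Statwise's
implementation. Welch uses `sqrt(s1^2/n1 + s2^2/n2)` with Welch-Satterthwaite
degrees of freedom, while the pooled variant uses one pooled variance with
`n1 + n2 - 2` degrees of freedom.

```elixir
defmodule TwoSampleSketch do
  defp mean(xs), do: Enum.sum(xs) / length(xs)

  # Sample variance with the n - 1 denominator.
  defp variance(xs) do
    m = mean(xs)
    Enum.reduce(xs, 0.0, fn x, acc -> acc + (x - m) * (x - m) end) / (length(xs) - 1)
  end

  # Welch: unequal variances, Welch-Satterthwaite degrees of freedom.
  def welch(xs, ys) do
    {n1, n2} = {length(xs), length(ys)}
    {a, b} = {variance(xs) / n1, variance(ys) / n2}
    se = :math.sqrt(a + b)
    dof = (a + b) * (a + b) / (a * a / (n1 - 1) + b * b / (n2 - 1))
    {(mean(xs) - mean(ys)) / se, dof}
  end

  # Pooled: one shared variance estimate, n1 + n2 - 2 degrees of freedom.
  def pooled(xs, ys) do
    {n1, n2} = {length(xs), length(ys)}
    sp2 = ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / (n1 + n2 - 2)
    se = :math.sqrt(sp2 * (1 / n1 + 1 / n2))
    {(mean(xs) - mean(ys)) / se, n1 + n2 - 2}
  end
end

control = [1.2, 1.9, 2.4, 2.9, 2.7, 2.2]
treatment = [2.2, 3.0, 3.4, 4.1, 4.8, 3.6]

{welch_t, welch_dof} = TwoSampleSketch.welch(control, treatment)
{pooled_t, pooled_dof} = TwoSampleSketch.pooled(control, treatment)
```

With equal group sizes, as in the tutorial data, the two t statistics are identical
(about -2.94); only the degrees of freedom differ (about 8.83 for Welch versus 10
pooled), and with them the p-values. With unequal sizes and unequal variances the
statistics themselves diverge, which is where Welch's robustness matters.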

Use `null_difference:` to test against a hypothesized non-zero difference in
means instead of zero:

```elixir
TTest.independent(control, treatment,
  variance: :welch,
  null_difference: -1.0
)
```
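
Assuming `null_difference:` is the hypothesized value of
`mean(first) - mean(second)`, which fits the example (the control mean is about 1.3
below the treatment mean), the Welch statistic subtracts it before dividing by the
standard error. A plain-Elixir sketch; `ShiftedWelchSketch` is hypothetical:

```elixir
defmodule ShiftedWelchSketch do
  defp mean(xs), do: Enum.sum(xs) / length(xs)

  defp variance(xs) do
    m = mean(xs)
    Enum.reduce(xs, 0.0, fn x, acc -> acc + (x - m) * (x - m) end) / (length(xs) - 1)
  end

  # Welch t statistic for H0: mean(xs) - mean(ys) == delta0.
  def t_statistic(xs, ys, delta0) do
    se = :math.sqrt(variance(xs) / length(xs) + variance(ys) / length(ys))
    (mean(xs) - mean(ys) - delta0) / se
  end
end

control = [1.2, 1.9, 2.4, 2.9, 2.7, 2.2]
treatment = [2.2, 3.0, 3.4, 4.1, 4.8, 3.6]

t_zero = ShiftedWelchSketch.t_statistic(control, treatment, 0.0)
t_shifted = ShiftedWelchSketch.t_statistic(control, treatment, -1.0)
```

Against a null difference of 0 the statistic is about -2.94; against -1.0 it
shrinks to about -0.68, because the observed difference is much closer to -1.0.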

Annotate an ordinary plot with a computed t-test:

```elixir
rows
|> Visualization.box_plot(x: :group, y: :score)
|> Visualization.with_test(:t_test, groups: {:control, :treated})
|> Visualization.with_style(width: 420, height: 260)
|> Visualization.show()
```

## Mann-Whitney U Test

Use Mann-Whitney for two independent groups when a rank-based comparison is more
appropriate than a mean comparison.

```elixir
MannWhitney.test(ordinal_control, ordinal_treatment,
  alternative: :two_sided,
  method: :auto,
  continuity: true
)
```

Choose the method deliberately:

* `method: :auto` uses exact p-values when there are no ties and the smaller
  sample has at most 8 observations; otherwise it uses the asymptotic
  approximation.
* `method: :exact` computes exact p-values. Like SciPy, explicit exact mode does
  not apply tie correction.
* `method: :asymptotic` uses the normal approximation and optional continuity
  correction.

```elixir
MannWhitney.test([1, 3, 5], [2, 4, 6], method: :exact)
```
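
For small samples without ties, the exact null distribution of U can be obtained by
enumerating every equally likely way to assign ranks `1..N` to the first group. The
plain-Elixir sketch below does that; `ExactUSketch` is hypothetical, and it assumes
the two-sided convention SciPy uses (twice the upper tail at `max(U1, U2)`, capped
at 1). The enumeration is exponential in `N`, so it is for illustration only.

```elixir
defmodule ExactUSketch do
  # All k-element combinations of a list, preserving order.
  def combinations(_list, 0), do: [[]]
  def combinations([], _k), do: []

  def combinations([head | tail], k) do
    Enum.map(combinations(tail, k - 1), &[head | &1]) ++ combinations(tail, k)
  end

  # U statistic for tie-free data: number of (x, y) pairs with x > y.
  def u_statistic(xs, ys) do
    for x <- xs, y <- ys, x > y, reduce: 0 do
      acc -> acc + 1
    end
  end

  # Exact two-sided p-value: 2 * P(U >= max(U1, U2)) under H0, capped at 1.
  def two_sided_p(xs, ys) do
    {n1, n2} = {length(xs), length(ys)}
    u1 = u_statistic(xs, ys)
    u_max = max(u1, n1 * n2 - u1)

    # Under H0 every choice of n1 ranks out of 1..N for the first group is equally likely;
    # each choice gives U = rank_sum - n1 * (n1 + 1) / 2.
    null_us =
      for ranks <- combinations(Enum.to_list(1..(n1 + n2)), n1) do
        Enum.sum(ranks) - div(n1 * (n1 + 1), 2)
      end

    tail = Enum.count(null_us, &(&1 >= u_max)) / length(null_us)
    {u1, min(1.0, 2 * tail)}
  end
end

{u1, p} = ExactUSketch.two_sided_p([1, 3, 5], [2, 4, 6])
```

For `[1, 3, 5]` versus `[2, 4, 6]` this gives `U1 = 3` and a two-sided p-value of
0.7: 7 of the 20 equally likely rank assignments have `U >= 6`.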

```elixir
MannWhitney.test(ordinal_control, ordinal_treatment,
  method: :asymptotic,
  continuity: false
)
```
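
The asymptotic method replaces the exact distribution with a normal approximation.
The plain-Elixir sketch below counts ties as half a pair, uses the standard
tie-corrected variance
`sigma^2 = n1 * n2 / 12 * ((N + 1) - sum(t^3 - t) / (N * (N - 1)))`, and applies an
optional 0.5 continuity correction; the two-sided p-value uses Erlang's
`:math.erfc/1`. `AsymptoticUSketch` is hypothetical and follows textbook formulas,
so it is not guaranteed to reproduce Statwise's numbers exactly.

```elixir
defmodule AsymptoticUSketch do
  # U for xs against ys: pairs with x > y count 1, tied pairs count 0.5.
  def u_statistic(xs, ys) do
    for x <- xs, y <- ys, reduce: 0.0 do
      acc ->
        cond do
          x > y -> acc + 1.0
          x == y -> acc + 0.5
          true -> acc
        end
    end
  end

  # Returns {u1, z, two_sided_p}.
  def two_sided(xs, ys, opts \\ []) do
    {n1, n2} = {length(xs), length(ys)}
    n = n1 + n2
    u1 = u_statistic(xs, ys)
    mu = n1 * n2 / 2

    # Tie correction: sum of t^3 - t over groups of tied values in the pooled data.
    tie_term =
      (xs ++ ys)
      |> Enum.frequencies()
      |> Map.values()
      |> Enum.reduce(0, fn t, acc -> acc + t * t * t - t end)

    sigma = :math.sqrt(n1 * n2 / 12 * (n + 1 - tie_term / (n * (n - 1))))

    # Two-sided: distance of max(U1, U2) above the null mean, optionally corrected.
    u = max(u1, n1 * n2 - u1)
    correction = if Keyword.get(opts, :continuity, true), do: 0.5, else: 0.0
    z = (u - mu - correction) / sigma

    # 2 * (1 - Phi(z)) == erfc(z / sqrt(2)); capped at 1 for z < 0.
    {u1, z, min(1.0, :math.erfc(z / :math.sqrt(2)))}
  end
end

ordinal_control = [1, 2, 2, 3, 3, 4]
ordinal_treatment = [3, 4, 4, 5, 5, 6]

{u1, z, p} = AsymptoticUSketch.two_sided(ordinal_control, ordinal_treatment, continuity: false)
{_u1, z_cc, p_cc} = AsymptoticUSketch.two_sided(ordinal_control, ordinal_treatment)
```

For the ordinal example the approximation gives `z` of about 2.45 without the
continuity correction; the correction moves `z` toward zero (to about 2.36), which
makes the p-value slightly larger.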

Mann-Whitney results include rank-based effect sizes:

```elixir
MannWhitney.test(ordinal_control, ordinal_treatment).effect_size
```
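
The exact set of rank-based effect sizes Statwise reports is not listed here. Two
common ones are the common-language effect size `f = U1 / (n1 * n2)`, the
probability that a random value from the first group exceeds one from the second
(ties counting half), and the rank-biserial correlation `r = 2f - 1`. The
plain-Elixir sketch below computes both; `RankEffectSketch` is hypothetical, and its
names and sign conventions may differ from Statwise's.

```elixir
defmodule RankEffectSketch do
  # U1 with tied pairs counted as half a "win" for xs.
  def u_statistic(xs, ys) do
    for x <- xs, y <- ys, reduce: 0.0 do
      acc ->
        cond do
          x > y -> acc + 1.0
          x == y -> acc + 0.5
          true -> acc
        end
    end
  end

  def effect_sizes(xs, ys) do
    f = u_statistic(xs, ys) / (length(xs) * length(ys))
    %{common_language: f, rank_biserial: 2 * f - 1}
  end
end

effects = RankEffectSketch.effect_sizes([1, 2, 2, 3, 3, 4], [3, 4, 4, 5, 5, 6])
```

For the ordinal example `U1 = 3` out of 36 pairs, so the common-language effect
size is 1/12 and the rank-biserial correlation is about -0.83: values in
`ordinal_control` rarely exceed those in `ordinal_treatment`.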

Annotate a plot with a computed Mann-Whitney test:

```elixir
rows
|> Visualization.box_plot(x: :group, y: :score)
|> Visualization.with_test(:mann_whitney, groups: {:control, :treated})
|> Visualization.with_style(width: 420, height: 260)
|> Visualization.show()
```

## Faceted Computed Tests

When a plot is faceted, computed test annotations run independently inside each
facet panel.

```elixir
rows
|> Visualization.box_plot(x: :group, y: :score, facet: :site)
|> Visualization.with_test(:t_test, groups: {:control, :treated}, show: [:p_value])
|> Visualization.with_style(width: 260, height: 220)
|> Visualization.show()
```

```elixir
rows
|> Visualization.box_plot(x: :group, y: :score, facet: :site)
|> Visualization.with_test(:mann_whitney, groups: {:control, :treated}, show: [:p_value])
|> Visualization.with_style(width: 260, height: 220)
|> Visualization.show()
```
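
Running a test independently inside each facet panel amounts to splitting the rows
by the facet field and then by group, so each panel compares only its own
observations. The plain-Elixir sketch below shows that split for the tutorial rows;
it is illustrative only and does not call Statwise or reproduce its annotation
logic.

```elixir
control = [1.2, 1.9, 2.4, 2.9, 2.7, 2.2]
treatment = [2.2, 3.0, 3.4, 4.1, 4.8, 3.6]

rows =
  Enum.map(control, &%{group: :control, score: &1, site: :north}) ++
    Enum.map(treatment, &%{group: :treated, score: &1, site: :north}) ++
    Enum.map([1.0, 1.4, 1.7, 2.0], &%{group: :control, score: &1, site: :south}) ++
    Enum.map([1.8, 2.2, 2.5, 2.9], &%{group: :treated, score: &1, site: :south})

# One entry per facet panel; each panel splits only its own rows into the two groups.
per_facet =
  rows
  |> Enum.group_by(& &1.site)
  |> Map.new(fn {site, site_rows} ->
    groups = Enum.group_by(site_rows, & &1.group, & &1.score)
    {site, %{control: groups[:control], treated: groups[:treated]}}
  end)

# Per-panel sample sizes: the inputs each panel's test would see.
sizes =
  Map.new(per_facet, fn {site, %{control: c, treated: t}} ->
    {site, {length(c), length(t)}}
  end)
```

The south panel has only four observations per group, so its test has less power
than the north panel's, and its p-value should be read with that in mind.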

## Dataframe-Style Inputs

A map of columns, or another dataframe-like value, can be passed in place of raw
samples. Select the columns to test with `columns:`, or name explicit column pairs
with `pairs:`.

```elixir
TTest.one_sample(df, columns: [:baseline, :before], mean: 10.0)
```

```elixir
TTest.paired(df, columns: [:before, :after])
```

```elixir
TTest.independent(df, columns: [:control, :treatment], variance: :welch)
```

```elixir
MannWhitney.test(df, columns: [:ordinal_control, :ordinal_treatment], method: :auto)
```

Run several two-sample tests at once with `pairs:`:

```elixir
TTest.independent(df,
  pairs: [
    control: :treatment,
    before: :after
  ],
  variance: :welch
)
```

```elixir
MannWhitney.test(df,
  pairs: [
    ordinal_control: :ordinal_treatment,
    control: :treatment
  ],
  method: :auto
)
```
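
The `pairs:` examples above pass keyword lists, and in Elixir a keyword list is a
list of two-element tuples, so `control: :treatment` means "compare column
`:control` with column `:treatment`". The plain-Elixir sketch below shows how such
pairs could resolve against a map of columns; `PairsSketch` is hypothetical and
illustrates only the lookup, not how Statwise implements it.

```elixir
defmodule PairsSketch do
  # Resolve each {first, second} column pair to its two samples.
  # Map.fetch!/2 raises KeyError for a missing column instead of passing nil along.
  def resolve(columns, pairs) do
    Enum.map(pairs, fn {first, second} ->
      {{first, second}, Map.fetch!(columns, first), Map.fetch!(columns, second)}
    end)
  end
end

# A trimmed copy of the tutorial columns, named so it does not shadow `df`.
columns_map = %{
  control: [1.2, 1.9, 2.4, 2.9, 2.7, 2.2],
  treatment: [2.2, 3.0, 3.4, 4.1, 4.8, 3.6],
  before: [10.2, 11.5, 12.1, 13.8, 12.9, 11.7],
  after: [9.9, 10.8, 11.2, 12.6, 12.1, 10.9]
}

# `control: :treatment, before: :after` is sugar for
# `[{:control, :treatment}, {:before, :after}]`.
resolved = PairsSketch.resolve(columns_map, control: :treatment, before: :after)
```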

## Missing Values And NaN Policy

Statwise makes NaN handling explicit through the `nan_policy:` option:

* `nan_policy: :raise` rejects NaN values.
* `nan_policy: :propagate` returns NaN-like results when NaNs are present.
* `nan_policy: :omit` removes NaNs before testing where supported.

```elixir
sample_with_nan = [1.0, 2.0, :nan, 3.0]

TTest.one_sample(sample_with_nan,
  mean: 2.0,
  nan_policy: :omit
)
```

```elixir
TTest.one_sample(sample_with_nan,
  mean: 2.0,
  nan_policy: :propagate
)
```
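
BEAM floats cannot represent NaN, which is why the example above marks a missing
value with the atom `:nan`. A hypothetical helper that applies the three policies
to such a list before testing might look like the sketch below; `NanPolicySketch`
and its return shapes are illustrative assumptions, not Statwise's API.

```elixir
defmodule NanPolicySketch do
  # Applies a NaN policy to a sample that marks NaN with the atom :nan.
  # Returns {:ok, values} to test, or :nan when the result should propagate NaN.
  def apply_policy(sample, policy) do
    case {policy, Enum.member?(sample, :nan)} do
      {_policy, false} -> {:ok, sample}
      {:omit, true} -> {:ok, Enum.reject(sample, &(&1 == :nan))}
      {:propagate, true} -> :nan
      {:raise, true} -> raise ArgumentError, "sample contains NaN values"
    end
  end
end

sample_with_nan = [1.0, 2.0, :nan, 3.0]

omitted = NanPolicySketch.apply_policy(sample_with_nan, :omit)
propagated = NanPolicySketch.apply_policy(sample_with_nan, :propagate)
```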

## Rank Utilities

Tied values receive the average of the ranks they span, matching SciPy's default
(`method="average"` in `scipy.stats.rankdata`).

```elixir
Statwise.Nonparametric.Rank.ranks([10, 20, 20, 30])
```
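
Average ranking can be sketched in a few lines of plain Elixir: sort the values,
give each run of equal values the mean of the 1-based positions it occupies, then
map the ranks back to the original order. `AverageRankSketch` is a hypothetical
name, not Statwise's implementation.

```elixir
defmodule AverageRankSketch do
  def ranks(values) do
    # Sort {value, original_index} pairs by value.
    sorted = values |> Enum.with_index() |> Enum.sort_by(&elem(&1, 0))

    # A run of equal values covering 1-based positions start..stop gets rank (start + stop) / 2.
    {rank_by_index, _next_position} =
      sorted
      |> Enum.chunk_by(&elem(&1, 0))
      |> Enum.reduce({%{}, 1}, fn run, {acc, start} ->
        stop = start + length(run) - 1
        avg = (start + stop) / 2
        acc = Enum.reduce(run, acc, fn {_value, index}, acc -> Map.put(acc, index, avg) end)
        {acc, stop + 1}
      end)

    # Restore the input order.
    values
    |> Enum.with_index()
    |> Enum.map(fn {_value, index} -> Map.fetch!(rank_by_index, index) end)
  end
end

ranks = AverageRankSketch.ranks([10, 20, 20, 30])
```

For `[10, 20, 20, 30]` the two tied values share positions 2 and 3, so both get
rank 2.5 and the result is `[1.0, 2.5, 2.5, 4.0]`.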

Rank plots are useful when inspecting nonparametric comparisons:

```elixir
Visualization.rank_plot(ordinal_control, ordinal_treatment,
  x_label: :control,
  y_label: :treated,
  title: "Average Ranks"
)
|> Visualization.with_style(width: 420, height: 260)
|> Visualization.show()
```

## Practical Checklist

Before choosing a test, ask:

* Is the comparison one sample against a reference, paired, or independent?
* Is the target a mean difference, or is a rank-based comparison more
  appropriate?
* Is the alternative directional (`:greater` or `:less`) or two-sided?
* Are missing values expected, and should they raise, propagate, or be omitted?
* Is an effect size needed for interpretation?
* Should the result be inspected directly or annotated on the plot where the
  comparison is visible?