docs/visualization_roadmap.md

# Visualization Roadmap

This roadmap sketches how `Statwise.Visualization` can grow toward a mature,
seaborn-inspired statistical visualization API while staying idiomatic Elixir
and keeping renderer dependencies optional.

## North Star

The long-term goal is a small statistical plotting grammar:

```elixir
rows
|> Statwise.Visualization.plot(x: :treatment, y: :score, color: :site)
|> Statwise.Visualization.add(:box_plot)
|> Statwise.Visualization.facet(column: :site)
|> Statwise.Visualization.with_theme(:minimal)
|> Statwise.Visualization.show()
```

Statwise should continue to support simple direct constructors:

```elixir
Statwise.Visualization.box_plot(rows, x: :treatment, y: :score, facet: :site)
```

The direct constructors should remain easy for common use, while the grammar
API can support composition and advanced charts.

## Design Principles

- Keep chart content separate from presentation.
- Keep runtime visualization dependencies optional.
- Prefer tidy row data and semantic mappings.
- Export plain Vega-Lite-compatible maps as the stable renderer contract.
- Test generated specs and statistical transformations, not screenshots.
- Preserve existing APIs through aliases or a deprecation period.
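
As an illustration of the "plain map" contract, the sketch below builds a Vega-Lite-style box plot spec with no renderer dependency and inspects it as ordinary data. `SpecSketch` and `box_plot_spec/3` are hypothetical names used only for this example; they are not part of Statwise.

```elixir
defmodule SpecSketch do
  # Hypothetical helper: returns a plain map laid out like a Vega-Lite
  # box plot spec. No VegaLite, Kino, or Jason call is involved.
  def box_plot_spec(rows, x, y) do
    %{
      "$schema" => "https://vega.github.io/schema/vega-lite/v5.json",
      "data" => %{"values" => rows},
      "mark" => "boxplot",
      "encoding" => %{
        "x" => %{"field" => to_string(x), "type" => "nominal"},
        "y" => %{"field" => to_string(y), "type" => "quantitative"}
      }
    }
  end
end

rows = [%{treatment: "a", score: 1.2}, %{treatment: "b", score: 2.4}]
spec = SpecSketch.box_plot_spec(rows, :treatment, :score)

# A spec test can assert on the map directly instead of on a rendered image.
"boxplot" = spec["mark"]
"score" = get_in(spec, ["encoding", "y", "field"])
```

Because the result is only a map, any renderer that accepts Vega-Lite JSON could consume it after encoding.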

## Phase 1: Normalize Semantic Mappings

Make every chart accept consistent field mappings.

Current shape:

```elixir
Statwise.Visualization.box_plot(rows,
  value: :score,
  group: :treatment,
  facet: :site
)
```

Target shape:

```elixir
Statwise.Visualization.box_plot(rows,
  x: :treatment,
  y: :score,
  color: :treatment,
  facet: :site
)
```

Semantic channels to support:

- `:x`
- `:y`
- `:color`
- `:facet`
- `:row`
- `:column`
- `:size`
- `:shape`
- `:detail`
- `:tooltip`

Compatibility aliases:

- `value: :score` maps to `y: :score`
- `group: :treatment` maps to `x: :treatment`
- `facet: :site` maps to a column/wrapped facet

Deliverables:

- Add `Statwise.Visualization.Mapping`
- Normalize aliases into semantic channels
- Share row extraction across plot types
- Support atom and string map keys
- Preserve old API behavior
- Add tests for old and new option names
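
One possible shape for the alias normalization is sketched below. `MappingSketch` is a hypothetical stand-in for the proposed `Statwise.Visualization.Mapping`, and the rule that an explicit channel wins over an alias is an assumption, not documented behavior.

```elixir
defmodule MappingSketch do
  # Hypothetical stand-in for the proposed Mapping module.
  @aliases %{value: :y, group: :x}
  @channels [:x, :y, :color, :facet, :row, :column, :size, :shape, :detail, :tooltip]

  # Splits a keyword list into {channel map, remaining options}.
  def normalize(opts) when is_list(opts) do
    Enum.reduce(opts, {%{}, []}, fn {key, value}, {mapping, rest} ->
      channel = Map.get(@aliases, key, key)

      if channel in @channels do
        # Assumption: an explicit channel overrides an alias for the same channel.
        mapping =
          if key == channel,
            do: Map.put(mapping, channel, value),
            else: Map.put_new(mapping, channel, value)

        {mapping, rest}
      else
        {mapping, [{key, value} | rest]}
      end
    end)
    |> then(fn {mapping, rest} -> {mapping, Enum.reverse(rest)} end)
  end
end

# Old option names resolve to the new semantic channels.
{mapping, rest} = MappingSketch.normalize(value: :score, group: :treatment, title: "Scores")
%{x: :treatment, y: :score} = mapping
[title: "Scores"] = rest
```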

## Phase 2: Make Row Data First-Class

Seaborn works best with tidy data. Statwise should make tidy rows the primary
shape while still supporting lists and maps of columns.

Supported inputs:

```elixir
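# A list of row maps (tidy rows)
[%{group: :a, value: 1.2}]
# A map of columns
%{group: [:a, :b], value: [1.2, 2.4]}
# An Explorer.DataFrame struct (optional dependency)
Explorer.DataFrame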
```

Potential internal representation:

```elixir
%Statwise.Visualization.Dataset{
  rows: [%{}],
  fields: %{...},
  source: :rows # one of :rows, :columns, or :explorer
}
```

Conversion APIs:

```elixir
Statwise.Visualization.Dataset.from_rows(rows)
Statwise.Visualization.Dataset.from_columns(columns)
Statwise.Visualization.Dataset.from_explorer(df)
```

Explorer should remain optional and be detected with `Code.ensure_loaded?/1`.
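
A minimal sketch of that detection follows, assuming a hypothetical `OptionalExplorerSketch` module. Explorer is reached only through `apply/3` at runtime, so the module compiles whether or not Explorer is installed.

```elixir
defmodule OptionalExplorerSketch do
  # Hypothetical helper, not part of Statwise.
  def to_rows(data) do
    cond do
      is_list(data) ->
        {:ok, data}

      explorer_frame?(data) ->
        # Explorer.DataFrame.to_rows/2 accepts the :atom_keys option.
        {:ok, apply(Explorer.DataFrame, :to_rows, [data, [atom_keys: true]])}

      true ->
        {:error, :unsupported_input}
    end
  end

  # True only when Explorer is available and the value is a data frame.
  defp explorer_frame?(data) do
    Code.ensure_loaded?(Explorer.DataFrame) and is_struct(data, Explorer.DataFrame)
  end
end

{:ok, [%{group: :a}]} = OptionalExplorerSketch.to_rows([%{group: :a}])
{:error, :unsupported_input} = OptionalExplorerSketch.to_rows(:not_a_dataset)
```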

Deliverables:

- Direct Explorer support using `Explorer.DataFrame.to_rows/2`
- Map-of-columns support
- Row validation
- Shared missing-value policy
- Dataset tests
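
The map-of-columns conversion could look roughly like the sketch below. `DatasetSketch` is a hypothetical simplification of the proposed `Dataset` struct: it stores `fields` as a plain list of keys and rejects columns of unequal length, both of which are assumptions made for the example.

```elixir
defmodule DatasetSketch do
  # Hypothetical, simplified stand-in for Statwise.Visualization.Dataset.
  defstruct rows: [], fields: [], source: :rows

  def from_rows(rows) when is_list(rows) do
    fields = rows |> Enum.flat_map(&Map.keys/1) |> Enum.uniq()
    %__MODULE__{rows: rows, fields: fields, source: :rows}
  end

  # Transposes a map of equal-length columns into a list of row maps.
  def from_columns(columns) when is_map(columns) do
    lengths = columns |> Map.values() |> Enum.map(&length/1) |> Enum.uniq()

    if length(lengths) > 1 do
      raise ArgumentError, "all columns must have the same length"
    end

    fields = Map.keys(columns)

    rows =
      columns
      |> Map.values()
      |> Enum.zip()
      |> Enum.map(fn values -> fields |> Enum.zip(Tuple.to_list(values)) |> Map.new() end)

    %__MODULE__{rows: rows, fields: fields, source: :columns}
  end
end

dataset = DatasetSketch.from_columns(%{group: [:a, :b], value: [1.2, 2.4]})
[%{group: :a, value: 1.2}, %{group: :b, value: 2.4}] = dataset.rows
```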

## Phase 3: Expand Core Plot Types

Seaborn organizes plots into relational, distribution, categorical, regression,
and matrix families. Statwise should prioritize statistical usefulness.

Relational plots:

```elixir
Statwise.Visualization.scatter(data, x: :height, y: :weight)
Statwise.Visualization.line(data, x: :time, y: :value)
```

Distribution plots:

```elixir
Statwise.Visualization.histogram(data, x: :score)
Statwise.Visualization.ecdf(data, x: :score)
Statwise.Visualization.density(data, x: :score)
Statwise.Visualization.qq_plot(data, x: :score)
```

Categorical plots:

```elixir
Statwise.Visualization.box_plot(data, x: :group, y: :score)
Statwise.Visualization.violin_plot(data, x: :group, y: :score)
Statwise.Visualization.strip_plot(data, x: :group, y: :score)
Statwise.Visualization.swarm_plot(data, x: :group, y: :score)
Statwise.Visualization.bar_plot(data, x: :group, y: :score, stat: :mean)
Statwise.Visualization.point_plot(data, x: :group, y: :score, interval: :confidence)
Statwise.Visualization.count_plot(data, x: :category)
```

Matrix plots:

```elixir
Statwise.Visualization.heatmap(matrix)
Statwise.Visualization.correlation_heatmap(data, columns: [:a, :b, :c])
```

Recommended initial additions:

- `scatter/2`
- `line/2`
- `bar_plot/2`
- `count_plot/2`
- `strip_plot/2`
- `heatmap/2`

Defer until the statistical transformation story is clear:

- `density/2`
- `violin_plot/2`
- `swarm_plot/2`
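
To show why a plot such as `count_plot/2` is a low-risk first addition, here is a sketch that counts rows in Elixir and emits a bar chart map. `CountPlotSketch` and the output field names `"category"` and `"count"` are hypothetical; the real constructor may instead delegate the aggregation to Vega-Lite.

```elixir
defmodule CountPlotSketch do
  # Hypothetical helper: counts rows per category and returns a
  # Vega-Lite-style bar chart map.
  def spec(rows, field) do
    values =
      rows
      |> Enum.frequencies_by(&Map.fetch!(&1, field))
      |> Enum.map(fn {category, count} ->
        %{"category" => to_string(category), "count" => count}
      end)
      |> Enum.sort_by(& &1["category"])

    %{
      "data" => %{"values" => values},
      "mark" => "bar",
      "encoding" => %{
        "x" => %{"field" => "category", "type" => "nominal"},
        "y" => %{"field" => "count", "type" => "quantitative"}
      }
    }
  end
end

spec = CountPlotSketch.spec([%{category: :b}, %{category: :a}, %{category: :b}], :category)
[%{"category" => "a", "count" => 1}, %{"category" => "b", "count" => 2}] = spec["data"]["values"]
```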

## Phase 4: Improve Faceting

Current support:

```elixir
facet: :site
facet_columns: 2
```

Target support:

```elixir
facet: :site
facet: [column: :site]
facet: [row: :sex, column: :site]
columns: 3
share_x: true
share_y: false
```

Potential internal representation:

```elixir
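# Assumption: a channel is a field name given as an atom or a string.
@type channel :: atom() | String.t()

@type facet :: %{
        row: channel() | nil,
        column: channel() | nil,
        columns: integer() | nil,
        share_x: boolean(),
        share_y: boolean()
      }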
```

Deliverables:

- Row facets
- Column facets
- Wrapped facets
- Shared-axis controls through Vega-Lite `resolve`
- Livebook examples
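
The option normalization and the `resolve` mapping might be sketched as follows. `FacetSketch` is a hypothetical module, and the defaults of `share_x: true` and `share_y: true` are assumptions. The `"shared"` and `"independent"` values are the scale resolution modes Vega-Lite defines.

```elixir
defmodule FacetSketch do
  # Hypothetical normalizer for the facet options shown above.
  @defaults %{row: nil, column: nil, columns: nil, share_x: true, share_y: true}

  def normalize(opts) do
    fields =
      case Keyword.get(opts, :facet) do
        nil -> %{}
        field when is_atom(field) -> %{column: field}
        keyword when is_list(keyword) -> Map.new(Keyword.take(keyword, [:row, :column]))
      end

    # The existing :facet_columns option is kept as an alias for :columns.
    columns = Keyword.get(opts, :columns, Keyword.get(opts, :facet_columns))

    @defaults
    |> Map.merge(fields)
    |> Map.merge(%{columns: columns})
    |> Map.merge(Map.new(Keyword.take(opts, [:share_x, :share_y])))
  end

  # Vega-Lite expresses axis sharing per channel under resolve.scale.
  def resolve(%{share_x: share_x, share_y: share_y}) do
    %{"scale" => %{"x" => mode(share_x), "y" => mode(share_y)}}
  end

  defp mode(true), do: "shared"
  defp mode(false), do: "independent"
end

facet = FacetSketch.normalize(facet: [row: :sex, column: :site], share_y: false)
%{"scale" => %{"x" => "shared", "y" => "independent"}} = FacetSketch.resolve(facet)
```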

## Phase 5: Style And Theme System

The current `with_style/2` supports friendly style keys. Extend it with theme
presets, named palettes, and direct Vega-Lite pass-through.

Theme presets:

```elixir
Statwise.Visualization.with_theme(plot, :default)
Statwise.Visualization.with_theme(plot, :minimal)
Statwise.Visualization.with_theme(plot, :paper)
Statwise.Visualization.with_theme(plot, :dark)
Statwise.Visualization.with_theme(plot, :livebook)
```

Palette support:

```elixir
Statwise.Visualization.with_palette(plot, :category10)
Statwise.Visualization.with_palette(plot, ["#2563eb", "#dc2626", "#16a34a"])
```

Vega-Lite escape hatches:

```elixir
Statwise.Visualization.with_style(plot,
  vega_lite: [...],
  mark: [...],
  encoding: [...],
  facet: [...],
  spec: [...],
  config: [...]
)
```

Precedence rules should be explicit. Proposed order, from lowest to highest
priority (later entries override earlier ones):

1. Plot defaults
2. Theme
3. Attached style
4. Export-time style

Deliverables:

- Add `Statwise.Visualization.Theme`
- Add `Statwise.Visualization.Palette`
- Add full Vega-Lite pass-through support
- Document merge precedence
- Add tests for faceted and layered style routing
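
One way to make the merge precedence testable is a recursive map merge applied in order. The sketch assumes the numbered list above runs from lowest to highest priority, and `StyleMergeSketch` is a hypothetical name.

```elixir
defmodule StyleMergeSketch do
  # Hypothetical deep merge: later layers override earlier ones key by key.
  def resolve(layers) when is_list(layers) do
    Enum.reduce(layers, %{}, fn layer, acc -> deep_merge(acc, layer) end)
  end

  defp deep_merge(left, right) do
    Map.merge(left, right, fn
      # Nested maps are merged; any other value is replaced by the later layer.
      _key, l, r when is_map(l) and is_map(r) -> deep_merge(l, r)
      _key, _l, r -> r
    end)
  end
end

plot_defaults = %{config: %{axis: %{grid: true}, font: "sans-serif"}}
theme = %{config: %{axis: %{grid: false}}}
attached_style = %{mark: %{opacity: 0.8}}
export_style = %{config: %{font: "serif"}}

resolved = StyleMergeSketch.resolve([plot_defaults, theme, attached_style, export_style])

%{config: %{axis: %{grid: false}, font: "serif"}, mark: %{opacity: 0.8}} = resolved
```

Keeping the layers as an ordered list means the precedence rule lives in one place and can be asserted on directly.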

## Phase 6: Plot Object And Composition API

This layer is inspired by seaborn's objects interface (`seaborn.objects`).

Target API:

```elixir
Statwise.Visualization.plot(rows, x: :score, color: :group)
|> Statwise.Visualization.add(:histogram, bins: 20)
|> Statwise.Visualization.add(:rug)
|> Statwise.Visualization.facet(column: :site)
|> Statwise.Visualization.label(title: "Scores by Site")
|> Statwise.Visualization.show()
```

Potential structs:

```elixir
%Statwise.Visualization.Figure{
  data: dataset,
  mappings: %{x: :score, y: nil, color: :group},
  layers: [%Statwise.Visualization.Layer{}],
  facet: nil,
  labels: %{},
  theme: nil,
  style: %{}
}
```

Layer examples:

```elixir
add(:point)
add(:line)
add(:bar)
add(:box_plot)
add(:histogram)
add(:rule)
```

Deliverables:

- `plot/2`
- `add/3`
- `facet/2`
- `label/2`
- `show/1`
- Vega-Lite conversion for layered and faceted figures

This should happen after the direct constructors and semantic mappings are
stable.
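
A miniature of the composition pipeline, under the assumption that each function only accumulates data on a struct and that conversion to Vega-Lite happens later. `FigureSketch` is hypothetical and omits the `theme` and `style` fields of the proposed `Figure` struct.

```elixir
defmodule FigureSketch do
  # Hypothetical miniature of the proposed Figure struct and pipeline.
  defstruct data: [], mappings: %{}, layers: [], facet: nil, labels: %{}

  def plot(rows, mappings), do: %__MODULE__{data: rows, mappings: Map.new(mappings)}

  # Layers are appended so they keep the order in which they were added.
  def add(%__MODULE__{} = figure, mark, opts \\ []) do
    %{figure | layers: figure.layers ++ [%{mark: mark, opts: opts}]}
  end

  def facet(%__MODULE__{} = figure, opts), do: %{figure | facet: Map.new(opts)}

  def label(%__MODULE__{} = figure, opts) do
    %{figure | labels: Map.merge(figure.labels, Map.new(opts))}
  end
end

figure =
  [%{score: 1.0, group: :a, site: :north}]
  |> FigureSketch.plot(x: :score, color: :group)
  |> FigureSketch.add(:histogram, bins: 20)
  |> FigureSketch.add(:rug)
  |> FigureSketch.facet(column: :site)
  |> FigureSketch.label(title: "Scores by Site")

[%{mark: :histogram, opts: [bins: 20]}, %{mark: :rug, opts: []}] = figure.layers
```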

## Phase 7: Statistical Summaries And Intervals

Implemented: seaborn-like estimate plots that compute summaries.

Examples:

```elixir
Statwise.Visualization.bar_plot(data, x: :group, y: :score, stat: :mean)

Statwise.Visualization.point_plot(data,
  x: :group,
  y: :score,
  stat: :mean,
  interval: :confidence,
  confidence_level: 0.95
)
```

Supported summaries:

- `:count`
- `:mean`
- `:median`
- `:sum`

Supported intervals:

- `nil`
- `:standard_error`
- `:confidence`
- `:percentile`

Deliverables:

- Add `Statwise.Visualization.Summary`
- Grouped summaries
- Confidence, standard-error, and percentile intervals
- `point_plot/2`
- Optional bootstrap intervals later
- Tests for summary correctness
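
For reference, a grouped mean with a standard-error interval can be computed as in the sketch below. `SummarySketch` is a hypothetical module, not the shipped `Statwise.Visualization.Summary`. The interval is the mean plus or minus one standard error, using the sample standard deviation with an n - 1 denominator, and it is left as `nil` when a group has fewer than two values.

```elixir
defmodule SummarySketch do
  # Hypothetical grouped summary: mean and standard-error interval per group.
  def mean_with_se(rows, group_field, value_field) do
    rows
    |> Enum.group_by(&Map.fetch!(&1, group_field), &Map.fetch!(&1, value_field))
    |> Enum.map(fn {group, values} ->
      n = length(values)
      mean = Enum.sum(values) / n
      se = standard_error(values, mean, n)
      %{group: group, n: n, mean: mean, interval: interval(mean, se)}
    end)
    |> Enum.sort_by(& &1.group)
  end

  # The standard error is undefined for a single observation.
  defp standard_error(_values, _mean, n) when n < 2, do: nil

  defp standard_error(values, mean, n) do
    sum_sq = values |> Enum.map(fn v -> (v - mean) * (v - mean) end) |> Enum.sum()
    :math.sqrt(sum_sq / (n - 1)) / :math.sqrt(n)
  end

  defp interval(_mean, nil), do: nil
  defp interval(mean, se), do: {mean - se, mean + se}
end

rows = [
  %{group: :a, score: 1.0},
  %{group: :a, score: 3.0},
  %{group: :b, score: 2.0}
]

[a, b] = SummarySketch.mean_with_se(rows, :group, :score)
# Group :a has mean 2.0 and a standard error of about 1.0.
# Group :b has a single value, so its interval is nil.
```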

## Phase 8: Statistical Result Annotations

Implemented. Result-specific plots remain available for direct inspection:

```elixir
Statwise.Visualization.t_test(result)
Statwise.Visualization.mann_whitney(result)
Statwise.Visualization.confidence_interval(result)
```

The primary workflow now shows statistical results directly on ordinary plots,
for example a box plot with a comparison bracket and p-value/effect-size
annotation:

```elixir
rows
|> Statwise.Visualization.box_plot(x: :group, y: :score)
|> Statwise.Visualization.with_test(result, groups: {:control, :treated})
```

Tests can also be computed from the plotted rows. When the plot is faceted, the
test is computed independently inside each facet panel:

```elixir
rows
|> Statwise.Visualization.box_plot(x: :group, y: :score, facet: :site)
|> Statwise.Visualization.with_test(:mann_whitney, groups: {:control, :treated})
```
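
The per-facet behavior amounts to grouping the rows by the facet field before running the test. The sketch below illustrates that with a bare Mann-Whitney U statistic. `FacetTestSketch` is hypothetical, it computes only U for the first group (no p-value or effect size), and it does not reflect how `with_test/3` is implemented.

```elixir
defmodule FacetTestSketch do
  # Hypothetical: Mann-Whitney U for one group pair, computed per facet panel.
  def u_by_facet(rows, facet, group, value, {first, second}) do
    rows
    |> Enum.group_by(&Map.fetch!(&1, facet))
    |> Map.new(fn {panel, panel_rows} ->
      xs = values_for(panel_rows, group, value, first)
      ys = values_for(panel_rows, group, value, second)
      {panel, u_statistic(xs, ys)}
    end)
  end

  defp values_for(rows, group, value, wanted) do
    for row <- rows, Map.fetch!(row, group) == wanted, do: Map.fetch!(row, value)
  end

  # U for the first sample: each pair with x > y scores 1, each tie scores 0.5.
  defp u_statistic(xs, ys) do
    scores =
      for x <- xs, y <- ys do
        cond do
          x > y -> 1.0
          x == y -> 0.5
          true -> 0.0
        end
      end

    Enum.sum(scores)
  end
end

rows = [
  %{site: :north, group: :control, score: 1},
  %{site: :north, group: :treated, score: 3},
  %{site: :south, group: :control, score: 6},
  %{site: :south, group: :treated, score: 6}
]

FacetTestSketch.u_by_facet(rows, :site, :group, :score, {:control, :treated})
```

Each panel receives its own statistic, so an annotation layer can look up the result by facet value when placing brackets.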

Deliverables:

- Completed: test-result annotation data model
- Completed: comparison brackets for categorical plots
- Completed: p-value, statistic, and effect-size labels for t-test and
  Mann-Whitney results
- Completed: per-facet test computation when tests are computed from plotted
  rows
- Completed: facet-aware annotation placement

Future extensions:

- Optional confidence interval overlays from `%Statwise.TestResult{}`

## Phase 9: Documentation And Gallery

The Livebook should become the canonical tutorial.

Recommended structure:

1. Quickstart
2. Data shapes
3. Semantic mappings
4. Distribution plots
5. Categorical plots
6. Relational plots
7. Faceting
8. Styling and themes
9. Statistical result plots
10. Exporting
11. Vega-Lite escape hatches

Also keep the README compact:

```elixir
df
|> Statwise.Visualization.box_plot(x: :treatment, y: :score, facet: :site)
|> Statwise.Visualization.show()
```

Deliverables:

- Keep `docs/visualization_gallery.livemd` current
- Update `docs/visualization.md`
- Add small README examples
- Add generated Vega-Lite examples in tests

## Phase 10: Compatibility And Stability

Before calling the visualization API mature:

- Require no visualization dependency at runtime.
- Keep `VegaLite`, `Kino`, `Jason`, and `Explorer` optional.
- Keep old APIs through aliases or a deprecation period.
- Add changelog entries.
- Ensure chart constructors return plain `%Plot{}` or `%Figure{}` structs.
- Test generated Vega-Lite specs, not screenshots.

## Recommended Implementation Order

1. Semantic mappings: `x`, `y`, `color`, `facet`
2. Dataset normalization for rows, columns, and Explorer
3. Scatter, line, bar, count, and strip plots
4. Better faceting: row/column facets and shared axes
5. Vega-Lite escape hatches in `with_style/2`
6. Themes and palettes
7. Statistical summary plots
8. Composition API: `plot |> add |> facet`
9. Matrix and correlation heatmaps
10. Violin, density, and swarm plots once transformations are solid

The highest-leverage starting point is Phase 1 plus Phase 2. A seaborn-like API
lives or dies by tidy data and semantic mappings. Once those are clean, every
new chart becomes easier to add.