docs/ecto-analytical-coverage.md

Select File
docs/ecto-analytical-coverage.md

# Ecto coverage

QuackDB's Ecto adapter covers analytical reads plus common write and setup workflows. DuckDB's SQL surface is larger than Ecto's native AST, so coverage is tracked in three buckets:

- **Ecto-native**: expressible with normal `Ecto.Query`, schema, repo, or migration APIs and generated by the adapter.
- **Helper/source**: expressible through `QuackDB.Ecto.Analytics`, `QuackDB.Ecto.Spatial`, Ecto `fragment/1`, or QuackDB source helpers.
- **Raw SQL**: DuckDB-native syntax that should be exercised through `Repo.query/3` or `QuackDB.query/4`.

This file is a roadmap, not a claim of complete DuckDB support.

## Coverage matrix

| Area | Feature | Path | SQL generation | Real server | Status |
| --- | --- | --- | --- | --- | --- |
| Basic reads | select/where/order/limit/offset | Ecto-native | yes | yes | covered |
| Schema reads | full schema select / `Repo.get!/2` | Ecto-native | yes | yes | covered |
| Parameters | pinned params and raw params | Ecto-native/raw | yes | yes | covered |
| Joins | inner/left/right/full/cross | Ecto-native | yes | yes | covered |
| Aggregates | count/sum/avg/min/max/count distinct/coalesce | Ecto-native | yes | yes | covered |
| Analytical aggregates | median/quantile/list/string_agg/arg_max/arg_min/mode/weighted_avg/favg/fsum/product/statistics/histograms | Helper | yes | yes | covered |
| Aggregate filter | `FILTER (WHERE ...)` | Ecto-native | yes | yes | covered |
| Grouping | group by/having | Ecto-native | yes | yes | covered |
| CTEs | non-recursive CTEs | Ecto-native | yes | yes | covered |
| Windows | row_number/rank/dense_rank/percent_rank/cume_dist | Ecto-native | yes | partial | partial |
| Windows | lag/lead/first_value/last_value/nth_value | Ecto-native/helper | yes | partial | partial |
| Windows | fragment-backed frame clauses | Ecto-native/fragment | yes | yes | covered |
| Sources | CSV/Parquet helpers | Source helpers | yes | yes | covered |
| Sources | JSON/XLSX helpers | Source helpers/raw | partial | partial | partial |
| Source analytics | source helper + aggregate/window | Ecto-native/source | yes | partial | partial |
| Nested analytics | common LIST/MAP/STRUCT helpers | Helper/fragment/raw | partial | partial | partial |
| JSON analytics | json_extract/path queries | Helper/source/raw | yes | yes | partial |
| Regular expressions | regexp_extract/matches/replace/split helpers | Helper | yes | yes | covered |
| Text helpers | contains/starts_with/split_part/string_split | Helper | yes | yes | covered |
| Time series | date_trunc/time_bucket/generate_series | Helper/raw | yes | yes | partial |
| Grouping extensions | grouping sets/rollup/cube | Raw SQL helper | yes | yes | partial |
| QUALIFY | window filtering | Ecto subquery or raw SQL | yes | yes | partial |
| Pivoting | pivot/unpivot | Raw SQL helper | yes | yes | partial |
| Sampling | using sample | Raw SQL | no | yes | partial |
| Set operations | union/intersect/except | Ecto combinations | yes | yes | covered |
| Inserts | `Repo.insert/2`, `Repo.insert_all/3` | Ecto-native | yes | yes | covered |
| Upserts | `DO NOTHING`, `set`, `inc`, replacement fields | Ecto-native | yes | yes | covered |
| Native append | `insert_all(..., insert_method: :append)` | Adapter option | yes | yes | covered |
| Native append | schema types, subset columns/defaults, `RETURNING` | Adapter option | yes | partial | covered |
| Mutations | `update_all` / `delete_all` | Ecto-native | yes | yes | covered |
| Schema lifecycle | `Repo.update/2` / `Repo.delete/2` | Ecto-native | yes | yes | covered |
| Migrations | create/drop/alter table, rename table/column, indexes, references | Ecto migration DDL | yes | yes | covered |
| Explain | `Ecto.Adapters.SQL.explain/4` | Ecto SQL | yes | yes | covered |
| Full-text search | BM25 ranking and stemming | Ecto helper fragments | yes | yes | covered |
| Advanced joins | semi/anti via `exists`, ASOF-style lateral top-one, positional raw SQL | Ecto-native/raw | yes | yes | partial |
| DuckDB select extensions | `* EXCLUDE`, `* REPLACE`, `* RENAME`, pattern stars, `COLUMNS(...)`, `*COLUMNS(...)` | Raw SQL helper | yes | yes | partial |
| Introspection | summarize/describe/pragma | Direct SQL helper/raw | partial | partial | partial |

## Migration boundaries

Basic migration DDL is generated for table creation/drop, column add/drop/modify, table and column renames, references, primary keys, composite primary keys, and ordinary/unique indexes. DuckDB-incompatible index options such as concurrent indexes, covering indexes, raw index options, index comments, custom `USING`, and `nulls_distinct` raise explicit QuackDB errors instead of being ignored.

Advanced constraints and comments should be added only where DuckDB can enforce the same semantics. Until then, prefer raw SQL for DuckDB-specific DDL.

## Test organization

Coverage should stay split by expression path:

```text
test/quack_db/ecto/sql_generation/
  analytical_test.exs
  aggregates_test.exs
  fragments_test.exs
  migration_test.exs
  source_analytics_test.exs
  sources_test.exs
  update_delete_test.exs
  window_functions_test.exs

test/quack_db/integration/ecto/
  migration_test.exs
  query_test.exs
```

Use SQL generation tests to pin what QuackDB emits for Ecto-native queries. Use real-server tests for DuckDB-native semantics and raw SQL pass-through.

## QUALIFY-style filters

Ecto does not have a `QUALIFY` clause, but the common top-N-per-group pattern is expressible with a subquery:

```elixir
ranked =
  from event in "events",
    windows: [by_category: [partition_by: event.category, order_by: [desc: event.score]]],
    select: %{
      id: event.id,
      category: event.category,
      rank: over(row_number(), :by_category)
    }

from event in subquery(ranked),
  where: event.rank <= 3,
  order_by: [event.category, event.rank],
  select: %{id: event.id, rank: event.rank}
```

Use raw SQL when DuckDB-specific `QUALIFY` syntax is clearer than the Ecto subquery shape.

## Advanced join patterns

DuckDB exposes advanced join syntax such as `SEMI JOIN`, `ANTI JOIN`, `ASOF JOIN`, and `POSITIONAL JOIN`. Ecto's join qualifiers are intentionally narrower, so prefer ordinary Ecto shapes when they preserve the same semantics.

Use `exists/1` for semi-join semantics: keep left-side rows when a related row exists, without duplicating the left rows or projecting right-side columns.

```elixir
from event in "events",
  as: :event,
  where:
    exists(
      from category in "categories",
        where: category.id == parent_as(:event).category_id,
        select: 1
    ),
  select: event.name
```

Use `not exists/1` for anti-join semantics: keep left-side rows when no related row exists.

```elixir
from event in "events",
  as: :event,
  where:
    not exists(
      from category in "categories",
        where: category.id == parent_as(:event).category_id,
        select: 1
    ),
  select: event.name
```

Use a lateral top-one subquery for ASOF-style "latest matching row at or before this timestamp" queries. This keeps the query Ecto-shaped while matching the common analytical ASOF use case.

```elixir
latest_price =
  from price in "prices",
    where:
      price.symbol == parent_as(:trade).symbol and
        price.ts <= parent_as(:trade).ts,
    order_by: [desc: price.ts],
    limit: 1,
    select: %{price: price.price}

from trade in "trades",
  as: :trade,
  left_lateral_join: price in subquery(latest_price),
  on: true,
  select: %{trade_id: trade.id, price: price.price}
```

Keep `POSITIONAL JOIN` as raw SQL. It joins by row order rather than relational predicates, so representing it as an Ecto join would be surprising.

```elixir
Repo.query!("""
SELECT *
FROM read_csv('ids.csv')
POSITIONAL JOIN read_csv('labels.csv')
""")
```

## DuckDB star and columns expressions

DuckDB's star expression extensions are useful for exploratory analytics and wide tables, but they do not always fit Ecto's ordinary select result loading. Use `QuackDB.SQL` expression helpers when the query is clearer as DuckDB SQL.

```elixir
star = QuackDB.SQL.star(exclude: [:payload, :debug])

Repo.query!([
  "SELECT ",
  star,
  " FROM events"
])
```

`QuackDB.Ecto.Star` also exposes `star/1`, `columns/1,2`, and `unpack_columns/1,2` macros for Ecto SQL generation. These macros are imported by `use QuackDB.Ecto`.

```elixir
use QuackDB.Ecto

from event in "events",
  where: columns([:score]) > 0,
  select: event.id
```

`star/1` supports table-qualified stars, exclusion, replacement, rename, and one pattern filter.

```elixir
QuackDB.SQL.star(
  qualifier: :events,
  replace: [score: {:expr, "coalesce(score, 0)"}],
  rename: [old_name: :name]
)

QuackDB.SQL.star(like: "metric_%")
```

Use `columns/1,2` for DuckDB `COLUMNS(...)` expressions and `unpack_columns/1,2` for `*COLUMNS(...)`. Literal and pinned atom lists match Ecto's `map/2` and `struct/2` field-list style. Pinned selectors are supported where DuckDB accepts parameterized `COLUMNS(?)` selectors.

```elixir
fields = [:score, :temperature]

from event in "events",
  where: columns(^fields) > 0,
  select: event.id

Repo.query!([
  "SELECT min(",
  QuackDB.SQL.columns(exclude: [:payload]),
  ") FROM events"
])

Repo.query!([
  "SELECT ",
  QuackDB.SQL.unpack_columns("^metric_"),
  " FROM events"
])
```

Replacement expressions are explicit `{:expr, sql}` values because they are DuckDB SQL snippets. Prefer ordinary Ecto selects when the selected columns are known and not using DuckDB star syntax.

DuckDB can expand one star or `COLUMNS(...)` expression into multiple result columns. That is safe for raw `Repo.query!/2`, `Ecto.Adapters.SQL.to_sql/3`, and predicates that keep the outer select shape normal. Be careful with ordinary `Repo.all/2` select lists that use expanding star expressions, because Ecto's result loader expects the selected Ecto expression shape to match the returned columns. Dynamic pinned selectors are supported for `columns(^fields)` and `unpack_columns(^fields)`, but not for star options such as `exclude: ^fields` because DuckDB requires those names directly in the star syntax.

## Query style

Prefer ordinary Ecto syntax when it maps cleanly to DuckDB SQL:

- comparisons and boolean logic with Elixir/Ecto operators;
- conditional aggregates with `filter(count(...), predicate)` instead of adapter-specific `*_if` helpers;
- `selected_as/2` for grouped aliases;
- `type/2` for JSON/text casts.

Use `QuackDB.Ecto.Analytics` for DuckDB analytical functions that are established SQL vocabulary, such as `median/1`, `quantile_cont/2`, `list/1,2`, `weighted_avg/2`, `fsum/1`, `time_bucket/2,3`, JSON path helpers, and query-level profiling with `summarize/3` or `summarize!/3`.

Use `QuackDB.Ecto.Regex` for DuckDB's `regexp_*` expression functions. DuckDB uses RE2 while Elixir `Regex` uses Erlang/OTP `:re`, so literal `~r/.../` patterns are convenient only for the shared syntax subset. QuackDB translates compatible `~r` modifiers (`i`, `m`, and `s`) into DuckDB option strings, ignores Elixir's Unicode modifier, and rejects modifiers DuckDB cannot represent.

Use `QuackDB.Ecto.Text` for common text predicates and splitting functions (`contains/2`, `starts_with/2`, `ends_with/2`, `split_part/3`, `string_split/2`) when they read better than fragments. With `use QuackDB.Ecto`, shared `contains/2` routes obvious text calls to DuckDB `contains` and spatial helper expressions to `ST_Contains`; ambiguous calls raise so use `contains_text/2` or `st_contains/2` when you want to be explicit.

Use `QuackDB.Ecto.List` for common LIST/ARRAY operations such as `contains_list/2`, `has_any/2`, `has_all/2`, `list_length/1`, `extract/2`, `slice/3,4`, `sort/1`, `distinct/1`, `unique/1`, `position/2`, `intersect_list/2`, `concat/2`, `list_filter/2`, `list_transform/2`, `list_reduce/2,3`, and `unnest/1`. Names follow DuckDB concepts where they do not conflict with Ecto or Kernel imports; `list_length/1` and `intersect_list/2` are explicit to avoid those conflicts. Lambda helpers translate a constrained Elixir `fn` subset to DuckDB's Python-style `lambda x : ...` syntax and raise at macro expansion for unsupported expressions. Use `case_when do ... end` inside lambdas for multi-branch `CASE WHEN` expressions.

Use `QuackDB.Ecto.Map` and `QuackDB.Ecto.Struct` for common MAP/STRUCT operations. Focused imports expose natural helper names such as `contains/2`, `extract/2`, `values/1`, and `concat/2`. Broad `use QuackDB.Ecto` imports exclude those ambiguous names and expose explicit aliases such as `contains_map/2`, `map_extract_value/2`, `map_keys/1`, `contains_struct/2`, `struct_extract/2`, and `struct_values/1`.

Keep raw SQL for syntax Ecto cannot represent well, including `PIVOT`, `UNPIVOT`, `QUALIFY`, `GROUPING SETS`, `ROLLUP`, and `CUBE`. Window frames should use `fragment(...)` until QuackDB depends on an Ecto release that includes macro-expanded frame helper support.

`QuackDB.SQL` provides small builders for DuckDB statement/clause syntax that is clearer as SQL-builder composition than as Ecto AST.

```elixir
Repo.query!(QuackDB.SQL.pivot(:events,
  on: :kind,
  using: [sum: :n]
))

Repo.query!(QuackDB.SQL.unpivot(:wide_events,
  on: [:duck, :goose],
  name: :kind,
  value: :n
))
```

Grouping extensions keep identifier quoting centralized and can be composed into larger query builders or setup statements.

```elixir
QuackDB.SQL.grouping_sets([[:category, :kind], [:category], []])
QuackDB.SQL.rollup([:category, :kind])
QuackDB.SQL.cube([:category, :kind])
```

Use explicit `{:expr, sql}` only when the expression cannot be represented by identifiers or aggregate tuples.

## Boundaries

QuackDB should not try to reimplement all DuckDB syntax as Ecto macros. For DuckDB-specific syntax that Ecto cannot represent cleanly, prefer:

1. raw SQL through `Repo.query/3` or `QuackDB.query/4`;
2. `QuackDB.Ecto.Analytics` for common DuckDB analytical expressions;
3. `QuackDB.Ecto.Spatial` for common spatial expressions;
4. source helpers for table functions such as CSV/Parquet/JSON;
5. `fragment/1` for expressions inside otherwise-normal Ecto queries;
6. explicit unsupported errors for Ecto AST shapes that would generate misleading SQL.

Future adapter-specific helpers may make sense for repeated patterns such as `QUALIFY`, lakehouse sources, or Arrow handoff, but those should be added only after the protocol and result semantics are stable.