Skip to main content

docs/ecto-analytical-coverage.md

# Ecto coverage

QuackDB's Ecto adapter covers analytical reads plus common write and setup workflows. DuckDB's SQL surface is larger than Ecto's native AST, so coverage is tracked in three buckets:

- **Ecto-native**: expressible with normal `Ecto.Query`, schema, repo, or migration APIs and generated by the adapter.
- **Helper/source**: expressible through `QuackDB.Ecto.Analytics`, `QuackDB.Ecto.Spatial`, Ecto `fragment/1`, or QuackDB source helpers.
- **Raw SQL**: DuckDB-native syntax that should be exercised through `Repo.query/3` or `QuackDB.query/4`.

This file is a roadmap, not a claim of complete DuckDB support.

## Coverage matrix

| Area | Feature | Path | SQL generation | Real server | Status |
| --- | --- | --- | --- | --- | --- |
| Basic reads | select/where/order/limit/offset | Ecto-native | yes | yes | covered |
| Schema reads | full schema select / `Repo.get!/2` | Ecto-native | yes | yes | covered |
| Parameters | pinned params and raw params | Ecto-native/raw | yes | yes | covered |
| Joins | inner/left/right/full/cross | Ecto-native | yes | yes | covered |
| Aggregates | count/sum/avg/min/max/count distinct/coalesce | Ecto-native | yes | yes | covered |
| Analytical aggregates | median/quantile/list/string_agg/arg_max/arg_min/mode/weighted_avg/favg/fsum/product/statistics/histograms | Helper | yes | yes | covered |
| Aggregate filter | `FILTER (WHERE ...)` | Ecto-native | yes | yes | covered |
| Grouping | group by/having | Ecto-native | yes | yes | covered |
| CTEs | non-recursive CTEs | Ecto-native | yes | yes | covered |
| Windows | row_number/rank/dense_rank/percent_rank/cume_dist | Ecto-native | yes | partial | partial |
| Windows | lag/lead/first_value/last_value/nth_value | Ecto-native/helper | yes | partial | partial |
| Windows | fragment-backed frame clauses | Ecto-native/fragment | yes | yes | covered |
| Sources | CSV/Parquet helpers | Source helpers | yes | yes | covered |
| Sources | JSON/XLSX helpers | Source helpers/raw | partial | partial | partial |
| Source analytics | source helper + aggregate/window | Ecto-native/source | yes | partial | partial |
| Nested analytics | list/struct/map functions | Fragment/raw | partial | no | missing |
| JSON analytics | json_extract/path queries | Helper/source/raw | yes | yes | partial |
| Regular expressions | regexp_extract/matches/replace/split helpers | Helper | yes | yes | covered |
| Text helpers | contains/starts_with/split_part/string_split | Helper | yes | yes | covered |
| Time series | date_trunc/time_bucket/generate_series | Helper/raw | yes | yes | partial |
| Grouping extensions | grouping sets/rollup/cube | Raw SQL | no | yes | partial |
| QUALIFY | window filtering | Ecto subquery or raw SQL | yes | yes | partial |
| Pivoting | pivot/unpivot | Raw SQL | no | yes | partial |
| Sampling | using sample | Raw SQL | no | yes | partial |
| Set operations | union/intersect/except | Ecto combinations | yes | yes | covered |
| Inserts | `Repo.insert/2`, `Repo.insert_all/3` | Ecto-native | yes | yes | covered |
| Upserts | `DO NOTHING`, `set`, `inc`, replacement fields | Ecto-native | yes | yes | covered |
| Native append | `insert_all(..., insert_method: :append)` | Adapter option | yes | yes | covered |
| Mutations | `update_all` / `delete_all` | Ecto-native | yes | yes | covered |
| Schema lifecycle | `Repo.update/2` / `Repo.delete/2` | Ecto-native | yes | yes | covered |
| Migrations | create/drop/alter table, rename table/column, indexes, references | Ecto migration DDL | yes | yes | covered |
| Explain | `Ecto.Adapters.SQL.explain/4` | Ecto SQL | yes | yes | covered |
| Full-text search | BM25 ranking and stemming | Ecto helper fragments | yes | yes | covered |
| Advanced joins | semi/anti/asof/positional | Raw SQL | no | no | missing |
| DuckDB select extensions | `* EXCLUDE`, `* REPLACE`, `COLUMNS(*)` | Raw SQL | no | no | missing |
| Introspection | summarize/describe/pragma | Direct SQL helper/raw | partial | partial | partial |

## Migration boundaries

Basic migration DDL is generated for table creation/drop, column add/drop/modify, table and column renames, references, primary keys, composite primary keys, and ordinary/unique indexes. DuckDB-incompatible index options such as concurrent indexes, covering indexes, raw index options, index comments, custom `USING`, and `nulls_distinct` raise explicit QuackDB errors instead of being ignored.

Advanced constraints and comments should be added only where DuckDB can enforce the same semantics. Until then, prefer raw SQL for DuckDB-specific DDL.

## Test organization

Coverage should stay split by expression path:

```text
test/quack_db/ecto/sql_generation/
  analytical_test.exs
  aggregates_test.exs
  fragments_test.exs
  migration_test.exs
  source_analytics_test.exs
  sources_test.exs
  update_delete_test.exs
  window_functions_test.exs

test/quack_db/integration/ecto/
  migration_test.exs
  query_test.exs
```

Use SQL generation tests to pin what QuackDB emits for Ecto-native queries. Use real-server tests for DuckDB-native semantics and raw SQL pass-through.

## QUALIFY-style filters

Ecto does not have a `QUALIFY` clause, but the common top-N-per-group pattern is expressible with a subquery:

```elixir
ranked =
  from event in "events",
    windows: [by_category: [partition_by: event.category, order_by: [desc: event.score]]],
    select: %{
      id: event.id,
      category: event.category,
      rank: over(row_number(), :by_category)
    }

from event in subquery(ranked),
  where: event.rank <= 3,
  order_by: [event.category, event.rank],
  select: %{id: event.id, rank: event.rank}
```

Use raw SQL when DuckDB-specific `QUALIFY` syntax is clearer than the Ecto subquery shape.

## Query style

Prefer ordinary Ecto syntax when it maps cleanly to DuckDB SQL:

- comparisons and boolean logic with Elixir/Ecto operators;
- conditional aggregates with `filter(count(...), predicate)` instead of adapter-specific `*_if` helpers;
- `selected_as/2` for grouped aliases;
- `type/2` for JSON/text casts.

Use `QuackDB.Ecto.Analytics` for DuckDB analytical functions that are established SQL vocabulary, such as `median/1`, `quantile_cont/2`, `list/1,2`, `weighted_avg/2`, `fsum/1`, `time_bucket/2,3`, JSON path helpers, and query-level profiling with `summarize/3` or `summarize!/3`.

Use `QuackDB.Ecto.Regex` for DuckDB's `regexp_*` expression functions. DuckDB uses RE2 while Elixir `Regex` uses Erlang/OTP `:re`, so literal `~r/.../` patterns are convenient only for the shared syntax subset. QuackDB translates compatible `~r` modifiers (`i`, `m`, and `s`) into DuckDB option strings, ignores Elixir's Unicode modifier, and rejects modifiers DuckDB cannot represent.

Use `QuackDB.Ecto.Text` for common text predicates and splitting functions (`contains/2`, `starts_with/2`, `ends_with/2`, `split_part/3`, `string_split/2`) when they read better than fragments. With `use QuackDB.Ecto`, shared `contains/2` routes obvious text calls to DuckDB `contains` and spatial helper expressions to `ST_Contains`; ambiguous calls raise so use `contains_text/2` or `st_contains/2` when you want to be explicit.

Keep raw SQL for syntax Ecto cannot represent well, including `PIVOT`, `UNPIVOT`, `QUALIFY`, `GROUPING SETS`, `ROLLUP`, and `CUBE`. Window frames should use `fragment(...)` until QuackDB depends on an Ecto release that includes macro-expanded frame helper support.

## Boundaries

QuackDB should not try to reimplement all DuckDB syntax as Ecto macros. For DuckDB-specific syntax that Ecto cannot represent cleanly, prefer:

1. raw SQL through `Repo.query/3` or `QuackDB.query/4`;
2. `QuackDB.Ecto.Analytics` for common DuckDB analytical expressions;
3. `QuackDB.Ecto.Spatial` for common spatial expressions;
4. source helpers for table functions such as CSV/Parquet/JSON;
5. `fragment/1` for expressions inside otherwise-normal Ecto queries;
6. explicit unsupported errors for Ecto AST shapes that would generate misleading SQL.

Future adapter-specific helpers may make sense for repeated patterns such as `QUALIFY`, lakehouse sources, or Arrow handoff, but those should be added only after the protocol and result semantics are stable.