# Getting started
QuackDB connects Elixir applications to a remote DuckDB process through DuckDB's experimental Quack protocol. The client talks to the Quack HTTP endpoint, decodes DuckDB result chunks, and exposes the connection through `DBConnection`.
## Requirements
- Elixir 1.19 or newer
- DuckDB 1.5.3 or newer for the current Quack extension behavior
- A running Quack server
## Install
Add `:quackdb` to your dependencies:
```elixir
def deps do
[
{:quackdb, "~> 0.3.0"}
]
end
```
Optional integrations are compiled only when their packages are available. Add Explorer when you want dataframe handoff helpers:
```elixir
def deps do
[
{:quackdb, "~> 0.3.0"},
{:explorer, "~> 0.11"}
]
end
```
Then fetch dependencies:
```sh
mix deps.get
```
## Start DuckDB with Quack
For local development, QuackDB can supervise DuckDB's external CLI process for you:
```elixir
children =
QuackDB.Server.child_specs(
server: [name: MyApp.DuckDB, duckdb: :managed, endpoint: "quack:localhost:9494"],
client: [name: MyApp.QuackDB, pool_size: System.schedulers_online()]
)
```
`child_specs/1` generates one shared random token and injects the matching URI/token into both child specs. Pass `:token` on either side when you want to provide it yourself.
`duckdb: :managed` downloads and caches DuckDB's official CLI binary when the local server starts. Managed downloads are checksum-verified for `QuackDB.Binary.default_version/0`; other DuckDB versions require passing an explicit `:sha256`. For explicit setup, run:
```sh
mix quackdb.install
mix quackdb.install --print-path
```
Use `QUACKDB_BINARY_PATH` or pass `duckdb: "/path/to/duckdb"` when you want QuackDB to use a system or custom executable instead. See the [managed DuckDB guide](managed-duckdb.md) for cache, checksum, and target-prefetch options.
Or start DuckDB manually with the `quack` extension loaded:
```sh
duckdb -interactive -init /dev/null \
-cmd "LOAD quack; CALL quack_serve('quack:localhost', token='super_secret');"
```
On some systems, `quack:localhost` binds to IPv6 localhost. If `http://localhost:9494` does not connect, use `http://[::1]:9494`.
## Connect from Elixir
```elixir
{:ok, conn} =
QuackDB.start_link(
uri: "http://[::1]:9494",
token: "super_secret"
)
```
`QuackDB.start_link/1` starts a `DBConnection` process. You can pass the connection to `QuackDB.ping/2`, `QuackDB.query/4`, `QuackDB.prepare_execute/4`, `QuackDB.stream/4`, or DBConnection APIs.
## Run a query
```elixir
:ok = QuackDB.ping(conn)
{:ok, result} = QuackDB.query(conn, "SELECT 1 AS n")
result.columns
#=> ["n"]
result.rows
#=> [[1]]
result.num_rows
#=> 1
```
`rows` are row-oriented lists. This shape is convenient for DBConnection and future Ecto integration.
QuackDB formats positional parameters as DuckDB SQL literals client-side because the current Quack request path does not expose server-side bind parameters:
```elixir
{:ok, result} = QuackDB.query(conn, "SELECT ? AS name, ? AS n", ["duck", 42])
result.rows
#=> [["duck", 42]]
```
Placeholders inside strings and comments are ignored while formatting, and unsupported parameter values raise explicit errors.
## Decode nested values
DuckDB nested result types decode to ordinary Elixir values:
```elixir
{:ok, result} =
QuackDB.query(conn, """
SELECT
[1, 2, 3] AS xs,
{'name': 'duck', 'count': 2} AS obj,
array_value(1, 2, 3) AS arr,
map(['a', 'b'], [1, 2]) AS m,
[{'a': 1}, {'a': 2}] AS nested
""")
result.rows
#=> [
#=> [
#=> [1, 2, 3],
#=> %{"name" => "duck", "count" => 2},
#=> [1, 2, 3],
#=> %{"a" => 1, "b" => 2},
#=> [%{"a" => 1}, %{"a" => 2}]
#=> ]
#=> ]
```
## Query files and lakehouse sources
DuckDB can scan files, object stores, and lakehouse table formats directly. `QuackDB.Source` builds safe table-function fragments that can be used from Ecto queries or direct SQL:
```elixir
use QuackDB.Ecto
alias QuackDB.Source
source = Source.csv("events.csv", header: true)
MyApp.AnalyticsRepo.all(
from event in source,
where: event.id > 1,
select: %{id: event.id, name: event.name}
)
```
See [`guides/sources.md`](sources.md) for HTTP(S), S3/R2/GCS, Azure/ADLS, Hugging Face, Delta, Iceberg, extensions, and DuckDB secrets.
## Stream large result sets
`QuackDB.query/4` materializes the full result. QuackDB fetches additional result chunks when DuckDB reports that more rows are available, but for large analytical results prefer streaming helpers.
Use `QuackDB.stream/4` to process `%QuackDB.Result{}` batches lazily:
```elixir
row_count =
conn
|> QuackDB.stream("SELECT i FROM range(0, 50_000) t(i)")
|> Enum.reduce(0, fn result, count -> count + result.num_rows end)
row_count
#=> 50_000
```
Use `QuackDB.rows/4` for row-level streaming:
```elixir
conn
|> QuackDB.rows("SELECT i FROM range(0, ?) t(i)", [50_000])
|> Enum.take(3)
#=> [[0], [1], [2]]
```
Use `QuackDB.maps/4` for row maps keyed by column names. Duplicate column names are disambiguated with suffixes such as `_2` and `_3`:
```elixir
conn
|> QuackDB.maps("SELECT i AS n FROM range(0, ?) t(i)", [50_000])
|> Enum.take(2)
#=> [%{"n" => 0}, %{"n" => 1}]
```
Use `QuackDB.columnar/4` when an analytical workflow wants vectors plus column order and metadata:
```elixir
{:ok, columns} = QuackDB.columnar(conn, "SELECT id, name FROM events ORDER BY id")
columns.names
#=> ["id", "name"]
columns["id"]
#=> [1, 2]
```
`QuackDB.columns/4` returns just the column map:
```elixir
{:ok, columns} = QuackDB.columns(conn, "SELECT id, name FROM events ORDER BY id")
columns
#=> %{"id" => [1, 2], "name" => ["duck", "goose"]}
```
For large results, `QuackDB.columnar_batches/4` streams `QuackDB.Columns` fetch batches without materializing the whole result set. `QuackDB.column_batches/4` returns just the map from each batch:
```elixir
conn
|> QuackDB.column_batches("SELECT i AS n FROM range(0, 50_000) t(i)", [], max_rows: 1_000)
|> Enum.take(1)
#=> [%{"n" => [0, 1, 2, ...]}]
```
This is not Arrow IPC yet, but it exposes a column-oriented shape that can back future Arrow integration without changing the query API.
## Convert results to Explorer DataFrames
When `:explorer` is available, QuackDB exposes optional helpers for building `Explorer.DataFrame` values from query results:
Ecto queries can be passed directly when you already have schemas or source helpers:
```elixir
import Ecto.Query
alias QuackDB.Explorer, as: QuackExplorer
alias QuackDB.Source
source = Source.csv("events.csv", header: true)
query =
from event in source,
where: event.id > ^1,
select: %{id: event.id, name: event.name}
{:ok, df} = QuackExplorer.dataframe(conn, query)
```
The Explorer integration materializes query results in Elixir before constructing a dataframe. It is useful for interactive analysis and downstream Explorer pipelines, but it is not a zero-copy Arrow IPC path yet.
Explorer dataframes can also be appended through Quack's native column-oriented append path:
```elixir
alias QuackDB.Explorer, as: QuackExplorer
QuackExplorer.insert_dataframe!(conn, "events_copy", df, batch_size: 10_000)
```
You can also convert existing results:
```elixir
alias QuackDB.Explorer, as: QuackExplorer
{:ok, result} = QuackDB.query(conn, "SELECT 1 AS id, 'duck' AS name")
{:ok, df} = QuackExplorer.from_result(result)
```
## Work with command results
DuckDB returns affected counts as a `Count` result column for DML statements. QuackDB normalizes those into `num_rows`:
```elixir
alias QuackDB.{DDL, DML}
{:ok, _} = QuackDB.query(conn, DDL.create_table("events", [id: :integer], temporary: true))
{:ok, result} = QuackDB.query(conn, DML.insert_into("events", [[id: 1], [id: 2]]))
result.command
#=> :insert
result.num_rows
#=> 2
result.columns
#=> nil
result.rows
#=> nil
```
The raw DuckDB count result stays available for debugging:
```elixir
result.metadata[:duckdb_columns]
#=> ["Count"]
result.metadata[:duckdb_rows]
#=> [[2]]
```
## Append rows
`QuackDB.insert_rows/4` uses Quack's append protocol to send a DuckDB `DataChunk` directly to a table:
```elixir
alias QuackDB.DDL
QuackDB.query!(conn, DDL.create_table("events", [id: :integer, name: :varchar, active: :boolean], temporary: true))
{:ok, result} =
QuackDB.insert_rows(conn, "events", [
[id: 1, name: "duck", active: true],
[id: 2, name: "goose", active: false]
])
result.command
#=> :insert
result.num_rows
#=> 2
```
Keyword rows preserve append order and allow QuackDB to infer the column list from the first row. Map rows are also accepted, but pass `:columns` for stable append order and types. Explicit columns are still required for empty batches or all-null columns. Use `batch_size: n` to split large inputs across multiple append requests while returning the total inserted row count.
Native append columns can be declared with scalar `QuackDB.Type` specs and nested specs such as `{:list, :varchar}`, `{:struct, [source: :varchar, count: :integer]}`, `{:array, :integer, 3}`, and `{:map, :varchar, :varchar}`. Explicit MAP columns accept ordinary Elixir maps:
```elixir
QuackDB.insert_rows!(conn, "events", [[id: 1, labels: %{env: "prod", region: "eu"}]],
columns: [id: :integer, labels: {:map, :varchar, :varchar}]
)
```
Plain Elixir maps without an explicit MAP column spec infer as DuckDB `STRUCT` values. Temporal append values are normalized through Elixir's Calendar-aware `Date`, `Time`, `NaiveDateTime`, and `DateTime` conversion APIs before encoding.
## Full-text search helpers
DuckDB's `fts` extension can index text columns and expose BM25 ranking.
```elixir
alias QuackDB.FTS
MyApp.AnalyticsRepo.query!(FTS.install())
MyApp.AnalyticsRepo.query!(FTS.load())
MyApp.AnalyticsRepo.query!(
FTS.create_index("documents", :id, [:title, :body], overwrite: true)
)
from doc in "documents",
where: match_bm25("fts_main_documents", doc.id, ^"duckdb analytics") > 0,
order_by: [desc: match_bm25("fts_main_documents", doc.id, ^"duckdb analytics")],
select: %{id: doc.id, title: doc.title}
```
See the [full-text search guide](full-text-search.md) for direct SQL and Ecto usage.
## Spatial helpers
DuckDB's spatial extension can be loaded with SQL helpers. Use Ecto spatial helpers when you want to keep spatial expressions inside Ecto queries:
```elixir
use QuackDB.Ecto
alias QuackDB.Spatial
QuackDB.query!(conn, Spatial.load())
point = %Geo.Point{coordinates: {1.0, 2.0}, srid: nil}
from(place in "places",
where: intersects(place.geom, ^point),
select: as_text(place.geom)
)
```
DuckDB `GEOMETRY` values decode as WKB-compatible bytes. Add optional `{:geo, "~> 4.1"}` when you want to convert those bytes to `Geo` structs with `QuackDB.Geometry.decode_wkb!/1`, or pass `%Geo.*{}` structs as Ecto parameters.
## Inspect output in IEx
QuackDB implements compact inspection for common structs so manual review stays readable:
```elixir
QuackDB.query!(conn, "SELECT i FROM range(0, 4) t(i)")
#QuackDB.Result<command: :select, columns: ["i"], rows: 4, preview: [[0], [1], [2], :...], connection_id: "...", needs_more_fetch?: false>
```
The actual rows are still available through `result.rows`.
## Transactions
QuackDB implements `DBConnection` transaction callbacks with SQL statements:
```elixir
alias QuackDB.{DDL, DML}
DBConnection.transaction(conn, fn tx ->
QuackDB.query!(tx, DDL.create_table("tx_events", [id: :integer], temporary: true))
QuackDB.query!(tx, DML.insert_into("tx_events", id: 1))
end)
```
## Ecto raw SQL
QuackDB includes an initial Ecto SQL adapter for raw SQL queries. The Ecto adapter is compiled when `ecto_sql` is available, so add Ecto SQL if your app does not already depend on it:
```elixir
def deps do
[
{:quackdb, "~> 0.3.0"},
{:ecto_sql, "~> 3.13"}
]
end
```
Then define a repo:
```elixir
defmodule MyApp.AnalyticsRepo do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.QuackDB
end
```
Configure the repo with the same options accepted by `QuackDB.start_link/1`:
```elixir
config :my_app, MyApp.AnalyticsRepo,
uri: "http://[::1]:9494",
token: "super_secret"
```
Generated DDL and setup-oriented DML can participate in Ecto transactions:
```elixir
{:ok, :committed} =
MyApp.AnalyticsRepo.transaction(fn ->
MyApp.AnalyticsRepo.query!(
QuackDB.DDL.create_table("events", [id: :integer], temporary: true)
)
MyApp.AnalyticsRepo.query!(QuackDB.DML.insert_into("events", [[id: 1], [id: 2]]))
:committed
end)
```
Use `Repo.rollback/1` to abort transaction work:
```elixir
{:error, :rolled_back} =
MyApp.AnalyticsRepo.transaction(fn ->
MyApp.AnalyticsRepo.query!(QuackDB.DML.insert_into("events", id: 3))
MyApp.AnalyticsRepo.rollback(:rolled_back)
end)
```
Read-only Ecto queries against table names are also supported, including CTEs, window functions, joins, grouping, having, distinct, aggregate `FILTER`, arithmetic expressions, `in/2`, predicates, ordering, limits, aggregates, fragments, and DuckDB analytical helpers:
```elixir
use QuackDB.Ecto
MyApp.AnalyticsRepo.all(
from event in "events",
where: event.id > ^min_id and like(event.name, "d%"),
group_by: event.category,
select: %{
category: event.category,
median_score: median(event.score),
p95_score: quantile_cont(event.score, 0.95),
scores: list(event.score),
high_score_events: filter(count(event.id), event.score > 100)
}
)
```
Build date spines and timestamp buckets with Elixir-native `Date.Range` and `Duration` values:
```elixir
use QuackDB.Ecto
MyApp.AnalyticsRepo.all(
from day in series(Date.range(~D[2024-01-01], ~D[2024-01-31])),
left_join: event in "events",
on: event.occurred_on == day.value,
group_by: day.value,
order_by: day.value,
select: %{day: day.value, events: count(event.id)}
)
interval = Duration.new!(hour: 1)
origin = ~N[2024-01-01 00:00:00]
MyApp.AnalyticsRepo.all(
from event in "events",
group_by: selected_as(:bucket),
select: %{
bucket: selected_as(time_bucket(^interval, event.occurred_at, origin: ^origin), :bucket),
events: count()
}
)
```
Text and regex helpers keep DuckDB string predicates in the query DSL. DuckDB regexes use RE2; `~r` literals are convenient for the syntax shared with Elixir regexes, and `contains/2` dispatches obvious text calls to DuckDB `contains` while spatial helper calls go to `ST_Contains`. Ambiguous calls raise; use `contains_text/2` or `st_contains/2` when intent is not obvious:
```elixir
use QuackDB.Ecto
MyApp.AnalyticsRepo.all(
from event in "events",
where: contains(event.name, "duck") and regexp_matches(event.name, ~r/^duck/i),
select: %{
slug: regexp_replace(event.name, ~r/\s+/, "-", "g"),
tags: string_split(event.tags, ",")
}
)
```
JSON columns can use Ecto access syntax for string extraction, `type/2` for numeric/boolean casts, or explicit DuckDB JSON helpers:
```elixir
use QuackDB.Ecto
MyApp.AnalyticsRepo.all(
from event in "events",
where: event.payload["user"]["name"] == "duck" and type(event.payload["score"], :integer) > 10,
select: %{
name: event.payload["user"]["name"],
active: type(event.payload["active"], :boolean),
score: json_extract(event.payload, [:scores, 0]),
has_name: json_exists(event.payload, [:user, :name])
}
)
```
DuckDB's `QUALIFY` clause is not part of Ecto's query AST. Use an Ecto subquery when filtering window function results:
```elixir
ranked =
from event in "events",
windows: [by_category: [partition_by: event.category, order_by: [desc: event.score]]],
select: %{
id: event.id,
category: event.category,
score: event.score,
rank: over(row_number(), :by_category)
}
MyApp.AnalyticsRepo.all(
from event in subquery(ranked),
where: event.rank <= 3,
order_by: [event.category, event.rank]
)
```
Use `case_when` for multi-branch DuckDB `CASE WHEN` expressions inside queries:
```elixir
use QuackDB.Ecto
MyApp.AnalyticsRepo.all(
from event in "events",
group_by: [date_part(:hour, event.occurred_at), selected_as(:tier)],
select: %{
hour: date_part(:hour, event.occurred_at),
tier:
selected_as(
case_when do
event.score >= 90 -> "high"
event.score >= 50 and event.score <= 89 -> "medium"
true -> "normal"
end,
:tier
),
events: count(),
distinct_events: count(event.id, :distinct),
high_events: filter(count(event.id), event.score >= 90),
average_score: coalesce(avg(event.score), 0),
precise_score_sum: fsum(event.score),
mode_score: mode(event.score),
weighted_score: weighted_avg(event.score, event.weight),
score_stddev: stddev(event.score),
score_variance: variance(event.score),
score_entropy: entropy(event.score),
score_histogram: histogram(event.score),
ordered_scores: list(event.score, order_by: [desc_nulls_last: event.score])
}
)
```
Profile an Ecto queryable with DuckDB `SUMMARIZE` when you need a quick statistical overview:
```elixir
query =
from event in "events",
where: event.score > 0,
select: %{category: event.category, score: event.score}
QuackDB.Ecto.Analytics.summarize!(MyApp.AnalyticsRepo, query)
```
For table/source names or raw SQL outside Ecto, use the direct SQL helper:
```elixir
MyApp.AnalyticsRepo.query!(QuackDB.Analytics.summarize("events"))
MyApp.AnalyticsRepo.query!(QuackDB.Analytics.summarize({:query, "SELECT score FROM events"}))
```
Ecto `insert/2` and `insert_all/3` are supported for straightforward row inserts. DuckDB `RETURNING` works through the SQL insert path:
```elixir
MyApp.AnalyticsRepo.insert!(%Event{id: 1, name: "duck"})
MyApp.AnalyticsRepo.insert_all(
"events",
[[id: 2, name: "goose"]],
returning: [:id]
)
```
Use `insert_method: :append` to opt into Quack's native append protocol for plain `insert_all` workloads. This fast path does not support query inserts, streams, `:returning`, placeholders, or upserts; use `QuackDB.insert_stream!/4` for streaming rows.
```elixir
MyApp.AnalyticsRepo.insert_all(
"events",
[[id: 1, name: "duck"], [id: 2, name: "goose"]],
insert_method: :append,
chunk_every: 10_000
)
```
For temporary analytical setup, `QuackDB.DDL.create_table/3` builds quoted DuckDB `CREATE TABLE` and `CREATE TABLE AS` statements:
```elixir
MyApp.AnalyticsRepo.query!(
QuackDB.DDL.create_table("events",
[payload: :json, occurred_at: :timestamp],
temporary: true
)
)
source = QuackDB.Source.parquet("s3://bucket/events/*.parquet")
query =
from event in source,
select: %{id: event.id, name: event.name}
MyApp.AnalyticsRepo.query!(
QuackDB.DDL.create_table("events_from_parquet", as: query, temporary: true)
)
```
`DDL.create_table/2` rejects parameterized Ecto queries in `:as` because DDL helpers return SQL iodata, not `{sql, params}` tuples.
Ecto support covers analytical reads and common write/setup flows. `Repo.query/3`, schema-backed reads, combinations, inserts/upserts, schema update/delete callbacks, `update_all` / `delete_all` mutations, `EXPLAIN`, transactions, and basic migrator-backed DDL work; advanced migration features and DuckDB-specific SQL should still use `Repo.query/3`.
### Basic Ecto migrations
QuackDB implements the Ecto migration DDL callbacks needed for common analytical setup migrations:
```elixir
defmodule MyApp.Repo.Migrations.CreateEvents do
use Ecto.Migration
def change do
create table(:events, primary_key: false) do
add(:id, :integer, primary_key: true)
add(:name, :string, null: false)
add(:score, :integer, default: 0)
end
create(index(:events, [:name]))
create(constraint(:events, :positive_score, check: "score >= 0"))
end
end
```
Supported DDL includes create/drop/alter table, add/modify/drop columns, references, ordinary and unique indexes, primary keys, composite primary keys, check constraints, and table/column renames. DuckDB-incompatible options such as concurrent indexes, covering indexes, exclude constraints, constraint comments, and `NOT VALID` constraints raise explicit QuackDB errors.
## Current limitations
- Server-side bind parameters are not exposed by this Quack client path yet. QuackDB formats supported parameter values as DuckDB SQL literals client-side.
- Native appends support row and column batches but not Arrow IPC or automatic local-file/data staging yet.
- Ecto coverage focuses on analytical reads and common write/setup workflows, not every relational adapter feature.
- Quack is experimental and may change with DuckDB releases.
## Supervision and connection options
Use QuackDB under your application supervisor when you want a long-lived connection pool:
```elixir
children = [
{QuackDB,
uri: "http://[::1]:9494",
token: "super_secret",
name: MyApp.QuackDB,
pool_size: 5}
]
```
The client accepts QuackDB options such as `:uri`, `:token`, and `:transport`, plus DBConnection pool options such as `:name`, `:pool_size`, `:queue_target`, `:queue_interval`, and per-call `:timeout`.
```elixir
QuackDB.query(MyApp.QuackDB, "SELECT 1", [], timeout: 10_000)
```
For local development, tests, or notebooks, QuackDB can also supervise a local DuckDB Quack server process with MuonTrap:
```elixir
children =
QuackDB.Server.child_specs(
server: [
name: MyApp.DuckDB,
duckdb: :managed,
endpoint: "quack:localhost:9494",
settings: [threads: System.schedulers_online()],
global_settings: [quack_fetch_batch_chunks: 4]
],
client: [name: MyApp.QuackDB, pool_size: System.schedulers_online()]
)
```
`QuackDB.Server` starts the external `duckdb` executable and serves the Quack protocol. It is a convenience process supervisor, not an embedded DuckDB driver and not required for remote DuckDB servers. Use `duckdb: :managed` for local development convenience, or omit it to use `duckdb` from `PATH`.
The local server defaults to `settings: [threads: System.schedulers_online()]` and `global_settings: [quack_fetch_batch_chunks: 4]`. Use a smaller client `pool_size` such as `1..4` for heavy analytical scans, because DuckDB parallelizes each query internally. Use `System.schedulers_online()` for many small concurrent queries.
## Running QuackDB's integration tests
With a server running locally:
```sh
QUACKDB_TEST_URI='http://[::1]:9494' \
QUACKDB_TEST_TOKEN=super_secret \
mix test --include integration
```