# QuackLake

An Elixir library for easy [DuckLake](https://ducklake.select/) access, setup, and management.

DuckLake is DuckDB's open data lakehouse format that brings ACID transactions, time travel, and schema evolution to your data lake.

## Installation

Add `quack_lake` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:quack_lake, "~> 0.2.5"}
  ]
end
```

## Quick Start

```elixir
# Open a connection (automatically installs and loads the ducklake extension)
{:ok, conn} = QuackLake.open()

# Attach a DuckLake (creates it if it doesn't exist)
:ok = QuackLake.attach(conn, "my_lake", "my_lake.ducklake")

# Create a table and insert data
:ok = QuackLake.Query.execute(conn, "CREATE TABLE my_lake.users (id INT, name TEXT)")
:ok = QuackLake.Query.execute(conn, "INSERT INTO my_lake.users VALUES (1, 'Alice')")

# Query with ergonomic results (returns list of maps)
{:ok, rows} = QuackLake.query(conn, "SELECT * FROM my_lake.users")
# => [%{"id" => 1, "name" => "Alice"}]

# Time travel - list snapshots
{:ok, snapshots} = QuackLake.snapshots(conn, "my_lake")

# Query at a specific version
{:ok, old_rows} = QuackLake.query_at(conn, "SELECT * FROM my_lake.users", version: 1)
```

## Features

- **Ecto Adapters** - Full Ecto integration with `Ecto.Adapters.DuckDB` and `Ecto.Adapters.DuckLake`
- **Simple API** - Ergonomic Elixir interface with `{:ok, result}` / `{:error, reason}` tuples (see the example after this list)
- **Auto-setup** - Automatically installs and loads the DuckLake extension
- **Result transformation** - Query results returned as lists of maps instead of raw tuples
- **Time travel** - Query historical data at specific versions or timestamps
- **Cloud storage** - Built-in support for S3, Azure Blob Storage, and GCS credentials
- **Bulk Inserts** - High-performance Appender API for bulk data loading
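
Every call returns a tagged tuple, so results compose naturally with `case` and `with`:

```elixir
case QuackLake.query(conn, "SELECT * FROM my_lake.users") do
  {:ok, rows} ->
    Enum.each(rows, &IO.inspect/1)

  {:error, reason} ->
    IO.warn("query failed: #{inspect(reason)}")
end
```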

## API Overview

### Connection Management

```elixir
# Open in-memory database
{:ok, conn} = QuackLake.open()

# Open persistent database
{:ok, conn} = QuackLake.open(path: "data.duckdb")

# Bang variant that raises on error
conn = QuackLake.open!()
```

### Lake Management

```elixir
# Attach a local DuckLake
:ok = QuackLake.attach(conn, "my_lake", "my_lake.ducklake")

# Attach with cloud data storage
:ok = QuackLake.attach(conn, "my_lake", "metadata.ducklake",
  data_path: "s3://my-bucket/data/")

# Detach a lake
:ok = QuackLake.detach(conn, "my_lake")

# List attached lakes
{:ok, lakes} = QuackLake.lakes(conn)
```

### Queries

```elixir
# Query returning all rows as maps
{:ok, rows} = QuackLake.query(conn, "SELECT * FROM my_lake.users")

# Query with parameters (use explicit types for arithmetic)
{:ok, rows} = QuackLake.query(conn, "SELECT * FROM my_lake.users WHERE id = $1", [1])
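
# Cast a parameter explicitly when it takes part in arithmetic (a sketch;
# parameter types are inferred, so bare parameters in expressions can be ambiguous)
{:ok, shifted} = QuackLake.query(conn,
  "SELECT id, id + $1::INTEGER AS shifted FROM my_lake.users", [10])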

# Get single row (or nil)
{:ok, user} = QuackLake.query_one(conn, "SELECT * FROM my_lake.users WHERE id = $1", [1])

# Bang variants that raise on error
rows = QuackLake.query!(conn, "SELECT * FROM my_lake.users")
user = QuackLake.query_one!(conn, "SELECT * FROM my_lake.users WHERE id = $1", [1])

# Execute statements (CREATE, INSERT, UPDATE, DELETE)
:ok = QuackLake.Query.execute(conn, "INSERT INTO my_lake.users VALUES ($1, $2)", [2, "Bob"])

# Stream large result sets
QuackLake.Query.stream(conn, "SELECT * FROM my_lake.large_table")
|> Stream.each(&process_chunk/1)
|> Stream.run()
```
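
Streaming is useful for exporting large tables without holding all rows in memory. A sketch that writes each chunk out as it arrives (assuming chunks arrive as lists of row maps, per the result transformation above):

```elixir
file = File.open!("large_table.csv", [:write])

QuackLake.Query.stream(conn, "SELECT * FROM my_lake.large_table")
|> Stream.each(fn chunk ->
  # column names here are illustrative
  Enum.each(chunk, fn row -> IO.write(file, "#{row["id"]},#{row["name"]}\n") end)
end)
|> Stream.run()

File.close(file)
```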

### Time Travel

```elixir
# List all snapshots
{:ok, snapshots} = QuackLake.snapshots(conn, "my_lake")

# Query at a specific version
{:ok, rows} = QuackLake.query_at(conn, "SELECT * FROM my_lake.users", version: 5)

# Query at a specific timestamp
{:ok, rows} = QuackLake.query_at(conn, "SELECT * FROM my_lake.users",
  timestamp: ~U[2024-01-15 10:00:00Z])

# Get changes between versions
{:ok, changes} = QuackLake.changes(conn, "my_lake", "main", "users", 1, 5)

# Expire old snapshots
:ok = QuackLake.Snapshot.expire(conn, "my_lake", before_version: 5)
```
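
DuckLake also exposes time travel directly in SQL, so the raw form works through a plain query as well (a sketch using DuckLake's `AT` clause):

```elixir
{:ok, rows} = QuackLake.query(conn, """
  SELECT * FROM my_lake.users AT (VERSION => 5)
""")
```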

### Cloud Storage Credentials

```elixir
# AWS S3
:ok = QuackLake.Secret.create_s3(conn, "my_s3",
  key_id: "AKIAIOSFODNN7EXAMPLE",
  secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  region: "us-east-1"
)

# Azure Blob Storage
:ok = QuackLake.Secret.create_azure(conn, "my_azure",
  account_name: "myaccount",
  account_key: "mykey..."
)

# Google Cloud Storage
:ok = QuackLake.Secret.create_gcs(conn, "my_gcs",
  key_id: "GOOG1E...",
  secret: "..."
)

# List secrets
{:ok, secrets} = QuackLake.Secret.list(conn)

# Remove a secret
:ok = QuackLake.Secret.drop(conn, "my_s3")
```

### DuckDB Extensions

DuckDB supports many extensions for additional functionality. QuackLake automatically installs and loads the `ducklake` extension, but you can install others:

```elixir
{:ok, conn} = QuackLake.open()

# Install and load an extension in one call
:ok = QuackLake.Extension.ensure(conn, "httpfs")

# Now you can use httpfs features
{:ok, rows} = QuackLake.query(conn, """
  SELECT * FROM read_parquet('https://example.com/data.parquet') LIMIT 10
""")
```

Common extensions:

| Extension | Description |
|-----------|-------------|
| `httpfs` | HTTP/S3 file system for remote files |
| `spatial` | Geospatial types and functions |
| `json` | JSON parsing and extraction |
| `iceberg` | Apache Iceberg table format |
| `delta` | Delta Lake table format |
| `postgres_scanner` | Query PostgreSQL directly |
| `sqlite_scanner` | Query SQLite databases |
| `mysql_scanner` | Query MySQL directly |
| `excel` | Read Excel files |

Example with spatial extension:

```elixir
:ok = QuackLake.Extension.ensure(conn, "spatial")

:ok = QuackLake.Query.execute(conn, """
  CREATE TABLE my_lake.locations (name TEXT, geom GEOMETRY)
""")

:ok = QuackLake.Query.execute(conn, """
  INSERT INTO my_lake.locations VALUES ('NYC', ST_Point(-74.006, 40.7128))
""")
```
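
Spatial functions are then available in ordinary queries, e.g. reading geometries back as WKT:

```elixir
{:ok, rows} = QuackLake.query(conn, """
  SELECT name, ST_AsText(geom) AS wkt FROM my_lake.locations
""")
```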

## Real-World Examples

### S3-Backed DuckLake with Runtime Configuration

Store your DuckLake data in S3, with credentials loaded from your application's runtime config.

**config/runtime.exs**

```elixir
import Config

config :my_app, :quack_lake,
  s3: [
    key_id: System.get_env("AWS_ACCESS_KEY_ID"),
    secret: System.get_env("AWS_SECRET_ACCESS_KEY"),
    region: System.get_env("AWS_REGION", "us-east-1"),
    bucket: System.get_env("DUCKLAKE_S3_BUCKET")
  ],
  metadata_path: System.get_env("DUCKLAKE_METADATA_PATH", "priv/lake.ducklake")
```

**lib/my_app/lake.ex**

```elixir
defmodule MyApp.Lake do
  @moduledoc """
  DuckLake connection manager.
  """

  def open do
    config = Application.fetch_env!(:my_app, :quack_lake)
    s3_config = Keyword.fetch!(config, :s3)
    metadata_path = Keyword.fetch!(config, :metadata_path)
    bucket = Keyword.fetch!(s3_config, :bucket)

    with {:ok, conn} <- QuackLake.open(),
         :ok <- setup_s3_secret(conn, s3_config),
         :ok <- QuackLake.attach(conn, "lake", metadata_path,
                  data_path: "s3://#{bucket}/data/") do
      {:ok, conn}
    end
  end

  defp setup_s3_secret(conn, s3_config) do
    QuackLake.Secret.create_s3(conn, "s3_creds",
      key_id: Keyword.fetch!(s3_config, :key_id),
      secret: Keyword.fetch!(s3_config, :secret),
      region: Keyword.fetch!(s3_config, :region)
    )
  end
end
```

**Usage**

```elixir
{:ok, conn} = MyApp.Lake.open()

# Create tables - data stored in S3, metadata in local file
:ok = QuackLake.Query.execute(conn, """
  CREATE TABLE lake.events (
    id INTEGER,
    user_id INTEGER,
    event_type TEXT,
    payload JSON,
    created_at TIMESTAMP
  )
""")

# Insert data
:ok = QuackLake.Query.execute(conn, """
  INSERT INTO lake.events VALUES
    (1, 42, 'page_view', '{"url": "/home"}', NOW())
""")

# Parameterized query
{:ok, rows} = QuackLake.query(conn, "SELECT * FROM lake.events WHERE user_id = $1", [42])

# Time travel works here too
{:ok, old_rows} = QuackLake.query_at(conn, "SELECT * FROM lake.events", version: 1)
```

### Querying PostgreSQL Directly

Use DuckDB's `postgres_scanner` to query your PostgreSQL database and optionally sync data into your DuckLake.

**config/runtime.exs**

```elixir
import Config

config :my_app, :postgres,
  host: System.get_env("POSTGRES_HOST", "localhost"),
  port: System.get_env("POSTGRES_PORT", "5432"),
  database: System.get_env("POSTGRES_DB"),
  username: System.get_env("POSTGRES_USER"),
  password: System.get_env("POSTGRES_PASSWORD")
```

**lib/my_app/analytics.ex**

```elixir
defmodule MyApp.Analytics do
  @moduledoc """
  Analytics queries combining PostgreSQL and DuckLake data.
  """

  def open do
    with {:ok, conn} <- QuackLake.open(),
         :ok <- QuackLake.Extension.ensure(conn, "postgres_scanner"),
         :ok <- attach_postgres(conn) do
      {:ok, conn}
    end
  end

  defp attach_postgres(conn) do
    pg = Application.fetch_env!(:my_app, :postgres)

    QuackLake.Query.execute(conn, """
      ATTACH 'dbname=#{pg[:database]} user=#{pg[:username]} password=#{pg[:password]} host=#{pg[:host]} port=#{pg[:port]}'
      AS pg (TYPE POSTGRES, READ_ONLY)
    """)
  end

  @doc """
  Query PostgreSQL directly through DuckDB.
  """
  def query_postgres(conn, sql, params \\ []) do
    QuackLake.query(conn, sql, params)
  end

  @doc """
  Sync a PostgreSQL table into DuckLake for fast analytics.
  """
  def sync_table(conn, pg_table, lake_table) do
    QuackLake.Query.execute(conn, """
      CREATE OR REPLACE TABLE #{lake_table} AS
      SELECT * FROM pg.public.#{pg_table}
    """)
  end
end
```

**Usage**

```elixir
{:ok, conn} = MyApp.Analytics.open()

# Query PostgreSQL directly (uses postgres_scanner)
{:ok, users} = MyApp.Analytics.query_postgres(conn, """
  SELECT * FROM pg.public.users WHERE created_at > '2024-01-01'
""")

# Sync PostgreSQL table to DuckLake for faster repeated queries
:ok = MyApp.Analytics.sync_table(conn, "orders", "lake.orders")

# Now query the local copy (much faster for analytics)
{:ok, stats} = QuackLake.query(conn, """
  SELECT
    date_trunc('month', created_at) as month,
    COUNT(*) as order_count,
    SUM(total) as revenue
  FROM lake.orders
  GROUP BY 1
  ORDER BY 1
""")

# Join PostgreSQL and DuckLake data
{:ok, report} = QuackLake.query(conn, """
  SELECT u.email, COUNT(o.id) as orders
  FROM pg.public.users u
  JOIN lake.orders o ON o.user_id = u.id
  GROUP BY 1
  ORDER BY 2 DESC
  LIMIT 10
""")
```

### Combining S3 Storage with PostgreSQL

**lib/my_app/data_platform.ex**

```elixir
defmodule MyApp.DataPlatform do
  @moduledoc """
  Full data platform: S3-backed DuckLake + PostgreSQL access.
  """

  def open do
    config = Application.fetch_env!(:my_app, :quack_lake)
    pg_config = Application.fetch_env!(:my_app, :postgres)
    s3_config = Keyword.fetch!(config, :s3)

    with {:ok, conn} <- QuackLake.open(),
         :ok <- QuackLake.Extension.ensure(conn, "postgres_scanner"),
         :ok <- setup_s3(conn, s3_config),
         :ok <- setup_postgres(conn, pg_config),
         :ok <- setup_lake(conn, config, s3_config) do
      {:ok, conn}
    end
  end

  defp setup_s3(conn, s3) do
    QuackLake.Secret.create_s3(conn, "s3_creds",
      key_id: s3[:key_id],
      secret: s3[:secret],
      region: s3[:region]
    )
  end

  defp setup_postgres(conn, pg) do
    QuackLake.Query.execute(conn, """
      ATTACH 'dbname=#{pg[:database]} user=#{pg[:username]} password=#{pg[:password]} host=#{pg[:host]}'
      AS pg (TYPE POSTGRES, READ_ONLY)
    """)
  end

  defp setup_lake(conn, config, s3) do
    QuackLake.attach(conn, "lake", config[:metadata_path],
      data_path: "s3://#{s3[:bucket]}/lake/"
    )
  end
end
```
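
**Usage**

```elixir
{:ok, conn} = MyApp.DataPlatform.open()

# Snapshot a PostgreSQL table into the S3-backed lake
# (table names here are illustrative)
:ok = QuackLake.Query.execute(conn, """
  CREATE OR REPLACE TABLE lake.customers AS
  SELECT * FROM pg.public.customers
""")

# Analytics now scan parquet files in S3 instead of hitting PostgreSQL
{:ok, rows} = QuackLake.query(conn, "SELECT COUNT(*) AS n FROM lake.customers")
```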

### Production DuckLake with PostgreSQL Catalog (AWS RDS)

For production deployments, you can use PostgreSQL (e.g., AWS RDS) as DuckLake's metadata catalog instead of a local file. This provides a reliable, shared catalog with managed backups and replication.

**Architecture:**

```
┌─────────────────┐     ┌─────────────────┐
│  DuckDB/Ecto    │────▶│   AWS RDS       │  (metadata catalog)
│  (your app)     │     │   PostgreSQL    │
└────────┬────────┘     └─────────────────┘
         │
         ▼
┌─────────────────┐
│    AWS S3       │  (actual data - parquet files)
│  s3://bucket/   │
└─────────────────┘
```

**Connection string format:**

```
ducklake:postgres:host=<host>;database=<db>;user=<user>;password=<pass>
```

**With the Ecto adapter:**

```elixir
# config/runtime.exs
config :my_app, MyApp.LakeRepo,
  adapter: Ecto.Adapters.DuckLake,
  database: "ducklake:postgres:host=#{System.get_env("RDS_HOST")};database=#{System.get_env("RDS_DB")};user=#{System.get_env("RDS_USER")};password=#{System.get_env("RDS_PASSWORD")}",
  pool_size: 5,
  lake_name: "lake",  # Short alias for the attached lake
  data_path: "s3://my-bucket/lake-data",
  extensions: [:httpfs, {:ducklake, source: :core}],
  secrets: [
    {:my_s3, [
      type: :s3,
      key_id: System.get_env("AWS_ACCESS_KEY_ID"),
      secret: System.get_env("AWS_SECRET_ACCESS_KEY"),
      region: "us-east-1"
    ]}
  ]
```

```elixir
# lib/my_app/lake_repo.ex
defmodule MyApp.LakeRepo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.DuckLake

  use Ecto.Adapters.DuckDB.RawQuery
end
```

**With raw QuackLake API:**

```elixir
{:ok, conn} = QuackLake.open()

# Setup S3 credentials
:ok = QuackLake.Secret.create_s3(conn, "s3_creds",
  key_id: System.get_env("AWS_ACCESS_KEY_ID"),
  secret: System.get_env("AWS_SECRET_ACCESS_KEY"),
  region: "us-east-1"
)

# Attach DuckLake with PostgreSQL catalog + S3 data storage
:ok = QuackLake.Query.execute(conn, """
  ATTACH 'ducklake:postgres:host=your-instance.rds.amazonaws.com;database=ducklake_meta;user=myuser;password=mypass'
  AS lake (TYPE DUCKLAKE, DATA_PATH 's3://my-bucket/lake-data/')
""")

# Now use the lake
:ok = QuackLake.Query.execute(conn, "CREATE TABLE lake.events (id INT, type TEXT, ts TIMESTAMP)")
{:ok, rows} = QuackLake.query(conn, "SELECT * FROM lake.events")
```

**Benefits of this setup:**

- **Concurrent writers** - Multiple app instances can write simultaneously
- **Managed metadata** - RDS handles backups, failover, encryption at rest
- **Scalable data** - S3 for unlimited, cost-effective storage
- **Time travel** - Snapshots stored in the PostgreSQL catalog
- **High availability** - Use RDS Multi-AZ for automatic failover

**AWS RDS requirements:**

- Create a dedicated database (e.g., `ducklake_meta`) - no special extensions needed
- Security group must allow inbound connections from your app servers
- Recommended: Use IAM authentication or Secrets Manager for credentials
- For SSL: Add `sslmode=require` to the connection string, as shown below
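
For example, a catalog connection string with SSL enforced:

```
ducklake:postgres:host=your-instance.rds.amazonaws.com;database=ducklake_meta;user=myuser;password=mypass;sslmode=require
```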

## Supervised Connection

QuackLake uses plain functions by default (no process overhead). If you want a supervised connection that restarts on failure, wrap it in a GenServer:

**lib/my_app/lake_server.ex**

```elixir
defmodule MyApp.LakeServer do
  @moduledoc """
  Supervised DuckLake connection.
  """
  use GenServer

  # Client API

  def start_link(opts) do
    name = Keyword.get(opts, :name, __MODULE__)
    GenServer.start_link(__MODULE__, opts, name: name)
  end

  def query(server \\ __MODULE__, sql, params \\ []) do
    GenServer.call(server, {:query, sql, params})
  end

  def query!(server \\ __MODULE__, sql, params \\ []) do
    case query(server, sql, params) do
      {:ok, rows} -> rows
      {:error, reason} -> raise QuackLake.Error, message: "Query failed", reason: reason
    end
  end

  def execute(server \\ __MODULE__, sql, params \\ []) do
    GenServer.call(server, {:execute, sql, params})
  end

  def conn(server \\ __MODULE__) do
    GenServer.call(server, :conn)
  end

  # Server callbacks

  @impl true
  def init(opts) do
    case setup_connection(opts) do
      {:ok, conn} -> {:ok, %{conn: conn, opts: opts}}
      {:error, reason} -> {:stop, reason}
    end
  end

  @impl true
  def handle_call({:query, sql, params}, _from, %{conn: conn} = state) do
    {:reply, QuackLake.query(conn, sql, params), state}
  end

  def handle_call({:execute, sql, params}, _from, %{conn: conn} = state) do
    {:reply, QuackLake.Query.execute(conn, sql, params), state}
  end

  def handle_call(:conn, _from, %{conn: conn} = state) do
    {:reply, conn, state}
  end

  defp setup_connection(opts) do
    config =
      Keyword.get_lazy(opts, :config, fn ->
        Application.fetch_env!(:my_app, :quack_lake)
      end)
    s3_config = config[:s3]

    with {:ok, conn} <- QuackLake.open(),
         :ok <- maybe_setup_s3(conn, s3_config),
         :ok <- maybe_attach_lake(conn, config, s3_config) do
      {:ok, conn}
    end
  end

  defp maybe_setup_s3(_conn, nil), do: :ok
  defp maybe_setup_s3(conn, s3) do
    QuackLake.Secret.create_s3(conn, "s3_creds",
      key_id: s3[:key_id],
      secret: s3[:secret],
      region: s3[:region]
    )
  end

  defp maybe_attach_lake(conn, config, s3) do
    # config is a keyword list, so nil checks use access syntax rather than
    # map pattern matches (which would never match a keyword list)
    case config[:metadata_path] do
      nil -> :ok
      metadata_path -> do_attach(conn, config[:lake_name] || "lake", metadata_path, s3)
    end
  end

  defp do_attach(conn, lake_name, metadata_path, nil) do
    QuackLake.attach(conn, lake_name, metadata_path)
  end

  defp do_attach(conn, lake_name, metadata_path, s3) do
    QuackLake.attach(conn, lake_name, metadata_path,
      data_path: "s3://#{s3[:bucket]}/data/"
    )
  end
end
```

**lib/my_app/application.ex**

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # ... other children
      {MyApp.LakeServer, name: MyApp.LakeServer}
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
```

**config/runtime.exs**

```elixir
import Config

# Compute the optional S3 config first: `if` without parentheses inside a
# keyword list would bind its do-block to the outer `config` call and fail
s3_config =
  if System.get_env("AWS_ACCESS_KEY_ID") do
    [
      key_id: System.get_env("AWS_ACCESS_KEY_ID"),
      secret: System.get_env("AWS_SECRET_ACCESS_KEY"),
      region: System.get_env("AWS_REGION", "us-east-1"),
      bucket: System.get_env("DUCKLAKE_S3_BUCKET")
    ]
  end

config :my_app, :quack_lake,
  lake_name: "lake",
  metadata_path: System.get_env("DUCKLAKE_METADATA_PATH", "priv/lake.ducklake"),
  s3: s3_config
```

**Usage**

```elixir
# Queries go through the supervised connection
{:ok, rows} = MyApp.LakeServer.query("SELECT * FROM lake.users")
rows = MyApp.LakeServer.query!("SELECT * FROM lake.users WHERE id = $1", [1])

# Execute statements
:ok = MyApp.LakeServer.execute("INSERT INTO lake.users VALUES ($1, $2)", [1, "Alice"])

# Get the raw connection for advanced operations
conn = MyApp.LakeServer.conn()
{:ok, snapshots} = QuackLake.snapshots(conn, "lake")
```

> **Note:** A single GenServer serializes all queries. For concurrent workloads, consider a pool (e.g., using `poolboy`) or opening connections per-request.

## Ecto Adapters

QuackLake provides two Ecto adapters for different use cases:

### Ecto.Adapters.DuckDB (Single Writer)

For local analytics with a single writer:

```elixir
# config/config.exs
config :my_app, MyApp.Repo,
  adapter: Ecto.Adapters.DuckDB,
  database: "priv/analytics.duckdb",
  extensions: [:httpfs, :parquet, {:spatial, source: :core}]
```

```elixir
# lib/my_app/repo.ex
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.DuckDB

  # Optional: Add raw query and appender support
  use Ecto.Adapters.DuckDB.RawQuery
end
```

### Ecto.Adapters.DuckLake (Concurrent Writers)

For lakehouse deployments with concurrent writers:

```elixir
# config/config.exs
config :my_app, MyApp.LakeRepo,
  adapter: Ecto.Adapters.DuckLake,
  database: "ducklake:analytics.ducklake",
  pool_size: 5,
  lake_name: "lake",  # Custom short name (optional, overrides auto-generated)
  data_path: "s3://my-bucket/lake-data",
  extensions: [:httpfs, {:ducklake, source: :core}],
  secrets: [
    {:my_s3, [
      type: :s3,
      key_id: System.get_env("AWS_ACCESS_KEY_ID"),
      secret: System.get_env("AWS_SECRET_ACCESS_KEY"),
      region: "us-east-1"
    ]}
  ]
```

**DuckLake Adapter Options:**

| Option | Description |
|--------|-------------|
| `database` | DuckLake connection string (e.g., `ducklake:analytics.ducklake` or `ducklake:postgres:host=...`) |
| `pool_size` | Number of concurrent connections (default: 5) |
| `lake_name` | Custom lake name alias (optional, auto-generated from path if not provided) |
| `data_path` | Storage path for actual data (S3, local, etc.) |
| `extensions` | List of DuckDB extensions to load |
| `secrets` | Cloud storage credentials |

```elixir
# lib/my_app/lake_repo.ex
defmodule MyApp.LakeRepo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.DuckLake

  use Ecto.Adapters.DuckDB.RawQuery
end
```

### Using Ecto with DuckDB

```elixir
# Define schemas
defmodule MyApp.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    field :email, :string
    timestamps()
  end
end

# Standard Ecto operations
alias MyApp.User

MyApp.Repo.insert!(%User{name: "Alice", email: "alice@example.com"})
MyApp.Repo.all(User)
MyApp.Repo.get!(User, 1)

# Raw SQL execution (with RawQuery)
MyApp.Repo.exec!("COPY users TO 'users.parquet' (FORMAT PARQUET)")

# High-performance bulk inserts with Appender
{:ok, appender} = MyApp.Repo.appender(User)
Enum.each(users, &MyApp.Repo.append(appender, &1))
MyApp.Repo.close_appender(appender)
```
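
Ecto's query DSL works as usual. A small sketch:

```elixir
import Ecto.Query

emails =
  from(u in MyApp.User, where: like(u.name, "A%"), select: u.email)
  |> MyApp.Repo.all()
```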

### Migrations

```elixir
defmodule MyApp.Repo.Migrations.CreateUsers do
  use Ecto.Migration

  def change do
    create table(:users) do
      add :name, :string
      add :email, :string
      add :metadata, :map
      timestamps()
    end

    create index(:users, [:email])
  end
end
```

Run migrations:

```bash
mix ecto.create
mix ecto.migrate
```

## Module Reference

| Module | Description |
|--------|-------------|
| `QuackLake` | Main facade with high-level API |
| `QuackLake.Connection` | Connection lifecycle management |
| `QuackLake.Query` | Query execution and streaming |
| `QuackLake.Lake` | DuckLake attach/detach operations |
| `QuackLake.Snapshot` | Time travel and snapshot management |
| `QuackLake.Secret` | Cloud storage credential management |
| `QuackLake.Extension` | DuckDB extension helpers |
| `QuackLake.Appender` | High-performance bulk insert API |
| `QuackLake.Config` | Configuration struct |
| `QuackLake.Error` | Error exception struct |
| `Ecto.Adapters.DuckDB` | Ecto adapter for DuckDB (single writer) |
| `Ecto.Adapters.DuckLake` | Ecto adapter for DuckLake (concurrent writers) |

## Development

### Prerequisites

- Elixir 1.15+
- Docker and Docker Compose (for integration tests)

### Setup

```bash
# Clone the repository
git clone https://github.com/nyo16/quack_lake.git
cd quack_lake

# Install dependencies
mix deps.get

# Run unit tests (no Docker required)
mix test test/unit
```

### Integration Tests

Integration tests require PostgreSQL and MinIO (S3-compatible storage) running via Docker:

```bash
# Start Docker services
docker-compose up -d

# Wait for services to be healthy
docker-compose ps

# Run all tests including integration
INTEGRATION=true mix test

# Run only integration tests
INTEGRATION=true mix test test/integration

# Run specific integration test file
INTEGRATION=true mix test test/integration/postgres_catalog_test.exs
```

### Docker Services

The `docker-compose.yml` provides:

| Service | Port | Description |
|---------|------|-------------|
| PostgreSQL | 5432 | DuckLake metadata catalog |
| MinIO | 9000 | S3-compatible object storage |
| MinIO Console | 9001 | Web UI for MinIO |

**Default Credentials:**

| Service | Username | Password |
|---------|----------|----------|
| PostgreSQL | `quacklake` | `quacklake_secret` |
| MinIO | `minioadmin` | `minioadmin123` |
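
To point QuackLake at MinIO instead of AWS, create an S3 secret against the local endpoint. A sketch, assuming `create_s3` accepts endpoint-style options mirroring DuckDB's S3 secret parameters (check the `QuackLake.Secret` docs):

```elixir
:ok = QuackLake.Secret.create_s3(conn, "minio_creds",
  key_id: "minioadmin",
  secret: "minioadmin123",
  region: "us-east-1",
  # hypothetical options mirroring DuckDB's ENDPOINT / USE_SSL / URL_STYLE
  endpoint: "localhost:9000",
  use_ssl: false,
  url_style: "path"
)
```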

### Test Structure

```
test/
├── unit/                    # Unit tests (async, no Docker)
│   ├── config_test.exs
│   └── config/
│       ├── attach_test.exs
│       ├── extension_test.exs
│       └── secret_test.exs
├── integration/             # Integration tests (require Docker)
│   ├── postgres_catalog_test.exs
│   ├── minio_s3_test.exs
│   ├── ducklake_lifecycle_test.exs
│   └── ecto/
│       ├── duckdb_adapter_test.exs
│       └── ducklake_adapter_test.exs
└── support/                 # Test helpers
    ├── data_case.ex
    ├── docker_helper.ex
    └── minio_helper.ex
```

## License

MIT