# EctoBigquery

Ecto adapter for Google BigQuery focused on **schema management and data seeding**: creating and evolving BigQuery datasets and tables through Ecto migrations, and loading initial data via the Seeds helper.

## Why this library exists

I work as a data engineer and part of my day-to-day is maintaining BigQuery schemas across multiple environments. For a long time our team used Terraform for that, and while it works well for provisioning infrastructure, it falls short when it comes to schema evolution — `ALTER TABLE` operations like adding or dropping columns aren't natively supported, which forced us to write workarounds or apply changes manually.

Having spent years working with tools like Active Record and Ecto Migrations, I believe the migration pattern is simply the best way to manage a database schema: changes are versioned, ordered, reproducible, and reversible. That mental model transfers directly to BigQuery, so I built this library to bring it there.

> **Scope:** This library is intentionally limited to DDL operations (`mix ecto.create/drop/migrate/rollback`) and seeding (INSERT/DELETE/TRUNCATE via `EctoBigquery.Seeds`). It does **not** implement a general-purpose query interface: `Repo.all`, `Repo.insert`, `Repo.update`, and `Repo.delete` are not supported for application tables. This is a deliberate design decision: the goal of this library is schema and seed management, not runtime data access.

## Features

- `mix ecto.create` — creates the BigQuery dataset
- `mix ecto.drop` — drops the BigQuery dataset
- `mix ecto.migrate` — runs DDL migrations (create/drop tables, add/drop/rename columns)
- `mix ecto.rollback` — reverts migrations
- Seeds helper — INSERT, DELETE, and TRUNCATE via BigQuery DML

## Installation

Add `ecto_bigquery` to your dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:ecto_bigquery, "~> 0.1"}
  ]
end
```

## Setup

### 1. Define your Repo

```elixir
defmodule MyApp.Repo do
  use EctoBigquery.Repo, otp_app: :my_app
end
```

### 2. Configure your application

In `config/config.exs`:

```elixir
config :my_app, MyApp.Repo,
  project_id: "my-gcp-project",
  dataset_id: "my_dataset",
  goth: MyApp.Goth
```

### 3. Add to your supervision tree

```elixir
# lib/my_app/application.ex
children = [
  {Goth, name: MyApp.Goth, source: {:default, []}},
  MyApp.Repo
]
```

### 4. Authentication

EctoBigquery uses [Goth](https://github.com/peburrows/goth) for Google Cloud authentication.

**Application Default Credentials (recommended for local development):**

```bash
gcloud auth application-default login
```

**Service Account (recommended for production):**

```elixir
{Goth,
  name: MyApp.Goth,
  source: {:service_account, Jason.decode!(File.read!("credentials.json"))}}
```

## Migrations

Migrations use standard Ecto syntax. Supported operations:

### Create table

```elixir
defmodule MyApp.Repo.Migrations.CreateEvents do
  use Ecto.Migration

  def change do
    create table(:events) do
      add :user_id,    :string,       null: false
      add :event_type, :string
      add :amount,     :decimal
      add :active,     :boolean
      add :metadata,   :map
      timestamps()
    end
  end
end
```
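Assuming the type mapping described below (where `timestamps()` adds `inserted_at`/`updated_at` as `:naive_datetime`, i.e. `DATETIME`), the migration above should produce DDL along these lines. This is an illustrative sketch, not the adapter's verbatim output; in particular, primary-key handling for the default `id` column is adapter-specific and omitted here.

```sql
-- Illustrative DDL; exact output depends on the adapter
CREATE TABLE `events` (
  user_id STRING NOT NULL,
  event_type STRING,
  amount NUMERIC,
  active BOOLEAN,
  metadata JSON,
  inserted_at DATETIME,
  updated_at DATETIME
)
```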

### Alter table

```elixir
defmodule MyApp.Repo.Migrations.AlterEvents do
  use Ecto.Migration

  def change do
    alter table(:events) do
      add    :contact_email, :string
      remove :legacy_field
    end

    rename table(:events), :amount, to: :total_amount
  end
end
```
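BigQuery supports these column operations as standard DDL, so the migration above corresponds to roughly the following statements (illustrative, not necessarily the adapter's verbatim output):

```sql
ALTER TABLE `events` ADD COLUMN contact_email STRING;
ALTER TABLE `events` DROP COLUMN legacy_field;
ALTER TABLE `events` RENAME COLUMN amount TO total_amount;
```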

### Drop table

```elixir
defmodule MyApp.Repo.Migrations.DropEvents do
  use Ecto.Migration

  def change do
    drop table(:events)
  end
end
```

## Type Mapping

### Standard types

| Ecto type           | BigQuery type |
|---------------------|---------------|
| `:string`           | `STRING`      |
| `:integer`          | `INTEGER`     |
| `:float`            | `FLOAT`       |
| `:boolean`          | `BOOLEAN`     |
| `:decimal`          | `NUMERIC`     |
| `:date`             | `DATE`        |
| `:time`             | `TIME`        |
| `:naive_datetime`   | `DATETIME`    |
| `:utc_datetime`     | `TIMESTAMP`   |
| `:binary`           | `BYTES`       |
| `:map`              | `JSON`        |

### Column modes

BigQuery columns have three modes. The `mode:` option exposes them directly and can be used as an alternative to `null: false`:

| Option             | BigQuery mode | SQL generated      |
|--------------------|---------------|--------------------|
| `mode: :nullable`  | `NULLABLE`    | _(no constraint)_  |
| `mode: :required`  | `REQUIRED`    | `NOT NULL`         |
| `mode: :repeated`  | `REPEATED`    | `ARRAY<T>`         |

```elixir
create table(:analytics) do
  add :user_id,    :string,  mode: :required   # STRING NOT NULL
  add :tags,       :string,  mode: :repeated   # ARRAY<STRING>
  add :notes,      :string,  mode: :nullable   # STRING (explicit)
end
```
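Following the mode table above, that migration should generate DDL roughly like this sketch:

```sql
-- Illustrative DDL
CREATE TABLE `analytics` (
  user_id STRING NOT NULL,
  tags ARRAY<STRING>,
  notes STRING
)
```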

`mode: :repeated` is a shorthand for `{:array, type}`. Both are equivalent:

```elixir
add :tags, :string,          mode: :repeated   # ARRAY<STRING>
add :tags, {:array, :string}                   # ARRAY<STRING>
```

The migration generator accepts `required`, `repeated`, and `nullable` as column modifiers:

```bash
mix ecto_bigquery.gen.migration create_analytics user_id:string:required tags:string:repeated notes:string:nullable
```

### Special BigQuery types

| Ecto type                          | BigQuery type              |
|------------------------------------|----------------------------|
| `:geography`                       | `GEOGRAPHY`                |
| `:bignumeric`                      | `BIGNUMERIC`               |
| `{:array, :string}`                | `ARRAY<STRING>`            |
| `{:array, :integer}`               | `ARRAY<INTEGER>`           |
| `{:struct, [field: :type, ...]}`   | `STRUCT<field TYPE, ...>`  |
| `{:array, {:struct, [...]}}`       | `ARRAY<STRUCT<...>>`       |

```elixir
create table(:locations) do
  add :point,        :geography
  add :exact_amount, :bignumeric
  add :tags,         {:array, :string}
  add :scores,       {:array, :integer}
  add :address,      {:struct, [street: :string, city: :string, zip: :string]}
  add :contacts,     {:array, {:struct, [name: :string, email: :string]}}
end
```

```sql
CREATE TABLE `locations` (
  point GEOGRAPHY,
  exact_amount BIGNUMERIC,
  tags ARRAY<STRING>,
  scores ARRAY<INTEGER>,
  address STRUCT<street STRING, city STRING, zip STRING>,
  contacts ARRAY<STRUCT<name STRING, email STRING>>
)
```

## Seeds

Use `EctoBigquery.Seeds` in `priv/repo/seeds.exs`:

```elixir
# Option A: via Repo (Goth must be running in the supervision tree)
EctoBigquery.Seeds.insert(MyApp.Repo, "users", [
  %{id: "1", name: "Alice", active: true},
  %{id: "2", name: "Bob",   active: false}
])

# Atomic multi-table insert — rolls back all if any insert fails
EctoBigquery.Seeds.insert_all(MyApp.Repo, [
  {"users",    [%{id: "1", name: "Alice"}]},
  {"products", [%{sku: "P1", price: 9.99}]}
])

EctoBigquery.Seeds.delete(MyApp.Repo, "users", "active = FALSE")
EctoBigquery.Seeds.truncate(MyApp.Repo, "users")

# Option B: explicit connection (useful in standalone scripts)
config     = MyApp.Repo.config()
project_id = Keyword.get(config, :project_id)
dataset_id = Keyword.get(config, :dataset_id)
{token, 0} = System.cmd("gcloud", ["auth", "print-access-token"])
conn = EctoBigquery.Connection.new(String.trim(token))

EctoBigquery.Seeds.insert_with_conn(conn, project_id, dataset_id, "users", [
  %{id: "1", name: "Alice", active: true},
  %{id: "2", name: "Bob",   active: false}
])

EctoBigquery.Seeds.delete_with_conn(conn, project_id, dataset_id, "users", "active = FALSE")
EctoBigquery.Seeds.truncate_with_conn(conn, project_id, dataset_id, "users")
```
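Under the hood, the Seeds helpers issue plain BigQuery DML. The calls above correspond to statements roughly like the following (fully-qualified table names are shown for illustration; the actual qualification comes from your `project_id`/`dataset_id` config):

```sql
INSERT INTO `my-gcp-project.my_dataset.users` (id, name, active)
VALUES ('1', 'Alice', TRUE), ('2', 'Bob', FALSE);

DELETE FROM `my-gcp-project.my_dataset.users` WHERE active = FALSE;

TRUNCATE TABLE `my-gcp-project.my_dataset.users`;
```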

## Migration generator

`mix ecto_bigquery.gen.migration` generates migration files with pre-filled column definitions based on the migration name pattern.

```bash
# Create table
mix ecto_bigquery.gen.migration create_users name:string age:integer active:boolean

# NOT NULL columns
mix ecto_bigquery.gen.migration create_products sku:string:required price:decimal:required

# Add column
mix ecto_bigquery.gen.migration add_email_to_users email:string:required

# Remove column
mix ecto_bigquery.gen.migration remove_phone_from_users phone

# Drop table
mix ecto_bigquery.gen.migration drop_legacy_data

# Special types
mix ecto_bigquery.gen.migration create_places location:geography amount:bignumeric "tags:array[string]"
```
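As a sketch, the first command above (`create_users`) would generate a migration file along these lines (the timestamp in the filename and the `MyApp` module prefix are illustrative):

```elixir
# priv/repo/migrations/20240101120000_create_users.exs (illustrative path)
defmodule MyApp.Repo.Migrations.CreateUsers do
  use Ecto.Migration

  def change do
    create table(:users) do
      add :name,   :string
      add :age,    :integer
      add :active, :boolean
    end
  end
end
```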

### Generator type aliases

| Alias                    | Ecto type       |
|--------------------------|-----------------|
| `string`, `text`         | `:string`       |
| `integer`, `int`         | `:integer`      |
| `float`                  | `:float`        |
| `decimal`, `numeric`     | `:decimal`      |
| `boolean`, `bool`        | `:boolean`      |
| `date`                   | `:date`         |
| `datetime`, `utc_datetime` | `:utc_datetime` |
| `naive_datetime`         | `:naive_datetime`|
| `binary`                 | `:binary`       |
| `map`, `json`            | `:map`          |
| `geography`              | `:geography`    |
| `bignumeric`             | `:bignumeric`   |
| `array[TYPE]`            | `{:array, TYPE}`|

> Note: `STRUCT` types must be written manually in the migration file since their nested fields cannot be expressed in a single CLI argument.

## Mix tasks

```bash
# Create the BigQuery dataset
mix ecto.create

# Drop the dataset (deletes all tables)
mix ecto.drop

# Run pending migrations
mix ecto.migrate

# Revert the last migration
mix ecto.rollback

# Run seeds
mix run priv/repo/seeds.exs
```

## Out of scope

The following are intentionally not supported:

- **Generic query interface** — `Repo.all`, `Repo.insert`, `Repo.update`, `Repo.delete` for application tables. Use `google_api_big_query` directly for runtime data access.
- **DDL transactions** — BigQuery does not support transactional DDL. If a migration with multiple changes fails midway, previously applied changes remain permanent.