guides/getting_started.md

# Getting Started

This guide is an introduction to Blink, a fast bulk data insertion library for Ecto and PostgreSQL.

In this guide, we will:

- Create a seeder module for inserting users and posts
- Learn how to reference data from previously declared tables
- Use streams for memory-efficient seeding
- Store auxiliary data in context without inserting it into the database

## Adding Blink to an application

Add Blink to your dependencies in `mix.exs`:

```elixir
defp deps do
  [
    {:blink, "~> 0.6.1"}
  ]
end
```

Install the dependencies:

```bash
mix deps.get
```

## Configuring the repository

Blink works with any Ecto repository. If you don't have Ecto set up yet, follow the [Ecto Getting Started guide](https://hexdocs.pm/ecto/getting-started.html) to configure your repository and create your database tables.

For this guide, we'll assume you have:

- An Ecto repository (e.g., `Blog.Repo`) configured
- A `users` table with columns: `id`, `name`, `email`, `inserted_at`, `updated_at`
- A `posts` table with columns: `id`, `title`, `body`, `user_id`, `inserted_at`, `updated_at`

## Creating a seeder

Now that we have our database set up, let's create a seeder module to insert data:

```elixir
defmodule Blog.Seeder do
  use Blink

  def call do
    new()
    |> with_table("users")
    |> with_table("posts")
    |> run(Blog.Repo)
  end

  def table(_seeder, "users") do
    [
      %{id: 1, name: "Alice", email: "alice@example.com"},
      %{id: 2, name: "Bob", email: "bob@example.com"}
    ]
  end

  def table(seeder, "posts") do
    IO.inspect(seeder)
    # %Blink.Seeder{
    #   tables: %{"users" => [%{id: 1, name: "Alice", ...}, ...]},
    #   table_order: ["users"],
    #   table_opts: %{"users" => []},
    #   context: %{}
    # }

    users = seeder.tables["users"]

    Enum.flat_map(users, fn user ->
      for i <- 1..5 do
        %{
          id: (user.id - 1) * 5 + i,
          title: "Post #{i} by #{user.name}",
          body: "This is the content of post #{i}.",
          user_id: user.id,
          inserted_at: ~U[2024-01-01 00:00:00Z],
          updated_at: ~U[2024-01-01 00:00:00Z]
        }
      end
    end)
  end
end
```

The seeder above does the following:

1. `use Blink` - Injects Blink's functions and defines required callbacks
2. `new()` - Creates an empty Seeder struct
3. `with_table/2` - Declares the tables to insert rows into
4. `table/2` - Defines what rows to insert into each table
5. `run/2` - Executes the bulk insertion

Each `table/2` callback receives a Seeder struct. The `tables` field stores data from previously declared tables, allowing the `"posts"` callback to reference `seeder.tables["users"]`.

Once `run/2` is called, data is inserted in the order tables were declared. The `context` field is covered below.

Let's run it from IEx:

```elixir
iex -S mix
iex> Blog.Seeder.call()
# => Inserts 2 users and 10 posts
```

## Streams

In the example above, the `table/2` clauses returned lists. Since Blink stores the entire Seeder struct in memory, large lists can be problematic.

To avoid this, `table/2` can return a stream instead:

```elixir
def table(_seeder, "users") do
  Stream.map(1..1_000_000, fn i ->
    %{
      id: i,
      name: "User #{i}",
      email: "user#{i}@example.com",
      inserted_at: ~U[2024-01-01 00:00:00Z],
      updated_at: ~U[2024-01-01 00:00:00Z]
    }
  end)
end

def table(seeder, "posts") do
  Stream.flat_map(seeder.tables["users"], fn user ->
    for i <- 1..20 do
      %{
        id: (user.id - 1) * 20 + i,
        title: "Post #{i} by #{user.name}",
        body: "This is the content of post #{i}",
        user_id: user.id,
        inserted_at: ~U[2024-01-01 00:00:00Z],
        updated_at: ~U[2024-01-01 00:00:00Z]
      }
    end
  end)
end
```

Streams are processed lazily by `run/2` without extra configuration needed.

## Using context

Sometimes you need to compute data once and share it across multiple tables. Context data is not inserted into the database but is available when building your table data.

In this example, we generate timestamps once and reuse them across tables, ensuring posts are created after their author are.

```elixir
def call do
  new()
  |> with_context("timestamps")
  |> with_table("users")
  |> with_table("posts")
  |> run(Blog.Repo)
end

def context(_seeder, "timestamps") do
  base = ~U[2024-01-01 00:00:00Z]
  for day <- 0..29, do: DateTime.add(base, day, :day)
end

def table(seeder, "users") do
  timestamps = seeder.context["timestamps"]
  random_timestamp = Enum.random(timestamps)

  for i <- 1..100 do
    %{
      id: i,
      name: "User #{i}",
      email: "user#{i}@example.com",
      inserted_at: random_timestamp,
      updated_at: random_timestamp
    }
  end
end

def table(seeder, "posts") do
  users = seeder.tables["users"]
  timestamps = seeder.context["timestamps"]

  Enum.flat_map(users, fn user ->
    # Only use timestamps after the user was created
    valid_timestamps =
      Enum.filter(timestamps, fn ts ->
        DateTime.compare(ts, user.inserted_at) == :gt
      end)

    random_valid_timestamp = Enum.random(valid_timestamps)

    for i <- 1..5 do
      %{
        id: (user.id - 1) * 5 + i,
        title: "Post #{i}",
        body: "Content here",
        user_id: user.id,
        inserted_at: random_valid_timestamp,
        updated_at: random_valid_timestamp
      }
    end
  end)
end
```

## Summary

In this guide, we learned how to:

- Create a seeder module with `use Blink`
- Reference data from previously declared tables via `seeder.tables`
- Use streams for memory-efficient seeding of large datasets
- Store auxiliary data in context without inserting it into the database

## Next steps

You might also find these guides useful:

- [Configuring Options](configuring_options.html) - Set global and per-table options for batch size and concurrency
- [Loading Data from Files](loading_data_from_files.html) - Learn how to load data from CSV and JSON files
- [Integrating with ExMachina](integrating_with_ex_machina.html) - Generate realistic test data