guides/loading_data_from_files.md

# Loading Data from Files

Blink provides helper functions to load data from CSV and JSON files, making it easy to seed your database from external data sources.

## Loading from CSV files

CSV files are a common format for storing tabular data. Blink can read CSV files and convert them into maps suitable for insertion.

### Basic usage

Create a CSV file at `priv/seed_data/users.csv`:

```csv
id,name,email
1,Alice Johnson,alice@example.com
2,Bob Smith,bob@example.com
3,Carol White,carol@example.com
```

Load it in your seeder:

```elixir
defmodule Blog.Seeder do
  use Blink

  def call do
    new()
    |> with_table("users")
    |> run(Blog.Repo)
  end

  def table(_seeder, "users") do
    Blink.from_csv("priv/seed_data/users.csv")
  end
end
```

By default, from_csv/2 reads the first row as column headers and returns all keys and values as strings.

### Transforming data

Use the `:transform` option to convert types and add required fields:

```elixir
def table(_seeder, "users") do
  Blink.from_csv("priv/seed_data/users.csv",
    transform: fn row ->
      row
      |> Map.update!("id", &String.to_integer/1)
      |> Map.put("inserted_at", ~U[2024-01-01 00:00:00Z])
      |> Map.put("updated_at", ~U[2024-01-01 00:00:00Z])
    end
  )
end
```

The `transform` function receives each row as a map and should return the transformed map.

### CSV files without headers

If your CSV file doesn't have a header row, provide the column names explicitly:

```elixir
def table(_seeder, "users") do
  Blink.from_csv("priv/seed_data/users_no_headers.csv",
    headers: ["id", "name", "email"]
  )
end
```

### Combining headers and transform

You can use both options together:

```elixir
def table(_seeder, "users") do
  Blink.from_csv("priv/seed_data/users_no_headers.csv",
    headers: ["id", "name", "email"],
    transform: fn row ->
      Map.update!(row, "id", &String.to_integer/1)
    end
  )
end
```

### Streaming CSV files

For large CSV files, use the `:stream` option to process data lazily without loading the entire file into memory:

```elixir
def table(_seeder, "users") do
  Blink.from_csv("priv/seed_data/large_users.csv",
    stream: true,
    transform: fn row ->
      Map.update!(row, "id", &String.to_integer/1)
    end
  )
end
```

When `stream: true` is set, `from_csv/2` returns a stream instead of a list. Blink's insertion process handles streams efficiently.

### JSONB columns in CSV files

CSV files can also contain JSON data for JSONB columns, including deeply nested structures. Embed the JSON as a quoted string:

```csv
id,name,email,settings
1,Alice,alice@example.com,"{""theme"":""dark"",""notifications"":{""email"":true,""sms"":false}}"
2,Bob,bob@example.com,"{""theme"":""light"",""notifications"":{""email"":false,""sms"":true}}"
```

Use the `:transform` option to decode the JSON string:

```elixir
def table(_seeder, "users") do
  Blink.from_csv("priv/seed_data/users_with_settings.csv",
    transform: fn row ->
      %{
        id: String.to_integer(row["id"]),
        name: row["name"],
        email: row["email"],
        settings: Jason.decode!(row["settings"])
      }
    end
  )
end
```

## Loading from JSON files

JSON files are also supported via `from_json/2`.

### Basic usage

Create a JSON file at `priv/seed_data/products.json`:

```json
[
  { "id": 1, "name": "Widget", "price": 9.99 },
  { "id": 2, "name": "Gadget", "price": 19.99 },
  { "id": 3, "name": "Doohickey", "price": 29.99 }
]
```

Load it in your seeder:

```elixir
def table(_seeder, "products") do
  Blink.from_json("priv/seed_data/products.json")
end
```

The JSON file must contain an array of objects at the root level. Each object becomes a map with string keys.

Note that `from_json/2` does not support the `:stream` option. For large datasets, consider using CSV files with `stream: true` instead.

### Transforming JSON data

Use the `:transform` option to add or modify fields:

```elixir
def table(_seeder, "products") do
  Blink.from_json("priv/seed_data/products.json",
    transform: fn product ->
      Map.merge(product, %{
        "inserted_at" => ~U[2024-01-01 00:00:00Z],
        "updated_at" => ~U[2024-01-01 00:00:00Z]
      })
    end
  )
end
```

### Seeding JSONB columns

Blink automatically handles nested maps when inserting into JSONB columns. Create a JSON file with nested objects:

```json
[
  {
    "id": 1,
    "name": "Alice",
    "settings": { "theme": "dark", "notifications": true }
  },
  {
    "id": 2,
    "name": "Bob",
    "settings": { "theme": "light", "notifications": false }
  }
]
```

Load it in your seeder:

```elixir
def table(_seeder, "users") do
  Blink.from_json("priv/seed_data/users_with_settings.json",
    transform: fn row ->
      %{
        id: row["id"],
        name: row["name"],
        email: "#{String.downcase(row["name"])}@example.com",
        settings: row["settings"]
      }
    end
  )
end
```

The nested `settings` map is automatically JSON-encoded and inserted into the JSONB column.

## Error handling

The functions `from_csv/2` and `from_json/2` will raise exceptions if:

- The file doesn't exist
- The file format is invalid
- The `:transform` function is not a single-arity function
- For JSON: the root element is not an array, or the array contains non-object elements
- For CSV: the `:headers` option is not `:infer` or a list of strings

## Summary

In this guide, we learned how to:

- Load data from CSV files with `from_csv/2`
- Transform data with the `:transform` option
- Handle CSV files without headers
- Stream large CSV files with the `:stream` option
- Load data from JSON files with `from_json/2`
- Seed JSONB columns with nested maps