# Image generation

Image generation lives on a parallel surface to the text APIs.
`%ALLM.ImageRequest{}` and `%ALLM.ImageResponse{}` mirror the
`Request`/`Response` shape; the engine has a separate `:image_adapter`
slot; and the entry points (`ALLM.generate_image/3`,
`ALLM.edit_image/4`, `ALLM.image_variations/3`) take the same engine
and return image responses.

This guide covers what each entry point does, the parallel adapter
slot, OpenAI vs Gemini coverage, and the `FakeImages` adapter for
deterministic testing.

## Three operations

| Operation | Function | What it does |
|---|---|---|
| Generate | `ALLM.generate_image/3` | Produces a new image from a text prompt |
| Edit (inpaint) | `ALLM.edit_image/4` | Modifies an existing image, optionally masked |
| Variations | `ALLM.image_variations/3` | Produces visual variations of an existing image |

On success, each returns `{:ok, %ALLM.ImageResponse{}}`, which carries
`:images` (a list of `%ALLM.Image{}`) and `:usage` (provider-reported
counts).

## The image-adapter engine slot

An engine has two adapter slots: `:adapter` for chat and
`:image_adapter` for images. Set whichever you need:

```elixir
engine = ALLM.Engine.new(
  adapter: ALLM.Providers.OpenAI,             # for chat, optional here
  image_adapter: ALLM.Providers.OpenAI.Images,
  image_default_model: "dall-e-2"
)
```

If you only generate images (no chat), the `:adapter` slot can stay
unset.

## Generating an image

    iex> engine = ALLM.Engine.new(
    ...>   image_adapter: ALLM.Providers.FakeImages,
    ...>   image_adapter_opts: [
    ...>     scripts: [[{:ok, %{
    ...>       images: [%ALLM.Image{source: {:bytes, <<137, 80, 78, 71>>}, mime_type: "image/png"}]
    ...>     }}]]
    ...>   ]
    ...> )
    iex> {:ok, %ALLM.ImageResponse{images: [%ALLM.Image{} = img]}} =
    ...>   ALLM.generate_image(engine, "a watercolor kestrel")
    iex> img.mime_type
    "image/png"

`ALLM.generate_image/3` accepts opts:

* `:model` — override the engine's default.
* `:size` — `"512x512"`, `"1024x1024"`, or a `{w, h}` tuple. Provider
  capabilities differ; OpenAI's `dall-e-2` only supports `256×256`,
  `512×512`, and `1024×1024`.
* `:n` — number of images to generate.
* `:response_format` — `:url` (default for OpenAI 1.x) or `:b64_json`
  (default for newer models).
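
Because `:size` accepts either a string or a `{w, h}` tuple, it can help to normalize and validate the value before a request goes out. The following is a minimal sketch, not part of ALLM: `SizeSketch` and its functions are hypothetical names, and the allowed list covers only the `dall-e-2` sizes named above. Other models would need their own list.

```elixir
defmodule SizeSketch do
  # Hypothetical helper, not part of ALLM.
  # Sizes this guide lists for OpenAI's dall-e-2.
  @dall_e_2_sizes ["256x256", "512x512", "1024x1024"]

  # Normalize a {w, h} tuple to a "WxH" string.
  def normalize({w, h}) when is_integer(w) and is_integer(h) and w > 0 and h > 0,
    do: {:ok, "#{w}x#{h}"}

  # Accept a string only if it already has the "WxH" shape.
  def normalize(size) when is_binary(size) do
    case Regex.run(~r/\A(\d+)x(\d+)\z/, size) do
      [_, _w, _h] -> {:ok, size}
      nil -> {:error, {:invalid_size, size}}
    end
  end

  def normalize(other), do: {:error, {:invalid_size, other}}

  # Normalize first, then check the result against the dall-e-2 list.
  def validate_for_dall_e_2(size) do
    with {:ok, normalized} <- normalize(size) do
      if normalized in @dall_e_2_sizes,
        do: {:ok, normalized},
        else: {:error, {:unsupported_size, normalized}}
    end
  end
end
```

With this sketch, `SizeSketch.validate_for_dall_e_2({512, 512})` yields `{:ok, "512x512"}`, while `"300x300"` is rejected before any provider call is made.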

## Editing an image (inpaint)

`ALLM.edit_image/4` takes the engine, the base image, the prompt, and
optionally a mask:

```elixir
base = File.read!("base.png")
mask = File.read!("mask.png")  # OpenAI's edit API repaints the fully transparent areas of the mask

{:ok, response} = ALLM.edit_image(engine, base, "add a fountain", mask: mask)
```

The base and mask can be raw bytes, a file path
(`{:file, "/path/to/x.png"}`), or an `%ALLM.Image{}`.
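
To make the three accepted input shapes concrete, here is a rough sketch of how such inputs could be reduced to bytes. It is an illustration only: `InputSketch.to_bytes/1` is a hypothetical name, ALLM does its own normalization, and the `%{source: {:bytes, _}}` pattern merely stands in for an `%ALLM.Image{}` that holds in-memory bytes.

```elixir
defmodule InputSketch do
  # Hypothetical illustration; ALLM performs its own normalization.

  # An image-like struct or map carrying in-memory bytes
  # (stands in for %ALLM.Image{source: {:bytes, _}}).
  def to_bytes(%{source: {:bytes, bytes}}) when is_binary(bytes), do: {:ok, bytes}

  # A file path: read the file from disk.
  def to_bytes({:file, path}) when is_binary(path), do: File.read(path)

  # Raw bytes pass through unchanged.
  def to_bytes(bytes) when is_binary(bytes), do: {:ok, bytes}

  # Anything else (including URL-backed images) is rejected in this sketch.
  def to_bytes(other), do: {:error, {:unsupported_input, other}}
end
```

Pattern matching on the input shape keeps each case in its own clause, so an unexpected value fails loudly instead of reaching the provider.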

## Variations

`ALLM.image_variations/3` produces visual variations of an existing
image — no prompt:

```elixir
{:ok, response} = ALLM.image_variations(engine, base_image, n: 3)
```

OpenAI is the only bundled provider with native variation support, and
only on `dall-e-2` (square sizes 256×256, 512×512, or 1024×1024).

## Provider coverage

| Operation | OpenAI | Gemini |
|---|---|---|
| Generate (`generate_image/3`) | yes (`dall-e-2`, `dall-e-3`, `gpt-image-1`) | yes (`gemini-2.5-flash-image-preview`) |
| Edit (`edit_image/4`) | yes (`dall-e-2`, `gpt-image-1`) | yes |
| Variations (`image_variations/3`) | yes (`dall-e-2` only) | no |

Anthropic does not ship an image adapter — set `:image_adapter` to
OpenAI's or Gemini's even when your chat adapter is Anthropic.
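
If an application picks its image adapter at runtime, the coverage table can be mirrored as a small lookup so unsupported operations are caught early. This is a sketch under stated assumptions: `CoverageSketch` and the provider atoms `:openai` and `:gemini` are hypothetical, and the data is copied by hand from the table above rather than queried from ALLM.

```elixir
defmodule CoverageSketch do
  # Hypothetical lookup mirroring the coverage table in this guide.
  @coverage %{
    openai: [:generate, :edit, :variations],
    gemini: [:generate, :edit]
  }

  # True when the table lists the operation for the provider.
  def supported?(provider, operation) do
    operation in Map.get(@coverage, provider, [])
  end

  # Providers from the table that list the operation, in sorted order.
  def providers_for(operation) do
    providers = for {provider, ops} <- @coverage, operation in ops, do: provider
    Enum.sort(providers)
  end
end
```

For example, `CoverageSketch.supported?(:gemini, :variations)` is `false`, which matches the last row of the table.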

## Materializing the result

A `%ALLM.Image{}` carries a `:source` (either `{:bytes, binary}` or
`{:url, string}`) and a `:mime_type`. To get raw bytes regardless of
source:

```elixir
{:ok, bytes} = ALLM.Image.to_binary(image)
```

When the source is `{:url, _}`, `to_binary/1` fetches the URL for you;
when it is `{:bytes, _}`, the bytes are already in memory.

To write to disk:

```elixir
{:ok, bytes} = ALLM.Image.to_binary(image)
File.write!("output.png", bytes)
```
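
The example above writes to a fixed `output.png`. Since each image carries a `:mime_type`, the file extension can be derived instead of hardcoded. The helper below is a hypothetical sketch (`ExtensionSketch` is not part of ALLM), and the extension map is an assumption covering only a few common types.

```elixir
defmodule ExtensionSketch do
  # Hypothetical helper, not part of ALLM.
  # Assumed mapping for a few common image MIME types.
  @extensions %{
    "image/png" => ".png",
    "image/jpeg" => ".jpg",
    "image/webp" => ".webp"
  }

  # Build an output path from a base name and an image-like map or struct.
  # Unknown or missing MIME types fall back to ".bin".
  def output_path(basename, %{mime_type: mime_type}) when is_binary(basename) do
    basename <> Map.get(@extensions, mime_type, ".bin")
  end
end
```

Because the function matches on any map with a `:mime_type` key, it would also accept a struct such as `%ALLM.Image{}`.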

## Testing with `FakeImages`

`ALLM.Providers.FakeImages` is the canonical test vehicle for image
flows — same idea as `ALLM.Providers.Fake` for chat. Build a scripted
response and assert against it:

    iex> engine = ALLM.Engine.new(
    ...>   image_adapter: ALLM.Providers.FakeImages,
    ...>   image_adapter_opts: [
    ...>     scripts: [[{:ok, %{
    ...>       images: [
    ...>         %ALLM.Image{source: {:bytes, <<137, 80, 78, 71, 0, 0>>}, mime_type: "image/png"}
    ...>       ]
    ...>     }}]]
    ...>   ]
    ...> )
    iex> {:ok, %ALLM.ImageResponse{images: images}} =
    ...>   ALLM.generate_image(engine, "anything")
    iex> length(images)
    1

Fake replies are deterministic, async-test-safe (per-process cursor),
and require no network or API key.
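
To show why a per-process cursor makes scripted replies safe under async tests, here is a small standalone sketch. It is not the `FakeImages` implementation, whose internals this guide does not describe; `CursorSketch` is a hypothetical module that simply stores its position in the calling process's dictionary.

```elixir
defmodule CursorSketch do
  # Illustration only: NOT the FakeImages implementation.
  # Replays scripted replies in order, tracking the position per process.
  @key {__MODULE__, :cursor}

  def next(replies) when is_list(replies) do
    # Each process has its own dictionary, so cursors never collide.
    index = Process.get(@key, 0)
    Process.put(@key, index + 1)

    case Enum.fetch(replies, index) do
      {:ok, reply} -> reply
      :error -> {:error, :script_exhausted}
    end
  end
end
```

Two test processes that call `CursorSketch.next/1` with the same script each start at the first reply, which is the property that lets such tests run concurrently without interfering.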

## Common patterns

### Generate + persist

```elixir
{:ok, %ALLM.ImageResponse{images: [image]}} =
  ALLM.generate_image(engine, prompt, size: "1024x1024")

{:ok, bytes} = ALLM.Image.to_binary(image)
File.write!(target_path, bytes)
```

### Edit with progress

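None of the image entry points stream: `generate_image/3`,
`edit_image/4`, and `image_variations/3` each block until the provider
returns the finished result, so there is no intermediate progress to
report. If long generations hit the default timeout, raise it via the
engine's `:request_options`.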
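
Since these calls block, a caller that needs its own deadline can run the call in a task. The sketch below uses only the standard `Task` module; `TimeoutSketch` is a hypothetical wrapper, and the zero-arity `fun` stands in for whichever image call you make. It complements the engine-level timeout rather than replacing it.

```elixir
defmodule TimeoutSketch do
  # Hypothetical wrapper; `fun` stands in for a blocking image call.
  # Runs `fun` in a linked task and gives up after `timeout_ms` milliseconds.
  def run_with_timeout(fun, timeout_ms) when is_function(fun, 0) do
    task = Task.async(fun)

    # Task.yield/2 returns nil on timeout; Task.shutdown/1 then stops the
    # task and returns a reply only if one arrived in the meantime.
    case Task.yield(task, timeout_ms) || Task.shutdown(task) do
      {:ok, result} -> {:ok, result}
      nil -> {:error, :timeout}
      {:exit, reason} -> {:error, {:exit, reason}}
    end
  end
end
```

Note that shutting the task down abandons the in-flight request, so any image the provider produces afterwards is lost.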

### Multi-tenant key resolution

Image-adapter calls go through the same `ALLM.Keys` resolution chain as
chat calls. Pass `:api_key` per-call for BYOK SaaS:

```elixir
ALLM.generate_image(engine, prompt, api_key: tenant.openai_key)
```
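
When only some tenants bring their own key, the per-call options can be built conditionally. This is a minimal sketch: `TenantOptsSketch` is a hypothetical helper, the `:openai_key` field follows the example above, and it assumes that omitting `:api_key` leaves key lookup to the normal resolution chain.

```elixir
defmodule TenantOptsSketch do
  # Hypothetical helper; only the :api_key option comes from this guide.
  # Adds :api_key when the tenant has a non-empty key, otherwise returns
  # the options untouched (assumed to fall back to normal key resolution).
  def image_opts(tenant, opts \\ []) when is_map(tenant) and is_list(opts) do
    case Map.get(tenant, :openai_key) do
      key when is_binary(key) and key != "" -> Keyword.put(opts, :api_key, key)
      _ -> opts
    end
  end
end
```

The result can be passed as the third argument, for example `ALLM.generate_image(engine, prompt, TenantOptsSketch.image_opts(tenant, size: "1024x1024"))`.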

## Where to next

* `vision.md` — sending images TO the model, vs generating new ones.
* `examples/10_generate_image.exs` — runnable smoke test.
* `examples/11_edit_image.exs` — inpaint with mask.
* `examples/13_image_variations.exs` — OpenAI-only variation flow.