# Image Segmentation

`Image.Segmentation` produces pixel-level masks that mark which pixels belong to a given object or region.

Two functions cover different use cases:

- `segment/2` — **promptable**: click a point or draw a box to cut out a specific object.
- `segment_panoptic/2` — **class-labeled**: every region in the image gets a label (`person`, `car`, `sky`…).

## Promptable segmentation (SAM 2)

### Segment the centre object

With no prompt, `segment/2` segments whatever is at the centre of the image:

```elixir
iex> image = Image.open!("product_photo.jpg")
iex> %{mask: mask, score: score} = Image.Segmentation.segment(image)
iex> score
0.94
```

### Segment by point

```elixir
iex> %{mask: mask} = Image.Segmentation.segment(image, prompt: {:point, 320, 240})
```

### Segment by bounding box

```elixir
iex> %{mask: mask} = Image.Segmentation.segment(image, prompt: {:box, 100, 50, 200, 300})
```

The prompt is `{:box, x, y, width, height}`, with all values in pixel coordinates of the original image.
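
Detectors and annotation tools often report a box as two corners rather than an origin plus a size. A minimal sketch of converting corner coordinates into the `{:box, x, y, width, height}` prompt form; `BoxPrompt` is a hypothetical helper and not part of `Image.Segmentation`:

```elixir
defmodule BoxPrompt do
  # Hypothetical helper, not part of Image.Segmentation.
  # Converts two opposite corners into {:box, x, y, width, height}.
  # The corners may be given in either order.
  def from_corners({x1, y1}, {x2, y2}) do
    {:box, min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1)}
  end
end

prompt = BoxPrompt.from_corners({100, 50}, {300, 350})
# prompt is {:box, 100, 50, 200, 300}, the same box as in the example above
```

The resulting tuple can be passed as the `:prompt` option to `segment/2`.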

### Multiple prompts

Pass a list of `{:point, x, y}` tuples to guide the model toward a specific object when a single point is ambiguous:

```elixir
iex> %{mask: mask} = Image.Segmentation.segment(image,
...>   prompt: [{:point, 320, 240}, {:point, 340, 260}])
```

### Getting all candidate masks

SAM 2 produces three mask candidates for every prompt. Retrieve them all with `multimask: true`:

```elixir
iex> masks = Image.Segmentation.segment(image, multimask: true)
iex> length(masks)
3
iex> hd(masks).score
0.97
```
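
To rank or filter the candidates explicitly, something along these lines should work. It assumes each candidate is a map with `:mask` and `:score` keys, as in the single-mask result; the `:placeholder_*` atoms stand in for real mask images and the scores are made up for illustration.

```elixir
# Stand-ins for the result of segment(image, multimask: true).
candidates = [
  %{mask: :placeholder_a, score: 0.81},
  %{mask: :placeholder_b, score: 0.97},
  %{mask: :placeholder_c, score: 0.64}
]

# Pick the highest-scoring candidate regardless of list order.
best = Enum.max_by(candidates, & &1.score)

# Or keep every candidate at or above a confidence threshold.
confident = Enum.filter(candidates, &(&1.score >= 0.8))
```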

## Class-labeled segmentation (DETR-panoptic)

`segment_panoptic/2` returns one segment per detected region, each with a class label and a binary mask:

```elixir
iex> street = Image.open!("street.jpg")
iex> segments = Image.Segmentation.segment_panoptic(street)
iex> Enum.map(segments, & {&1.label, Float.round(&1.score, 2)})
[{"person", 0.97}, {"car", 0.93}, {"road", 0.88}, {"sky", 0.85}]
```

Labels come from the COCO panoptic taxonomy: 133 categories (80 "thing" classes and 53 "stuff" classes) covering everyday objects and background regions.
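
Because the result is a plain list, standard `Enum` functions are enough to pick out regions of interest. A small sketch, assuming each segment is a map with `:label` and `:score` keys as in the output above; the `:mask` key name, the placeholder values, and the `SegmentFilter` module are assumptions made for illustration:

```elixir
defmodule SegmentFilter do
  # Hypothetical helper: keep segments whose label is in `labels`
  # and whose score is at least `min_score`.
  def select(segments, labels, min_score \\ 0.9) do
    Enum.filter(segments, fn seg ->
      seg.label in labels and seg.score >= min_score
    end)
  end
end

# Stand-in data mirroring the example output above.
sample_segments = [
  %{label: "person", score: 0.97, mask: :placeholder},
  %{label: "car", score: 0.93, mask: :placeholder},
  %{label: "road", score: 0.88, mask: :placeholder},
  %{label: "sky", score: 0.85, mask: :placeholder}
]

foreground = SegmentFilter.select(sample_segments, ["person", "car"])
foreground_labels = Enum.map(foreground, & &1.label)
# foreground_labels is ["person", "car"]
```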

## Composing results with the original image

### Cut out a segmented object

`apply_mask/2` makes the mask the alpha channel — white pixels become opaque, black pixels transparent:

```elixir
iex> %{mask: mask} = Image.Segmentation.segment(image)
iex> {:ok, cutout} = Image.Segmentation.apply_mask(image, mask)
iex> Image.save!(cutout, "cutout.png")
```

### Colour-coded overlay

`compose_overlay/3` draws a colour-coded overlay of all segments:

```elixir
iex> overlay = Image.Segmentation.compose_overlay(street, segments)
iex> Image.save!(overlay, "segmented.jpg")
```

Adjust transparency with `:alpha` (default `0.5`):

```elixir
iex> overlay = Image.Segmentation.compose_overlay(street, segments, alpha: 0.3)
```
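
The internals of `compose_overlay/3` are not shown here, but if `:alpha` follows the conventional blending rule, each output channel is `alpha * colour + (1 - alpha) * original`. A sketch under that assumption for a single RGB pixel (the `Blend` module is hypothetical and is not the library's code):

```elixir
defmodule Blend do
  # Conventional alpha blending of one RGB pixel. This is an assumption
  # about what the overlay alpha means, not the library's implementation.
  def pixel({r, g, b}, {cr, cg, cb}, alpha) when alpha >= 0 and alpha <= 1 do
    {mix(r, cr, alpha), mix(g, cg, alpha), mix(b, cb, alpha)}
  end

  # alpha = 0 keeps the original channel, alpha = 1 replaces it with the colour.
  defp mix(base, colour, alpha), do: round(alpha * colour + (1 - alpha) * base)
end

blended = Blend.pixel({200, 100, 50}, {0, 100, 250}, 0.5)
# blended is {100, 100, 150}
```

Under this rule, a lower `:alpha` lets more of the original image show through the segment colours.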

## Using a different model

Both `segment/2` and `segment_panoptic/2` accept options that select a different model. The options are passed per call rather than through application config: neither function depends on a long-running serving, so there is nothing to start or restart when a single call overrides the model.

### Promptable (SAM 2)

```elixir
# Use a larger SAM 2 variant for better quality on small or thin objects
iex> Image.Segmentation.segment(image,
...>   prompt: {:point, 320, 240},
...>   repo: "SharpAI/sam2-hiera-small-onnx")
```

`segment/2` accepts:

- `:repo` — any HuggingFace repo containing a SAM 2 ONNX export with separate encoder and decoder files
- `:encoder_file` — encoder filename within the repo (default `"encoder.onnx"`)
- `:decoder_file` — decoder filename within the repo (default `"decoder.onnx"`)

The wrapper expects the layout used by `SharpAI/sam2-hiera-tiny-onnx`: separate encoder and decoder files, which is the standard SAM 2 ONNX export shape. Repos that bundle both into a single file or use a different I/O layout will not work without changes to the wrapper.
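
Since the options travel with each call, one way to avoid repeating them is to keep the model options in a keyword list and merge in the per-call prompt. A sketch only: `MyApp.SamOptions` is a hypothetical application-side module, and the file names simply restate the documented defaults.

```elixir
defmodule MyApp.SamOptions do
  # Hypothetical application-side helper; not part of Image.Segmentation.
  @model [
    repo: "SharpAI/sam2-hiera-small-onnx",
    encoder_file: "encoder.onnx",
    decoder_file: "decoder.onnx"
  ]

  # Per-call options take precedence over the shared model options.
  def with_model(opts \\ []), do: Keyword.merge(@model, opts)
end

opts = MyApp.SamOptions.with_model(prompt: {:point, 320, 240})

# The merged list can then be passed straight through:
# Image.Segmentation.segment(image, opts)
```

`Keyword.merge/2` gives the second list priority, so a call can still override `:repo` when it needs to.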

### Class-labeled (DETR-panoptic)

```elixir
# Quantized variant — much smaller, some accuracy cost
iex> Image.Segmentation.segment_panoptic(image, model_file: "onnx/model_quantized.onnx")

# A different ONNX-exported DETR-panoptic repo
iex> Image.Segmentation.segment_panoptic(image, repo: "your-org/detr-panoptic-onnx")
```

`segment_panoptic/2` accepts:

- `:repo` — any HuggingFace repo with a DETR-panoptic ONNX export and a `config.json` providing `id2label`
- `:model_file` — ONNX filename within the repo (default `"onnx/model.onnx"`)

Labels are read from the repo's `config.json`. Where that config has placeholder `LABEL_n` entries, the wrapper falls back to the canonical [COCO panoptic taxonomy](https://github.com/cocodataset/panopticapi/blob/master/panoptic_coco_categories.json), so common stuff classes (`sky-other-merged`, `mountain-merged`, `grass-merged`, …) resolve correctly even on repos with incomplete configs.
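
The fallback described above can be pictured roughly as follows. This is an illustrative sketch and not the wrapper's actual code: `LabelFallback` is a hypothetical module, both tables are assumed to be maps keyed by integer id, and the ids in the sample data are illustrative and should be checked against the linked JSON.

```elixir
defmodule LabelFallback do
  # Illustrative only. Resolves a label from the repo's id2label table,
  # falling back to a canonical table when the entry is a placeholder
  # such as "LABEL_187" or is missing altogether.
  def resolve(id, id2label, canonical) do
    case Map.get(id2label, id) do
      "LABEL_" <> _ = placeholder -> Map.get(canonical, id, placeholder)
      nil -> Map.get(canonical, id)
      label -> label
    end
  end
end

# Illustrative sample data, not a real config.json.
id2label = %{1 => "person", 187 => "LABEL_187"}
canonical = %{187 => "sky-other-merged"}

resolved = LabelFallback.resolve(187, id2label, canonical)
# resolved is "sky-other-merged"
```

A placeholder with no canonical entry is returned unchanged, so an unknown class still carries a usable label.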

### Pre-downloading

To populate the cache before first use:

```bash
mix image_vision.download_models --segment
```

This fetches the configured defaults. For non-default repos, the cache is populated on the first call to `segment/2` or `segment_panoptic/2`.

## Dependencies

Segmentation requires `:ortex`. Add it to the dependencies in `mix.exs`:

```elixir
{:ortex, "~> 0.1"}
```

Model weights (~150 MB for SAM 2, ~175 MB for DETR) are downloaded on first call and cached. Configure the cache directory with:

```elixir
config :image_vision, :cache_dir, "/path/to/cache"
```