guides/detection.md

# Object Detection

`Image.Detection` answers "where are the objects in this image and what are they?". It returns a list of bounding boxes with class labels and confidence scores.

## Basic detection

```elixir
iex> street = Image.open!("street.jpg")
iex> detections = Image.Detection.detect(street)
iex> hd(detections)
%{label: "person", score: 0.97, box: {120, 45, 60, 180}}
```

Each detection is a map with:
- `:label` — one of 80 COCO class names (`person`, `bicycle`, `car`, `dog`, …)
- `:score` — confidence score in `[0.0, 1.0]`
- `:box` — `{x, y, width, height}` in pixel coordinates of the original image

Results are sorted by descending confidence.

## Filtering by confidence

The default minimum score is `0.5`. Raise it to get only high-confidence detections:

```elixir
iex> Image.Detection.detect(street, min_score: 0.8)
[%{label: "person", score: 0.94, box: {120, 45, 60, 180}}, ...]
```

## Drawing bounding boxes

`draw_bbox_with_labels/2` annotates the image:

```elixir
iex> detections = Image.Detection.detect(street)
iex> annotated = Image.Detection.draw_bbox_with_labels(detections, street)
iex> Image.save!(annotated, "annotated.jpg")
```

It accepts the same image that `detect/2` was called on. Pipeline:

```elixir
iex> street
...> |> Image.Detection.detect()
...> |> Image.Detection.draw_bbox_with_labels(street)
...> |> Image.save!("annotated.jpg")
```

## Available classes

The default RT-DETR model is trained on COCO 80 classes:

```elixir
iex> Image.Detection.classes()
["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", ...]
```

## Using a different model

`detect/2` accepts `:repo` and `:filename` to swap in any RT-DETR-family ONNX model from HuggingFace:

```elixir
# Smaller R18 backbone (~80 MB) — faster, slightly less accurate
iex> Image.Detection.detect(image, repo: "onnx-community/rtdetr_r18vd")

# Quantized variant of the default (~45 MB INT8) — much smaller download, some accuracy cost
iex> Image.Detection.detect(image, filename: "onnx/model_quantized.onnx")
```

For one-off use, pass options per call. To make a non-default model the project default, you can wrap the call:

```elixir
defp detect(image), do: Image.Detection.detect(image, repo: "onnx-community/rtdetr_r18vd")
```

To pre-download a model into the cache:

```bash
mix image_vision.download_models --detect
```

(The download task fetches the configured default; if you've changed the repo for a single call site, the cache will populate on first use.)

### Caveat: COCO 80 labels are hardcoded

`detect/2` maps class indices to label strings using a baked-in COCO 80 list ([detection.ex](https://github.com/elixir-image/image_vision/blob/main/lib/detection.ex)). RT-DETR models trained on a different label set (e.g. Open Images, custom domains) will produce indices the wrapper can't translate — labels will be wrong even though boxes and scores are correct. For non-COCO models, use the underlying `Ortex.run/2` directly with the model's own `id2label`.

## Default model

RT-DETR (`onnx-community/rtdetr_r50vd`) is a real-time transformer-based detector that outperforms YOLOv8 on COCO while being **Apache 2.0 licensed** (YOLOv8/11 are AGPL). It is NMS-free — no Non-Maximum Suppression post-processing is needed.

Model weights are downloaded on first call and cached. Configure the cache directory with:

```elixir
config :image_vision, :cache_dir, "/path/to/cache"
```

## Dependencies

Detection requires `:ortex`. Add to `mix.exs`:

```elixir
{:ortex, "~> 0.1"}
```