# ImageVision

`ImageVision` is a simple, opinionated image vision library for Elixir. It sits alongside the [`image`](https://hex.pm/packages/image) library and answers three questions about any image — **what's in it**, **where are the objects**, and **which pixels belong to which object** — with strong defaults and no ML expertise required.

## Quick start

```elixir
# Classification — what is in this image?
iex> puppy = Image.open!("puppy.jpg")
iex> Image.Classification.labels(puppy)
["Blenheim spaniel"]

# Detection — where are the objects and what are they?
iex> street = Image.open!("street.jpg")
iex> detections = Image.Detection.detect(street)
iex> hd(detections)
%{label: "person", score: 0.94, box: {120, 45, 60, 180}}

# Draw bounding boxes on the image
iex> Image.Detection.draw_bbox_with_labels(detections, street)

# Segmentation — which pixels belong to which object?
iex> segments = Image.Segmentation.segment_panoptic(street)
iex> Enum.map(segments, & &1.label)
["person", "car", "road", "sky"]

# Colour-coded overlay of all segments
iex> Image.Segmentation.compose_overlay(street, segments)

# Promptable segmentation — isolate the object at a specific point
iex> %{mask: mask} = Image.Segmentation.segment(puppy, prompt: {:point, 320, 240})
iex> {:ok, cutout} = Image.Segmentation.apply_mask(puppy, mask)

# Embedding — 768-dim feature vector for similarity search
iex> Image.Classification.embed(puppy)
#Nx.Tensor<f32[768]>
```
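The `box` tuple in the detection result is compact. Assuming (as a working guess, not confirmed by the output above) that it encodes `{x, y, width, height}` in pixels, converting it to corner coordinates is a one-liner:

```elixir
# Hypothetical helper: convert a {x, y, width, height} box tuple
# to {x1, y1, x2, y2} corner coordinates. The tuple layout is an
# assumption for illustration — check the Detection guide for the
# authoritative format.
to_corners = fn {x, y, w, h} -> {x, y, x + w, y + h} end

corners = to_corners.({120, 45, 60, 180})
# corners == {120, 45, 180, 225}
```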

## Installation

Add `:image_vision` to `mix.exs` along with whichever optional ML backends you need:

```elixir
def deps do
  [
    {:image_vision, "~> 0.1"},

    # Required for Image.Classification and Image.Classification.embed/2
    {:bumblebee, "~> 0.6"},
    {:nx, "~> 0.10"},
    {:exla, "~> 0.10"},     # or {:torchx, "~> 0.10"} for Torch backend

    # Required for Image.Detection and Image.Segmentation
    {:ortex, "~> 0.1"}
  ]
end
```

All ML deps are optional — omit any you do not use. The library compiles cleanly without them.
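If you are experimenting in Livebook or a standalone script rather than a Mix project, the same dependencies can be declared with `Mix.install/2` — a sketch using the same package set as above:

```elixir
# Livebook setup cell / script header — trim to the backends you need
Mix.install([
  {:image_vision, "~> 0.1"},
  {:bumblebee, "~> 0.6"},
  {:nx, "~> 0.10"},
  {:exla, "~> 0.10"},
  {:ortex, "~> 0.1"}
])
```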

## Prerequisites

For the vast majority of users on Linux x86_64, macOS (Intel and Apple Silicon), and Windows x86_64, **no native toolchain is required**. The libraries used here ship precompiled native binaries for those platforms and `mix deps.get` is all you need.

If your platform isn't covered by precompiled binaries — uncommon Linux distros, ARM Linux, musl-based systems such as Alpine, glibc version mismatches — you'll need:

* **A Rust toolchain** for `:ortex` (the ONNX runtime wrapper used by detection and segmentation) and for `:tokenizers` (pulled in transitively by `:bumblebee`). Install via [rustup](https://rustup.rs):

  ```bash
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  ```

* **A C compiler** for `:vix` (the libvips wrapper used by `:image`). On Linux install `build-essential` (Debian/Ubuntu) or `gcc` (Fedora/RHEL); on macOS install Xcode Command Line Tools (`xcode-select --install`).

* **`libvips`** if you need advanced libvips features beyond what the precompiled NIF includes. On macOS: `brew install vips`. On Linux: your distro's `libvips-dev` / `vips-devel` package. Most users don't need this.

If you see Cargo or `cc` errors during `mix deps.compile`, you've likely landed on a platform without precompiled coverage — install the toolchain above and re-run.

### Disk space and first-call latency

Model weights are downloaded on first call and cached on disk. Approximate sizes for the default task models (the optional embedding model, `facebook/dinov2-base`, adds a further ~340 MB — see the full table under "Default models"):

| Task | Default model | Size |
|---|---|---|
| Classification | `facebook/convnext-tiny-224` | ~110 MB |
| Detection | `onnx-community/rtdetr_r50vd` | ~175 MB |
| Segmentation (SAM 2) | `SharpAI/sam2-hiera-tiny-onnx` | ~150 MB |
| Segmentation (panoptic) | `Xenova/detr-resnet-50-panoptic` | ~175 MB |

The first call to each task therefore appears to "hang" while weights download — that's expected, not a bug.

To pre-download all default models before first use (recommended for production deployments and CI):

```bash
mix image_vision.download_models
```

Pass `--classify`, `--detect`, or `--segment` to limit scope.

### Livebook Desktop

Livebook Desktop launches as a GUI application and **does not inherit your shell's `PATH`**. Tools installed via `rustup`, `mise`, `asdf`, or Homebrew aren't visible to it by default — even if `cargo` works fine in your terminal.

If you hit "cargo: command not found" or similar during `Mix.install` inside Livebook Desktop, create `~/.livebookdesktop.sh` and add the relevant directories to `PATH`. A reasonable starting point:

```bash
# ~/.livebookdesktop.sh

# Rust (rustup)
export PATH="$HOME/.cargo/bin:$PATH"

# Homebrew (Apple Silicon)
export PATH="/opt/homebrew/bin:$PATH"

# mise — uncomment if you use it
# eval "$(mise activate bash)"

# asdf — uncomment if you use it
# . "$HOME/.asdf/asdf.sh"
```

Restart Livebook Desktop after creating this file. See the [Livebook Desktop documentation](https://github.com/livebook-dev/livebook/blob/main/README.md#livebook-desktop) for details.

## Default models

All models are permissively licensed. Weights are downloaded automatically on first call and cached on disk — no manual setup required.

| Task | Model | License | Size |
|---|---|---|---|
| Classification | [`facebook/convnext-tiny-224`](https://huggingface.co/facebook/convnext-tiny-224) | Apache 2.0 | ~110 MB |
| Embedding | [`facebook/dinov2-base`](https://huggingface.co/facebook/dinov2-base) | Apache 2.0 | ~340 MB |
| Object detection | [`onnx-community/rtdetr_r50vd`](https://huggingface.co/onnx-community/rtdetr_r50vd) | Apache 2.0 | ~175 MB |
| Promptable segmentation | [`SharpAI/sam2-hiera-tiny-onnx`](https://huggingface.co/SharpAI/sam2-hiera-tiny-onnx) | Apache 2.0 | ~150 MB |
| Panoptic segmentation | [`Xenova/detr-resnet-50-panoptic`](https://huggingface.co/Xenova/detr-resnet-50-panoptic) | Apache 2.0 | ~175 MB |

## Configuration

### Model cache

ONNX model weights are cached in a per-user directory by default (`~/.cache/image_vision` on Linux, `~/Library/Caches/image_vision` on macOS). Override in `config/runtime.exs`:

```elixir
config :image_vision, :cache_dir, "/var/lib/my_app/models"
```
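In a release you may prefer to take the cache location from the environment rather than hard-code it. A sketch using a hypothetical `IMAGE_VISION_CACHE_DIR` variable (the variable name is this example's invention, not something the library reads on its own):

```elixir
# config/runtime.exs
# Fall back to a fixed path when the env var is unset.
config :image_vision, :cache_dir,
  System.get_env("IMAGE_VISION_CACHE_DIR", "/var/lib/my_app/models")
```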

### Classification serving

`Image.Classification` runs a supervised [Bumblebee](https://hex.pm/packages/bumblebee) serving, which is not started automatically. Start it in your application's supervision tree:

```elixir
# application.ex
def start(_type, _args) do
  children = [
    Image.Classification.classifier(),
    Image.Classification.embedder()   # omit if you do not need embeddings
  ]

  Supervisor.start_link(children, strategy: :one_for_one)
end
```

Or enable autostart via config so `ImageVision.Supervisor` handles it:

```elixir
# config/runtime.exs
config :image_vision, :classifier, autostart: true
```

To use a different model:

```elixir
config :image_vision, :classifier,
  model: {:hf, "facebook/convnext-large-224-22k-1k"},
  featurizer: {:hf, "facebook/convnext-large-224-22k-1k"},
  autostart: true
```

## Guides

- [Classification](https://github.com/elixir-image/image_vision/blob/v0.1.0/guides/classification.md) — classifying images and computing embeddings
- [Detection](https://github.com/elixir-image/image_vision/blob/v0.1.0/guides/detection.md) — bounding-box object detection
- [Segmentation](https://github.com/elixir-image/image_vision/blob/v0.1.0/guides/segmentation.md) — promptable and panoptic segmentation