# ExTorch.Vision
TorchVision ops for [ExTorch](https://github.com/andfoy/extorch) -- detection, segmentation, and image I/O operators running on the BEAM.
ExTorch.Vision builds `libtorchvision.so` from source at compile time (against ExTorch's libtorch) and exposes all torchvision C++ operators through ExTorch's generic dispatcher. **No Rust or C++ code in this package** -- everything goes through `ExTorch.Native.dispatch_op/3`.
## Requirements
- [ExTorch](https://github.com/andfoy/extorch) (provides libtorch)
- CMake >= 3.18
- C++17 compiler (gcc >= 7 or clang >= 5)
- CUDA toolkit (optional, for GPU support and NVJPEG)
## Installation
Add `extorch_vision` to your dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:extorch, "~> 0.3.0"},
    {:extorch_vision, "~> 0.1.0"}
  ]
end
```
Then compile -- `libtorchvision.so` is built from source automatically
during `mix compile` (no manual steps):
```bash
mix deps.get
mix compile
```
## Usage
```elixir
# Lazy initialization -- loads libtorchvision.so on first call
# or call ExTorch.Vision.setup!() explicitly
# Non-maximum suppression
boxes = ExTorch.tensor([[0.0, 0.0, 10.0, 10.0], [0.5, 0.5, 10.5, 10.5], [20.0, 20.0, 30.0, 30.0]])
scores = ExTorch.tensor([0.9, 0.8, 0.7])
keep = ExTorch.Vision.nms(boxes, scores, 0.5)
# ROI Align (detection models)
features = ExTorch.rand({1, 256, 14, 14})
rois = ExTorch.tensor([[0.0, 0.0, 0.0, 7.0, 7.0]])
pooled = ExTorch.Vision.roi_align(features, rois, 1.0, 7, 7)
# Deformable Convolution v2
input = ExTorch.rand({1, 3, 8, 8})
weight = ExTorch.rand({8, 3, 3, 3})
offset = ExTorch.zeros({1, 18, 6, 6})
mask = ExTorch.ones({1, 9, 6, 6})
bias = ExTorch.zeros({8})
out = ExTorch.Vision.deform_conv2d(input, weight, offset, mask, bias, 1, 1, 0, 0)
# Image I/O -- encode/decode without leaving the BEAM
image = ExTorch.randint(0, 256, {3, 224, 224}, dtype: :uint8)  # upper bound is exclusive
png_bytes = ExTorch.Vision.encode_png(image)
decoded = ExTorch.Vision.decode_png(png_bytes)
# GPU-accelerated JPEG decode (requires NVJPEG)
jpeg_data = ExTorch.Vision.encode_jpeg(image)
gpu_images = ExTorch.Vision.decode_jpegs_cuda([jpeg_data], 0, :cuda)
```
## Available operators
### Detection / segmentation
| Function | Description |
|---|---|
| `nms/3` | Non-maximum suppression |
| `roi_align/7` | Region of Interest Align (bilinear) |
| `roi_pool/5` | Region of Interest Pooling (max) |
| `ps_roi_align/6` | Position-sensitive ROI Align (R-FCN) |
| `ps_roi_pool/5` | Position-sensitive ROI Pooling |
| `deform_conv2d/14` | Deformable Convolution v2 |
### Image I/O
| Function | Description |
|---|---|
| `decode_jpeg/3` | Decode JPEG from uint8 tensor |
| `encode_jpeg/2` | Encode to JPEG bytes |
| `decode_png/3` | Decode PNG from uint8 tensor |
| `encode_png/2` | Encode to PNG bytes |
| `decode_webp/2` | Decode WebP |
| `decode_gif/1` | Decode GIF (animated supported) |
| `decode_image/3` | Auto-detect format and decode |
| `decode_jpegs_cuda/3` | Batch JPEG decode on GPU (NVJPEG) |
| `encode_jpegs_cuda/2` | Batch JPEG encode on GPU |
## ExTorch.Export integration
All ops are automatically registered with `ExTorch.Export.OpRegistry` when `setup!/0` is called. This means exported PyTorch models that use torchvision operators (e.g., Faster R-CNN with `torchvision::roi_align` and `torchvision::nms`) can be loaded and run via `ExTorch.Export.forward/2` without any additional configuration:
```elixir
ExTorch.Vision.setup!()
model = ExTorch.Export.load("faster_rcnn.pt2", device: :cuda)
output = ExTorch.Export.forward(model, [input_tensor])
```
## How it works
ExTorch.Vision contains **zero C++ or Rust code**. It works by:
1. Building `libtorchvision.so` from source via CMake (the `mix torchvision.build` task, run automatically during `mix compile`)
2. Loading it at runtime via `ExTorch.Native.load_torch_library/1` (which calls `dlopen`)
3. TorchVision registers its ops with PyTorch's `c10::Dispatcher` via `TORCH_LIBRARY` blocks
4. Elixir calls ops through `ExTorch.Native.dispatch_op/3`, which invokes the dispatcher
This architecture means any future torchvision ops are automatically available without code changes -- just rebuild `libtorchvision.so`.
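The steps above can be sketched as a direct dispatcher call. This is illustrative only: the exact argument shape of `ExTorch.Native.dispatch_op/3` (here assumed to be the qualified op name, a list of positional arguments, and an options list) is an assumption, not a documented signature:

```elixir
# Hedged sketch -- assumes libtorchvision.so has been loaded via
# ExTorch.Vision.setup!/0 and that dispatch_op/3 takes the qualified
# op name, positional args, and options (an assumption).
ExTorch.Vision.setup!()

boxes = ExTorch.tensor([[0.0, 0.0, 10.0, 10.0], [0.5, 0.5, 10.5, 10.5]])
scores = ExTorch.tensor([0.9, 0.8])

# ExTorch.Vision.nms/3 would be a thin wrapper around a call like this,
# resolved through PyTorch's c10::Dispatcher:
keep = ExTorch.Native.dispatch_op("torchvision::nms", [boxes, scores, 0.5], [])
```

Because the ops are resolved by name at runtime, nothing in this package needs to know about individual operators ahead of time.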
## Configuration
### Library path override
Skip the CMake build by pointing to a pre-built `libtorchvision.so`:
```elixir
# In config/config.exs
config :extorch_vision, library_path: "/path/to/libtorchvision.so"
```
Or via environment variable:
```bash
TORCHVISION_LIB_PATH=/path/to/libtorchvision.so mix compile
```
### Local development with extorch
By default, `extorch_vision` pulls ExTorch from Hex. For local development
against a checkout of ExTorch, set the `EXTORCH_PATH` environment variable:
```bash
# Point to your local extorch checkout
export EXTORCH_PATH=../extorch
mix deps.get
mix test
```
This overrides the Hex dependency with a local path dependency, so changes
to ExTorch are picked up immediately without publishing.
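One common way to implement such a switch in `mix.exs` is a small helper that inspects the environment variable. The helper name `extorch_dep/0` below is illustrative, not part of the package:

```elixir
defp deps do
  [
    extorch_dep(),
    {:extorch_vision, "~> 0.1.0"}
  ]
end

# Use a local path dependency when EXTORCH_PATH is set,
# otherwise fall back to the published Hex package.
defp extorch_dep do
  case System.get_env("EXTORCH_PATH") do
    nil -> {:extorch, "~> 0.3.0"}
    path -> {:extorch, path: path, override: true}
  end
end
```

The `override: true` option lets the path dependency win if any other dependency also requires `:extorch` from Hex.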
### Force rebuild
To rebuild `libtorchvision.so` from scratch (e.g., after upgrading CUDA
or switching libtorch versions):
```bash
mix torchvision.build --force
```
## License
MIT