# ExTorch.Vision
TorchVision ops for [ExTorch](https://github.com/andfoy/extorch) -- detection, segmentation, and image I/O operators running on the BEAM.
ExTorch.Vision builds `libtorchvision.so` from source at compile time (against ExTorch's libtorch) and exposes all torchvision C++ operators through ExTorch's generic dispatcher. **No Rust or C++ code in this package** -- everything goes through `ExTorch.Native.dispatch_op/3`.
## Requirements
- [ExTorch](https://github.com/andfoy/extorch) (provides libtorch)
- CMake >= 3.18
- C++17 compiler (gcc >= 7 or clang >= 5)
- CUDA toolkit (optional, for GPU support and NVJPEG)
## Installation
Add `extorch_vision` to your dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:extorch, "~> 0.3.0"},
    {:extorch_vision, "~> 0.1.0"}
  ]
end
```
Then compile -- `libtorchvision.so` is built from source automatically
during `mix compile` (no manual steps):
```bash
mix deps.get
mix compile
```
## Usage
```elixir
# Lazy initialization -- loads libtorchvision.so on first call
# or call ExTorch.Vision.setup!() explicitly
# Non-maximum suppression
boxes = ExTorch.tensor([[0.0, 0.0, 10.0, 10.0], [0.5, 0.5, 10.5, 10.5], [20.0, 20.0, 30.0, 30.0]])
scores = ExTorch.tensor([0.9, 0.8, 0.7])
keep = ExTorch.Vision.nms(boxes, scores, 0.5)
# ROI Align (detection models)
features = ExTorch.rand({1, 256, 14, 14})
rois = ExTorch.tensor([[0.0, 0.0, 0.0, 7.0, 7.0]])
pooled = ExTorch.Vision.roi_align(features, rois, 1.0, 7, 7)
# Deformable Convolution v2
input = ExTorch.rand({1, 3, 8, 8})
weight = ExTorch.rand({8, 3, 3, 3})
offset = ExTorch.zeros({1, 18, 6, 6})
mask = ExTorch.ones({1, 9, 6, 6})
bias = ExTorch.zeros({8})
out = ExTorch.Vision.deform_conv2d(input, weight, offset, mask, bias, 1, 1, 0, 0)
# Image I/O -- encode/decode without leaving the BEAM
image = ExTorch.randint(0, 256, {3, 224, 224}, dtype: :uint8)  # upper bound is exclusive
png_bytes = ExTorch.Vision.encode_png(image)
decoded = ExTorch.Vision.decode_png(png_bytes)
# GPU-accelerated JPEG decode (requires NVJPEG)
jpeg_data = ExTorch.Vision.encode_jpeg(image)
gpu_images = ExTorch.Vision.decode_jpegs_cuda([jpeg_data], 0, :cuda)
```
## Available operators
### Detection / segmentation
| Function | Description |
|---|---|
| `nms/3` | Non-maximum suppression |
| `roi_align/7` | Region of Interest Align (bilinear) |
| `roi_pool/5` | Region of Interest Pooling (max) |
| `ps_roi_align/6` | Position-sensitive ROI Align (R-FCN) |
| `ps_roi_pool/5` | Position-sensitive ROI Pooling |
| `deform_conv2d/14` | Deformable Convolution v2 |
### Image I/O
| Function | Description |
|---|---|
| `decode_jpeg/3` | Decode JPEG from uint8 tensor |
| `encode_jpeg/2` | Encode to JPEG bytes |
| `decode_png/3` | Decode PNG from uint8 tensor |
| `encode_png/2` | Encode to PNG bytes |
| `decode_webp/2` | Decode WebP |
| `decode_gif/1` | Decode GIF (animated supported) |
| `decode_image/3` | Auto-detect format and decode |
| `decode_jpegs_cuda/3` | Batch JPEG decode on GPU (NVJPEG) |
| `encode_jpegs_cuda/2` | Batch JPEG encode on GPU |
## ExTorch.Export integration
All ops are automatically registered with `ExTorch.Export.OpRegistry` when `setup!/0` is called. This means exported PyTorch models that use torchvision operators (e.g., Faster R-CNN with `torchvision::roi_align` and `torchvision::nms`) can be loaded and run via `ExTorch.Export.forward/2` without any additional configuration:
```elixir
ExTorch.Vision.setup!()
model = ExTorch.Export.load("faster_rcnn.pt2", device: :cuda)
output = ExTorch.Export.forward(model, [input_tensor])
```
## How it works
ExTorch.Vision contains **zero C++ or Rust code**. It works by:
1. Building `libtorchvision.so` from source via CMake (the `mix torchvision.build` task, run automatically during `mix compile`)
2. Loading it at runtime via `ExTorch.Native.load_torch_library/1` (which calls `dlopen`)
3. TorchVision registers its ops with PyTorch's `c10::Dispatcher` via `TORCH_LIBRARY` blocks
4. Elixir calls ops through `ExTorch.Native.dispatch_op/3`, which invokes the dispatcher
This architecture means any future torchvision ops are automatically available without code changes -- just rebuild `libtorchvision.so`.
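The steps above can be sketched as a direct dispatcher call. This is illustrative only: the exact argument shape of `ExTorch.Native.dispatch_op/3` (here assumed to be the qualified op name, a list of positional arguments, and an options list) is an assumption, not a documented signature:

```elixir
# Hedged sketch -- assumes libtorchvision.so has been loaded via
# ExTorch.Vision.setup!/0 and that dispatch_op/3 takes the qualified
# op name, positional args, and options (an assumption).
ExTorch.Vision.setup!()

boxes = ExTorch.tensor([[0.0, 0.0, 10.0, 10.0], [0.5, 0.5, 10.5, 10.5]])
scores = ExTorch.tensor([0.9, 0.8])

# ExTorch.Vision.nms/3 would be a thin wrapper around a call like this,
# resolved through PyTorch's c10::Dispatcher:
keep = ExTorch.Native.dispatch_op("torchvision::nms", [boxes, scores, 0.5], [])
```

Because the ops are resolved by name at runtime, nothing in this package needs to know about individual operators ahead of time.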
## Configuration
### Library path override
Skip the CMake build by pointing to a pre-built `libtorchvision.so`:
```elixir
# In config/config.exs
config :extorch_vision, library_path: "/path/to/libtorchvision.so"
```
Or via environment variable:
```bash
TORCHVISION_LIB_PATH=/path/to/libtorchvision.so mix compile
```
### Local development with extorch
By default, `extorch_vision` pulls ExTorch from Hex. For local development
against a checkout of ExTorch, set the `EXTORCH_PATH` environment variable:
```bash
# Point to your local extorch checkout
export EXTORCH_PATH=../extorch
mix deps.get
mix test
```
This overrides the Hex dependency with a local path dependency, so changes
to ExTorch are picked up immediately without publishing.
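One common way to implement such a switch in `mix.exs` is a small helper that inspects the environment variable. The helper name `extorch_dep/0` below is illustrative, not part of the package:

```elixir
defp deps do
  [
    extorch_dep(),
    {:extorch_vision, "~> 0.1.0"}
  ]
end

# Use a local path dependency when EXTORCH_PATH is set,
# otherwise fall back to the published Hex package.
defp extorch_dep do
  case System.get_env("EXTORCH_PATH") do
    nil -> {:extorch, "~> 0.3.0"}
    path -> {:extorch, path: path, override: true}
  end
end
```

The `override: true` option lets the path dependency win if any other dependency also requires `:extorch` from Hex.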
### Force rebuild
To rebuild `libtorchvision.so` from scratch (e.g., after upgrading CUDA
or switching libtorch versions):
```bash
mix torchvision.build --force
```
## License
MIT