# onyx

[![Hex.pm](https://img.shields.io/hexpm/v/onyx.svg)](https://hex.pm/packages/onyx)
[![Hex Docs](https://img.shields.io/badge/hex-docs-blue.svg)](https://hexdocs.pm/onyx)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

Erlang NIF library for ONNX model inference, powered by [tract](https://github.com/sonos/tract) — a pure Rust ONNX runtime with no external dependencies.

- **Zero external dependencies** — tract is statically linked; a single `priv/onyx.dll` ships in the hex package
- **No Rust toolchain required** — pre-compiled NIF bundled, `rebar3 compile` just works
- **BEAM-safe** — inference runs on dirty CPU schedulers; all Rust panics are caught and returned as Erlang error tuples
- **Session-based API** — `load/1` compiles and optimises the model once, `run/2` executes it repeatedly with zero re-compilation overhead
- **Explicit lifecycle control** — `unload/1` immediately invalidates a session; GC also reclaims sessions automatically
- **Sied-compatible binary format** — tensors are little-endian packed binaries, the same convention used by [sied](https://hex.pm/packages/sied) and [kvex](https://hex.pm/packages/kvex)

## Ecosystem

onyx is part of a pure-Erlang ML stack:

```
sied  0.2.4  — SIMD kernels: POPCNT, dot-product, L2-norm, 1-bit quantization
onyx  0.1.0  — ONNX inference: load any ONNX model, run it on the BEAM
kvex  0.2.1  — Approximate nearest-neighbour index with persistence
```

Typical pipeline: tokenize text externally → **onyx** generates embeddings → **kvex** performs ANN search.

## Installation

```erlang
%% rebar.config
{deps, [{onyx, "0.1.0"}]}.
```

No Rust toolchain required at compile time.

## Quick start

```erlang
%% Load and compile the model (runs on a DirtyIO scheduler, ~10ms–1s)
{ok, Model} = onyx:load("sentence-transformer.onnx"),

%% Inspect what inputs the model expects
#{inputs := Inputs, outputs := Outputs} = Model,
%% Inputs  = [{<<"input_ids">>, [1, 32], i32}, {<<"attention_mask">>, [1, 32], i32}]
%% Outputs = [{<<"sentence_embedding">>, [1, 384], f32}]

%% Build input tensors — little-endian packed binaries
IdsBin  = << <<Id:32/signed-little>>  || Id  <- TokenIds >>,
MaskBin = << <<M:32/signed-little>>   || M   <- AttentionMask >>,

%% Run inference (runs on a DirtyCPU scheduler, ~1ms–100ms)
{ok, #{<<"sentence_embedding">> := {EmbBin, [1, 384], f32}}} =
    onyx:run(Model, #{
        <<"input_ids">>      => {IdsBin,  [1, 32], i32},
        <<"attention_mask">> => {MaskBin, [1, 32], i32}
    }),

%% EmbBin is a 384×4 = 1536-byte little-endian float32 binary
%% Feed directly into kvex — no conversion needed
ok = kvex:add(Index, DocumentId, EmbBin).
```

## API

### `load/1`

```erlang
-spec load(file:filename()) -> {ok, session()} | {error, term()}.
```

Loads an ONNX model from disk, runs tract's optimiser (constant folding, op fusion), and compiles an execution plan. Accepts both binary and charlist paths.

Runs on a **DirtyIO** scheduler — will not block normal BEAM schedulers.

### `run/2`

```erlang
-spec run(session(), #{binary() => tensor()}) ->
        {ok, #{binary() => tensor()}} | {error, term()}.
```

Executes one forward pass. The second argument maps input names to tensors and must contain exactly the inputs the model declares (as reported in the session's `inputs` field). The result maps output names to tensors.

Runs on a **DirtyCPU** scheduler — will not block normal BEAM schedulers.

### `unload/1`

```erlang
-spec unload(session()) -> ok.
```

Immediately marks the session invalid. Any subsequent `run/2` on this session returns `{error, session_unloaded}`. The underlying model memory is freed when the GC collects the session reference (which may happen slightly later). Calling `unload/1` multiple times is safe.
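A minimal lifecycle sketch of the semantics described above (`"model.onnx"` and `Inputs` are placeholders, not files or values shipped with the library):

```erlang
{ok, Session} = onyx:load("model.onnx"),
{ok, _First}  = onyx:run(Session, Inputs),

ok = onyx:unload(Session),
ok = onyx:unload(Session),                    %% safe: unload is idempotent
{error, session_unloaded} = onyx:run(Session, Inputs).
```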

### `tensor/3`

```erlang
-spec tensor(binary(), [integer()], dtype()) -> tensor().
```

Constructs a tensor from raw parts. Pure Erlang — no NIF call. Validates that `Data` is a binary and `Shape` is a list.
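Since `tensor/3` returns a `tensor()` value, the result is the plain `{Data, Shape, DType}` tuple described in the Types section — a sketch:

```erlang
Data = << <<V:32/float-little>> || V <- [0.1, 0.2, 0.3] >>,
%% tensor/3 validates its arguments and returns the tensor() tuple
{Data, [1, 3], f32} = onyx:tensor(Data, [1, 3], f32).
```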

### `to_list/1`

```erlang
-spec to_list(tensor()) -> [number()].
```

Decodes a packed binary tensor to a list of Erlang numbers. Intended for debugging and lightweight post-processing. For hot paths, work with the raw binary directly.

Raises `error({dynamic_shape, Shape})` if the shape contains `-1` (dynamic dimension).
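For `f32`, the decoding is equivalent to a binary comprehension over the packed payload — a sketch of the behaviour, not the library source:

```erlang
%% Pure-Erlang equivalent of to_list/1 for the f32 case
Decode = fun({Bin, _Shape, f32}) ->
    [V || <<V:32/float-little>> <= Bin]
end,
Packed = << <<V:32/float-little>> || V <- [1.0, 2.0, 3.0] >>,
[1.0, 2.0, 3.0] = Decode({Packed, [3], f32}).
```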

## Types

```erlang
-type dtype() :: f32 | f64 | i32 | i64 | u8.

%% A tensor is a packed little-endian binary with its shape and element type.
%% This is the same binary convention used by sied and kvex.
-type tensor() :: {Data :: binary(), Shape :: [integer()], DType :: dtype()}.

%% input_spec and output_spec describe the model's declared I/O contract.
-type input_spec()  :: {Name :: binary(), Shape :: [integer()], DType :: dtype()}.
-type output_spec() :: {Name :: binary(), Shape :: [integer()], DType :: dtype()}.

%% A loaded, compiled model session.
-type session() :: #{
    ref     := reference(),        %% NIF resource handle (ResourceArc)
    inputs  := [input_spec()],     %% model's declared inputs, in order
    outputs := [output_spec()]     %% model's declared outputs, in order
}.
```

### Tensor binary format

Tensors use the same packed little-endian format as sied:

| dtype | bytes per element | Erlang binary pattern |
|-------|------------------|-----------------------|
| f32   | 4                | `<<V:32/float-little>>` |
| f64   | 8                | `<<V:64/float-little>>` |
| i32   | 4                | `<<V:32/signed-little>>` |
| i64   | 8                | `<<V:64/signed-little>>` |
| u8    | 1                | `<<V:8>>` |

Shape dimensions may be `-1` in output specs to indicate dynamic (batch) dimensions that vary per inference call.
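The table maps directly onto Erlang bit syntax; for example, an `i64` round-trip:

```erlang
%% Pack a list of i64 values and unpack them again
Vals = [-1, 0, 42],
Bin  = << <<V:64/signed-little>> || V <- Vals >>,
24   = byte_size(Bin),                          %% 3 elements × 8 bytes each
Vals = [V || <<V:64/signed-little>> <= Bin].
```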

## Error reference

| Error | Cause |
|-------|-------|
| `{error, bad_file}` | File not found or inaccessible |
| `{error, {load_failed, Reason}}` | Invalid ONNX file, unsupported operators, or model compilation failure |
| `{error, {run_failed, Reason}}` | Shape mismatch, byte count mismatch, dtype error, or runtime failure |
| `{error, {input_not_found, Name}}` | A required input name is missing from the inputs map |
| `{error, session_unloaded}` | Session was explicitly unloaded via `unload/1` |
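A defensive-handling sketch matching the error shapes above (`run_after_reload/1` is a hypothetical recovery hook, not part of onyx):

```erlang
case onyx:run(Session, Inputs) of
    {ok, Outputs} ->
        Outputs;
    {error, session_unloaded} ->
        run_after_reload(Inputs);                %% hypothetical recovery
    {error, {input_not_found, Name}} ->
        error({missing_model_input, Name});
    {error, Reason} ->
        error({onyx_run_failed, Reason})
end.
```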

## How it works

### Session compilation

```
onyx:load("model.onnx")
  → NIF [DirtyIO]
  → tract_onnx::onnx().model_for_path(path)  % parse ONNX protobuf
  → into_optimized()                         % constant folding, op fusion
  → into_runnable()                          % compile execution plan
  → ResourceArc<OnyxSession>                 % BEAM-managed lifetime
  → {ok, #{ref, inputs, outputs}}
```

`into_optimized()` and `into_runnable()` are the expensive steps (10ms–1s depending on model size). The result is a compiled execution plan held in a `ResourceArc` — a reference-counted Rust object managed by the BEAM GC. When no Erlang terms reference the session, the GC automatically frees the compiled model.
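Because the session dies with its last reference, a common pattern is to hold it in a long-lived process so `load/1` runs exactly once. A hypothetical wrapper (not part of onyx):

```erlang
%% Holding the session in gen_server state keeps the ResourceArc
%% referenced, so the compiled model is never GC'd between calls.
-module(onyx_worker).
-behaviour(gen_server).
-export([start_link/1, infer/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(Path) -> gen_server:start_link(?MODULE, Path, []).

infer(Pid, Inputs) -> gen_server:call(Pid, {infer, Inputs}, infinity).

init(Path) ->
    {ok, Session} = onyx:load(Path),    %% compile once, at startup
    {ok, Session}.

handle_call({infer, Inputs}, _From, Session) ->
    {reply, onyx:run(Session, Inputs), Session}.

handle_cast(_Msg, Session) -> {noreply, Session}.
```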

### Inference

```
onyx:run(Session, Inputs)
  → NIF [DirtyCPU]
  → check valid flag (AtomicBool)
  → for each model input (in declaration order):
      decode {Binary, Shape, DType} → tract Tensor (validate byte count first)
  → SimplePlan::run(inputs)                     % execute compiled plan
  → for each output tensor:
      encode tract Tensor → {Binary, Shape, DType}
  → {ok, #{name => tensor()}}
```

Input tensors are decoded directly from the Erlang binary payload — no extra heap allocation for the data bytes. The `ResourceArc` keeps the session alive for the duration of `run/2`, even if a concurrent `unload/1` fires mid-inference.
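The byte-count check is equivalent to this pure computation (a sketch of the rule, not the Rust source): expected bytes = product of shape dimensions × element size.

```erlang
%% Element sizes from the tensor binary format table
DtypeSize = fun(f32) -> 4; (f64) -> 8; (i32) -> 4; (i64) -> 8; (u8) -> 1 end,
ExpectedBytes = fun(Shape, DType) ->
    lists:foldl(fun erlang:'*'/2, 1, Shape) * DtypeSize(DType)
end,
128  = ExpectedBytes([1, 32], i32),    %% 32 elements × 4 bytes
1536 = ExpectedBytes([1, 384], f32).   %% the embedding size from the quick start
```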

### Scheduler assignment

| Function | Scheduler | Why |
|----------|-----------|-----|
| `load/1` | DirtyIO | Disk read + model compilation: 10ms–1s |
| `run/2` | DirtyCPU | Matrix computation: 1ms–100ms |
| `unload/1` | Normal | Atomic flag flip: nanoseconds |

## Usage with kvex — semantic search pipeline

```erlang
%% Index a corpus of documents
index_documents(Docs) ->
    {ok, Model} = onyx:load("all-MiniLM-L6-v2.onnx"),
    {ok, Index} = kvex:new(384),
    lists:foreach(fun({DocId, Text}) ->
        {IdsBin, MaskBin} = tokenize(Text, 32),
        {ok, #{<<"sentence_embedding">> := {Emb, _, _}}} =
            onyx:run(Model, #{
                <<"input_ids">>      => {IdsBin,  [1, 32], i32},
                <<"attention_mask">> => {MaskBin, [1, 32], i32}
            }),
        ok = kvex:add(Index, DocId, Emb)
    end, Docs),
    {ok, Model, Index}.

%% Query the index
search(Model, Index, QueryText, K) ->
    {IdsBin, MaskBin} = tokenize(QueryText, 32),
    {ok, #{<<"sentence_embedding">> := {QueryEmb, _, _}}} =
        onyx:run(Model, #{
            <<"input_ids">>      => {IdsBin,  [1, 32], i32},
            <<"attention_mask">> => {MaskBin, [1, 32], i32}
        }),
    kvex:search(Index, QueryEmb, K).
```

## Building from source

Requires a stable Rust toolchain (1.70 or newer).

```bash
git clone https://github.com/roquess/onyx
cd onyx
make build      # compiles native/onyx/ and writes priv/onyx.dll
rebar3 ct       # run test suite
```

The `Makefile` uses `--manifest-path` so it runs correctly from any working directory.

## Supported ONNX operators

onyx relies on tract's supported operator set. tract 0.21 covers the operators needed by most embedding and classification models, including:

- All arithmetic and activation ops (Add, Mul, Relu, Sigmoid, Tanh, GELU, Softmax, ...)
- Matrix multiplication (MatMul, Gemm)
- Normalisation (LayerNorm, BatchNorm)
- Attention mechanisms (used in transformer models)
- Convolution, pooling
- Reshape, Transpose, Concat, Slice

Models from Hugging Face (exported with `optimum` or `transformers`) and ONNX Model Zoo generally work out of the box. Exotic custom operators, some recurrent layers, and dynamic control flow may not be supported — `load/1` will return `{error, {load_failed, Reason}}` in those cases.

## Links

- Hex.pm: [https://hex.pm/packages/onyx](https://hex.pm/packages/onyx)
- GitHub: [https://github.com/roquess/onyx](https://github.com/roquess/onyx)
- tract (Rust ONNX runtime): [https://github.com/sonos/tract](https://github.com/sonos/tract)
- sied (SIMD NIFs): [https://hex.pm/packages/sied](https://hex.pm/packages/sied)
- kvex (ANN index): [https://hex.pm/packages/kvex](https://hex.pm/packages/kvex)

## License

Apache License 2.0 — see [LICENSE](LICENSE).