# onyx
[Hex package](https://hex.pm/packages/onyx) · [Documentation](https://hexdocs.pm/onyx) · [License](LICENSE)
Erlang NIF library for ONNX model inference, powered by [tract](https://github.com/sonos/tract) — a pure Rust ONNX runtime with no external dependencies.
- **Zero external dependencies** — tract is statically linked; a single `priv/onyx.dll` ships in the hex package
- **No Rust toolchain required** — pre-compiled NIF bundled, `rebar3 compile` just works
- **BEAM-safe** — inference runs on dirty CPU schedulers; all Rust panics are caught and returned as Erlang error tuples
- **Session-based API** — `load/1` compiles and optimises the model once, `run/2` executes it repeatedly with zero re-compilation overhead
- **Explicit lifecycle control** — `unload/1` immediately invalidates a session; GC also reclaims sessions automatically
- **Sied-compatible binary format** — tensors are little-endian packed binaries, the same convention used by [sied](https://hex.pm/packages/sied) and [kvex](https://hex.pm/packages/kvex)
## Ecosystem
onyx is part of a pure-Erlang ML stack:
```
sied 0.2.4 — SIMD kernels: POPCNT, dot-product, L2-norm, 1-bit quantization
onyx 0.1.0 — ONNX inference: load any ONNX model, run it on the BEAM
kvex 0.2.1 — Approximate nearest-neighbour index with persistence
```
Typical pipeline: tokenize text externally → **onyx** generates embeddings → **kvex** performs ANN search.
## Installation
```erlang
%% rebar.config
{deps, [{onyx, "0.1.0"}]}.
```
No Rust toolchain required at compile time.
## Quick start
```erlang
%% Load and compile the model (runs on a DirtyIO scheduler, ~100ms–1s)
{ok, Model} = onyx:load("sentence-transformer.onnx"),
%% Inspect what inputs the model expects
#{inputs := Inputs, outputs := Outputs} = Model,
%% Inputs = [{<<"input_ids">>, [1, 32], i32}, {<<"attention_mask">>, [1, 32], i32}]
%% Outputs = [{<<"sentence_embedding">>, [1, 384], f32}]
%% Build input tensors — little-endian packed binaries
IdsBin = << <<Id:32/signed-little>> || Id <- TokenIds >>,
MaskBin = << <<M:32/signed-little>> || M <- AttentionMask >>,
%% Run inference (runs on a DirtyCPU scheduler, ~1ms–100ms)
{ok, #{<<"sentence_embedding">> := {EmbBin, [1, 384], f32}}} =
onyx:run(Model, #{
<<"input_ids">> => {IdsBin, [1, 32], i32},
<<"attention_mask">> => {MaskBin, [1, 32], i32}
}),
%% EmbBin is a 384×4 = 1536-byte little-endian float32 binary
%% Feed directly into kvex — no conversion needed
ok = kvex:add(Index, DocumentId, EmbBin).
```
## API
### `load/1`
```erlang
-spec load(file:filename()) -> {ok, session()} | {error, term()}.
```
Loads an ONNX model from disk, runs tract's optimiser (constant folding, op fusion), and compiles an execution plan. Accepts both binary and charlist paths.
Runs on a **DirtyIO** scheduler — will not block normal BEAM schedulers.
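A minimal loading sketch with error handling (the model path and the error mapping are illustrative):
```erlang
start() ->
    case onyx:load(<<"priv/models/all-MiniLM-L6-v2.onnx">>) of
        {ok, Session = #{inputs := Inputs}} ->
            io:format("model expects ~p~n", [Inputs]),
            {ok, Session};
        {error, bad_file} ->
            {error, model_file_missing};
        {error, {load_failed, Reason}} ->
            {error, {unsupported_model, Reason}}
    end.
```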
### `run/2`
```erlang
-spec run(session(), #{binary() => tensor()}) ->
{ok, #{binary() => tensor()}} | {error, term()}.
```
Executes one forward pass. The second argument is a map from input name to tensor; it must contain exactly the inputs the model expects (as listed in the session's `inputs` field). The result is a map from output name to tensor.
Runs on a **DirtyCPU** scheduler — will not block normal BEAM schedulers.
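Because the session map carries the declared input specs, the input map can be assembled generically. A sketch using a hypothetical `BinFor` fun that supplies the packed binary for each input name:
```erlang
%% Build the inputs map from the session's declared input specs.
build_inputs(#{inputs := Specs}, BinFor) ->
    maps:from_list(
        [{Name, {BinFor(Name), Shape, DType}}
         || {Name, Shape, DType} <- Specs]).

%% Usage (tokenizer helper is hypothetical):
%% onyx:run(Session, build_inputs(Session, fun my_tokenizer:bin_for/1)).
```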
### `unload/1`
```erlang
-spec unload(session()) -> ok.
```
Immediately marks the session invalid. Any subsequent `run/2` on this session returns `{error, session_unloaded}`. The underlying model memory is freed when the GC collects the session reference (which may happen slightly later). Calling `unload/1` multiple times is safe.
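The lifecycle rules above in one sketch (`Inputs` stands for any valid input map for the model):
```erlang
{ok, S} = onyx:load("model.onnx"),
{ok, _} = onyx:run(S, Inputs),
ok = onyx:unload(S),
ok = onyx:unload(S),                               %% repeated unload is safe
{error, session_unloaded} = onyx:run(S, Inputs).
```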
### `tensor/3`
```erlang
-spec tensor(binary(), [integer()], dtype()) -> tensor().
```
Constructs a tensor from raw parts. Pure Erlang — no NIF call. Validates that `Data` is a binary and `Shape` is a list.
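For example, a 1×4 `f32` tensor (values are illustrative; per the types section below, the result is the `{Data, Shape, DType}` tuple):
```erlang
Bin = << <<F:32/float-little>> || F <- [0.1, 0.2, 0.3, 0.4] >>,
T = onyx:tensor(Bin, [1, 4], f32).
%% T = {Bin, [1, 4], f32}
```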
### `to_list/1`
```erlang
-spec to_list(tensor()) -> [number()].
```
Decodes a packed binary tensor to a list of Erlang numbers. Intended for debugging and lightweight post-processing. For hot paths, work with the raw binary directly.
Raises `error({dynamic_shape, Shape})` if the shape contains `-1` (dynamic dimension).
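For example, decoding the quick-start embedding output for a quick sanity check:
```erlang
{ok, #{<<"sentence_embedding">> := Emb}} = onyx:run(Model, Inputs),
Values = onyx:to_list(Emb),                        %% 384 floats
Norm = math:sqrt(lists:sum([V * V || V <- Values])).
```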
## Types
```erlang
-type dtype() :: f32 | f64 | i32 | i64 | u8.
%% A tensor is a packed little-endian binary with its shape and element type.
%% This is the same binary convention used by sied and kvex.
-type tensor() :: {Data :: binary(), Shape :: [integer()], DType :: dtype()}.
%% input_spec and output_spec describe the model's declared I/O contract.
-type input_spec() :: {Name :: binary(), Shape :: [integer()], DType :: dtype()}.
-type output_spec() :: {Name :: binary(), Shape :: [integer()], DType :: dtype()}.
%% A loaded, compiled model session.
-type session() :: #{
ref := reference(), %% NIF resource handle (ResourceArc)
inputs := [input_spec()], %% model's declared inputs, in order
outputs := [output_spec()] %% model's declared outputs, in order
}.
```
### Tensor binary format
Tensors use the same packed little-endian format as sied:
| dtype | bytes per element | Erlang binary pattern |
|-------|------------------|-----------------------|
| f32 | 4 | `<<V:32/float-little>>` |
| f64 | 8 | `<<V:64/float-little>>` |
| i32 | 4 | `<<V:32/signed-little>>` |
| i64 | 8 | `<<V:64/signed-little>>` |
| u8 | 1 | `<<V:8>>` |
Shape dimensions may be `-1` in output specs to indicate dynamic (batch) dimensions that vary per inference call.
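These patterns drop straight into binary comprehensions; a minimal pack/unpack pair for `f32` (helper names are illustrative):
```erlang
%% Pack a list of floats into an f32 tensor payload...
pack_f32(Values) ->
    << <<V:32/float-little>> || V <- Values >>.

%% ...and unpack a payload back into a list of floats.
unpack_f32(Bin) ->
    [V || <<V:32/float-little>> <= Bin].
```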
## Error reference
| Error | Cause |
|-------|-------|
| `{error, bad_file}` | File not found or inaccessible |
| `{error, {load_failed, Reason}}` | Invalid ONNX file, unsupported operators, or model compilation failure |
| `{error, {run_failed, Reason}}` | Shape mismatch, byte count mismatch, dtype error, or runtime failure |
| `{error, {input_not_found, Name}}` | A required input name is missing from the inputs map |
| `{error, session_unloaded}` | Session was explicitly unloaded via `unload/1` |
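A sketch of translating these tuples at a call site (the logging and the returned atoms are illustrative):
```erlang
safe_run(Session, Inputs) ->
    case onyx:run(Session, Inputs) of
        {ok, Outputs} ->
            {ok, Outputs};
        {error, session_unloaded} ->
            {error, retry_with_new_session};
        {error, {input_not_found, Name}} ->
            logger:error("missing model input ~p", [Name]),
            {error, bad_inputs};
        {error, {run_failed, Reason}} ->
            logger:error("inference failed: ~p", [Reason]),
            {error, inference_failed}
    end.
```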
## How it works
### Session compilation
```
onyx:load("model.onnx")
→ NIF [DirtyIO]
→ tract_onnx::onnx().model_for_path(path) % parse ONNX protobuf
→ into_optimized() % constant folding, op fusion
→ into_runnable() % compile execution plan
→ ResourceArc<OnyxSession> % BEAM-managed lifetime
→ {ok, #{ref, inputs, outputs}}
```
`into_optimized()` and `into_runnable()` are the expensive steps (10ms–1s depending on model size). The result is a compiled execution plan held in a `ResourceArc` — a reference-counted Rust object managed by the BEAM GC. When no Erlang terms reference the session, the GC automatically frees the compiled model.
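One consequence: a model meant to live for the whole node lifetime should be referenced from somewhere long-lived so the GC does not reclaim it. A sketch using `persistent_term` (the key is illustrative):
```erlang
%% Load once at startup and share the compiled session node-wide.
init_model(Path) ->
    {ok, Session} = onyx:load(Path),
    persistent_term:put({?MODULE, onnx_session}, Session),
    ok.

session() ->
    persistent_term:get({?MODULE, onnx_session}).
```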
### Inference
```
onyx:run(Session, Inputs)
→ NIF [DirtyCPU]
→ check valid flag (AtomicBool)
→ for each model input (in declaration order):
decode {Binary, Shape, DType} → tract Tensor (validate byte count first)
→ SimplePlan::run(inputs) % execute compiled plan
→ for each output tensor:
encode tract Tensor → {Binary, Shape, DType}
→ {ok, #{name => tensor()}}
```
Input tensors are decoded directly from the Erlang binary payload — no extra heap allocation for the data bytes. The `ResourceArc` keeps the session alive for the duration of `run/2`, even if a concurrent `unload/1` fires mid-inference.
### Scheduler assignment
| Function | Scheduler | Why |
|----------|-----------|-----|
| `load/1` | DirtyIO | Disk read + model compilation: 10ms–1s |
| `run/2` | DirtyCPU | Matrix computation: 1ms–100ms |
| `unload/1` | Normal | Atomic flag flip: nanoseconds |
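Because `run/2` never blocks normal schedulers, independent inferences can be fanned out across processes; a minimal sketch, assuming the session handle may be shared between processes:
```erlang
run_many(Session, InputMaps) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() ->
                    Parent ! {Ref, onyx:run(Session, Inputs)}
                end),
                Ref
            end || Inputs <- InputMaps],
    %% Collect results in the same order the inputs were given.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```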
## Usage with kvex — semantic search pipeline
```erlang
%% Index a corpus of documents
index_documents(Docs) ->
{ok, Model} = onyx:load("all-MiniLM-L6-v2.onnx"),
{ok, Index} = kvex:new(384),
lists:foreach(fun({DocId, Text}) ->
{IdsBin, MaskBin} = tokenize(Text, 32),
{ok, #{<<"sentence_embedding">> := {Emb, _, _}}} =
onyx:run(Model, #{
<<"input_ids">> => {IdsBin, [1, 32], i32},
<<"attention_mask">> => {MaskBin, [1, 32], i32}
}),
ok = kvex:add(Index, DocId, Emb)
end, Docs),
{ok, Model, Index}.
%% Query the index
search(Model, Index, QueryText, K) ->
{IdsBin, MaskBin} = tokenize(QueryText, 32),
{ok, #{<<"sentence_embedding">> := {QueryEmb, _, _}}} =
onyx:run(Model, #{
<<"input_ids">> => {IdsBin, [1, 32], i32},
<<"attention_mask">> => {MaskBin, [1, 32], i32}
}),
kvex:search(Index, QueryEmb, K).
```
## Building from source
Requires a Rust stable toolchain (1.70+).
```bash
git clone https://github.com/roquess/onyx
cd onyx
make build # compiles native/onyx/ and writes priv/onyx.dll
rebar3 ct # run test suite
```
The `Makefile` uses `--manifest-path` so it runs correctly from any working directory.
## Supported ONNX operators
onyx relies on tract's supported operator set. tract 0.21 covers the operators needed by most embedding and classification models, including:
- All arithmetic and activation ops (Add, Mul, Relu, Sigmoid, Tanh, GELU, Softmax, ...)
- Matrix multiplication (MatMul, Gemm)
- Normalisation (LayerNorm, BatchNorm)
- Attention mechanisms (used in transformer models)
- Convolution, pooling
- Reshape, Transpose, Concat, Slice
Models from Hugging Face (exported with `optimum` or `transformers`) and ONNX Model Zoo generally work out of the box. Exotic custom operators, some recurrent layers, and dynamic control flow may not be supported — `load/1` will return `{error, {load_failed, Reason}}` in those cases.
## Links
- Hex.pm: [https://hex.pm/packages/onyx](https://hex.pm/packages/onyx)
- GitHub: [https://github.com/roquess/onyx](https://github.com/roquess/onyx)
- tract (Rust ONNX runtime): [https://github.com/sonos/tract](https://github.com/sonos/tract)
- sied (SIMD NIFs): [https://hex.pm/packages/sied](https://hex.pm/packages/sied)
- kvex (ANN index): [https://hex.pm/packages/kvex](https://hex.pm/packages/kvex)
## License
Apache License 2.0 — see [LICENSE](LICENSE).