# Changelog
## 0.1.0 (2026-05-20)
First Hex release.
### Added — `Nx.Vulkan.VulkanoBackend` (pure-Rust path)
A new `Nx.Backend` implementation built on [vulkano](https://github.com/vulkano-rs/vulkano),
the Rust wrapper around Vulkan compute. It is a sibling to the existing
`Nx.Vulkan.Backend` (C++ spirit-backed); the two share the SPV
catalog under `priv/shaders/` and the chain-shader synthesis
pipeline.
**Why a second backend.** A use-after-free in the C++ FFI layer
crashed the live trader three minutes after every restart:
`Nx.Vulkan.Native.byte_size` raised `:badarg` on a stale
`VkBuf*` pointer that had outlived its referent. Vulkano's
`Arc<Buffer>` ownership rules out that bug class structurally:
a `Subbuffer<[u8]>` keeps a reference-counted handle to its
parent buffer, so the parent cannot be freed while the
subbuffer is still alive.
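
The ownership argument can be illustrated with a std-only sketch. `ParentBuffer`, `View`, `make_view`, and `is_freed` are hypothetical stand-ins invented for this example, not vulkano or Rustler types; the point is only that a view holding an `Arc` to its parent keeps the parent alive.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a device allocation (not a vulkano type).
struct ParentBuffer {
    bytes: Vec<u8>,
    /// Shared flag so callers can observe when the allocation is released.
    freed: Arc<AtomicBool>,
}

impl Drop for ParentBuffer {
    fn drop(&mut self) {
        // A real backend would release device memory here.
        self.freed.store(true, Ordering::SeqCst);
    }
}

/// Hypothetical sub-range view. It owns a strong reference to its parent,
/// so the parent cannot be dropped while any view exists.
struct View {
    parent: Arc<ParentBuffer>,
    offset: usize,
    len: usize,
}

impl View {
    fn byte_size(&self) -> usize {
        self.len
    }

    fn as_slice(&self) -> &[u8] {
        &self.parent.bytes[self.offset..self.offset + self.len]
    }
}

/// Builds a parent, returns only a view of it plus the "freed" flag.
fn make_view(data: Vec<u8>, offset: usize, len: usize) -> (View, Arc<AtomicBool>) {
    assert!(offset + len <= data.len(), "view must lie inside the parent");
    let freed = Arc::new(AtomicBool::new(false));
    let parent = Arc::new(ParentBuffer { bytes: data, freed: Arc::clone(&freed) });
    let view = View { parent: Arc::clone(&parent), offset, len };
    // Dropping the local handle does not free the parent: the view still
    // holds a strong reference.
    drop(parent);
    (view, freed)
}

fn is_freed(flag: &AtomicBool) -> bool {
    flag.load(Ordering::SeqCst)
}
```

In this sketch the flag stays `false` for as long as the view returned by `make_view` is alive and flips to `true` only when the last view is dropped, so a `byte_size` call on a live view can never observe a released parent.
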
**What it ships.**
- Buffer lifecycle: `buf_upload`, `buf_alloc`, `buf_download`,
`buf_byte_size`, `buf_upload_into`. Each wraps a vulkano
`Subbuffer<[u8]>` in a Rustler resource; when the BEAM
garbage-collects the resource, dropping it triggers vulkano's
`vkDestroyBuffer` + `vkFreeMemory` chain.
- Compute ops (24 native through specialised SPVs):
  - **Elementwise binary** (f32 + f64): add, subtract, multiply,
    divide, pow, max, min.
  - **Elementwise unary** (f32 + f64): exp, log, sqrt, abs,
    negate, sigmoid, tanh, floor, ceil, sign.
  - **Reductions** (f32 + f64): sum, reduce_max, reduce_min;
    all-axes, leading-axis, trailing-axis.
  - **Shape / movement**: reshape (zero-copy), squeeze
    (zero-copy), 2D transpose.
  - **Matmul**: rank-2 × rank-2, f32 only.
- Host-fallback callbacks (correctness first; perf-native
shaders pending): slice, as_type, comparison ops (equal,
not_equal, less, less_equal, greater, greater_equal), select,
all, any, dot (non-standard axis configs), `block/4`
(routes `Nx.Block.LinAlg.SVD/QR/Cholesky/solve` through
`BinaryBackend`).
- Pipeline cache keyed by `(spv_path, op_code)`. First call
builds the layout + pipeline; subsequent calls reuse them.
Required for long-running workloads (without it, vulkano's
`StandardDescriptorSetAllocator` creates a fresh
`DescriptorPool` per unique layout identity, eventually
exhausting driver limits on FreeBSD).
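
The pipeline cache described in the last bullet can be sketched in std-only Rust as follows. This is an illustration under assumptions, not the backend's actual code: `Pipeline` and `PipelineCache` are hypothetical stand-ins, the `u32` op code type is assumed, and the "build" step is a placeholder for creating the real layout and pipeline.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a built compute pipeline plus its layout.
struct Pipeline {
    spv_path: String,
    op_code: u32,
}

/// Cache keyed by `(spv_path, op_code)`.
struct PipelineCache {
    entries: HashMap<(String, u32), Arc<Pipeline>>,
    /// Counts how often the expensive build path ran.
    builds: usize,
}

impl PipelineCache {
    fn new() -> Self {
        PipelineCache { entries: HashMap::new(), builds: 0 }
    }

    /// Returns the cached pipeline, building it on first use only.
    fn get_or_build(&mut self, spv_path: &str, op_code: u32) -> Arc<Pipeline> {
        let key = (spv_path.to_string(), op_code);
        if let Some(existing) = self.entries.get(&key) {
            // Cache hit: hand out another reference to the same pipeline.
            return Arc::clone(existing);
        }
        // Cache miss. Placeholder for the expensive step: a real backend
        // would load the SPIR-V and create the layout and pipeline here.
        self.builds += 1;
        let built = Arc::new(Pipeline { spv_path: spv_path.to_string(), op_code });
        self.entries.insert(key, Arc::clone(&built));
        built
    }
}
```

Two calls with the same path and op code go through the build path once; changing either part of the key produces a second entry. Because every caller shares the same `Arc<Pipeline>`, the layout identity stays stable across dispatches, which is the property the bullet above relies on.
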
**Validated workloads.**
- **Axon training step**: Dense → sigmoid → Dense + MSE +
`Nx.Defn.value_and_grad`. The forward loss is byte-identical to
`BinaryBackend`; the gradient sum agrees to 1e-8. A 100-step SGD
trajectory matches at every step within 2e-6; the final loss
agrees to 4e-7, and both backends reduce the loss by about 350×.
- **eXMC regime model log-posterior**: 8 free RVs, softmax-mixture
custom likelihood over 200 observations. Matches `BinaryBackend`
to 1e-7 at f64 precision. Roughly 2× faster than the C++ path
on the bench target (GT 650M, FreeBSD 15.0).
- **Scholar linear regression** (normal equation + SVD):
coefficients match `BinaryBackend` to 2e-6 on synthetic
regression. SVD via host-fallback `block/4`.
**Autograd.** No backward callbacks were written. `Nx.Defn.grad`
is a graph transformation that expresses backward ops in terms
of forward ops — forward op coverage is therefore gradient
coverage when running through `Nx.Defn.Evaluator`. Validated
end-to-end via the Axon training step.
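
To see why forward coverage implies gradient coverage, consider a toy version of the idea. The sketch below is not how `Nx.Defn.grad` is implemented; `Expr`, `eval`, and `grad` are hypothetical names for a minimal single-variable expression graph. The derivative is produced as another graph that uses only the forward ops, so the evaluator needs no backward-specific code.

```rust
/// A tiny expression graph containing forward ops only.
#[derive(Clone, Debug)]
enum Expr {
    X, // the single input variable
    Const(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
    Exp(Box<Expr>),
}

fn add(a: Expr, b: Expr) -> Expr {
    Expr::Add(Box::new(a), Box::new(b))
}

fn mul(a: Expr, b: Expr) -> Expr {
    Expr::Mul(Box::new(a), Box::new(b))
}

fn exp(a: Expr) -> Expr {
    Expr::Exp(Box::new(a))
}

/// Forward evaluation: the only "backend" this sketch has.
fn eval(e: &Expr, x: f64) -> f64 {
    match e {
        Expr::X => x,
        Expr::Const(c) => *c,
        Expr::Add(a, b) => eval(a, x) + eval(b, x),
        Expr::Mul(a, b) => eval(a, x) * eval(b, x),
        Expr::Exp(a) => eval(a, x).exp(),
    }
}

/// Graph transformation: returns d(e)/dx as another `Expr`,
/// built entirely from the forward ops above.
fn grad(e: &Expr) -> Expr {
    match e {
        Expr::X => Expr::Const(1.0),
        Expr::Const(_) => Expr::Const(0.0),
        // Sum rule.
        Expr::Add(a, b) => add(grad(a), grad(b)),
        // Product rule: (a * b)' = a' * b + a * b'.
        Expr::Mul(a, b) => add(
            mul(grad(a), (**b).clone()),
            mul((**a).clone(), grad(b)),
        ),
        // Chain rule: exp(a)' = exp(a) * a'.
        Expr::Exp(a) => mul(exp((**a).clone()), grad(a)),
    }
}
```

For `f = x * x + exp(x)`, `grad(&f)` is a graph of `Add`, `Mul`, `Exp`, and constants, and `eval` handles it with the same match arms it uses for `f` itself.
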
### Added — Mission II chain-shader synthesis
Runtime synthesis of multi-RV HMC/NUTS chain shaders, in the
style of `Exmc.NUTS.CustomSynth`. The pipeline takes a multi-RV
IR with a Custom likelihood, traces it via `Nx.Defn`, emits GLSL,
compiles it to SPIR-V, stores the result in a content-addressed
cache, and dispatches it. Validated on the regime model
(8 RVs + 200-obs softmax-mixture) on a GT 650M at 60 ms per K=32
leapfrog dispatch, about 8.3× under the 500 ms/sample budget.
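
The content-addressed caching step can be sketched as below. Everything here is an assumption made for illustration: `ShaderCache` and `content_key` are hypothetical names, the key is a 64-bit `DefaultHasher` value chosen only to keep the example dependency-free (a real cache would more plausibly use a cryptographic digest), and the `compile` closure stands in for the GLSL to SPIR-V step.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Derives the cache key from the shader source text itself.
/// Assumption: a non-cryptographic 64-bit hash is good enough for a sketch.
fn content_key(glsl: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    glsl.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical cache from source content to compiled bytes.
struct ShaderCache {
    compiled: HashMap<u64, Vec<u8>>,
    /// Counts how often the compile step actually ran.
    compiles: usize,
}

impl ShaderCache {
    fn new() -> Self {
        ShaderCache { compiled: HashMap::new(), compiles: 0 }
    }

    /// Returns the compiled bytes for `glsl`, calling `compile` only when
    /// this exact source text has not been seen before.
    fn get_or_compile<F>(&mut self, glsl: &str, compile: F) -> &[u8]
    where
        F: FnOnce(&str) -> Vec<u8>,
    {
        let key = content_key(glsl);
        if !self.compiled.contains_key(&key) {
            self.compiles += 1;
            let spv = compile(glsl);
            self.compiled.insert(key, spv);
        }
        &self.compiled[&key]
    }
}
```

Because the key depends only on the emitted source, two models that synthesise identical GLSL share one compiled artifact, while any change to the source, even whitespace, yields a new entry.
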
### Existing — `Nx.Vulkan.Backend` (C++ spirit path)
The legacy backend stays in this release. It runs the chain-shader
synthesis pipeline and the Mission II dispatch. The stale-handle
bug class that motivated the migration is still present in this
backend. The recommended path forward is `VulkanoBackend` for
general Nx work, with HMC going through either the spirit-backed
chain dispatch or vulkano's chain-shader dispatch via
`Nx.Vulkan.NativeV.leapfrog_chain_synth`.
### Build notes
- Rust 1.85 pinned via `rust-toolchain.toml`. See the comment in
that file for the upstream rustler reason.
- Vulkan SDK + `glslangValidator` required:
  - Linux: `apt install libvulkan-dev vulkan-tools glslang-tools`
  - FreeBSD: `pkg install vulkan-loader vulkan-headers vulkan-tools glslang shaderc`
- vulkano 0.34 builds in ~30 s on Linux and ~3 min 18 s on FreeBSD 15.0.
### What's missing (the honest queue)
- Persistent buffer pool (per-call allocation works but costs
a millisecond per dispatch).
- f64 matmul shader (regime model's `Nx.dot` falls back to host).
- Native linalg shaders (SVD, QR, Cholesky, solve) — Scholar
currently routes these through host.
- Custom `Nx.Defn` compiler — today we run through
`Nx.Defn.Evaluator` op-by-op; whole-graph optimisation is
EXLA-style work.
- Convolutions, FFTs, sort, scatter — the long tail of Nx ops.
- R4 live-trader cutover — the production trader has not been
switched to `VulkanoBackend` yet.
### Links
- Blog: [The Backend That Didn't Need to Know](http://www.dataalienist.com/blog-backend-didnt-need-to-know.html)
- Roadmap: [`docs/VULKANO_BACKEND_ROADMAP.md`](docs/VULKANO_BACKEND_ROADMAP.md)
- 10-minute intro: [`livebooks/intro_10min.livemd`](livebooks/intro_10min.livemd)
- Examples: [`examples/axon_training_loop.exs`](examples/axon_training_loop.exs),
[`examples/full_bench.exs`](examples/full_bench.exs)