# Precompiled NIFs
This document explains how `ex_data_sketch` ships its Rust NIF as a set
of precompiled binary artifacts, why this matters for adoption, and how
the v0.8.0 release pipeline produces those artifacts.
## Why precompiled NIFs matter
A Rust NIF is a `.so` / `.dylib` / `.dll` file that the BEAM dynamically
loads at runtime. Building it requires:
- a working Rust toolchain (`rustc`, `cargo`, the target's standard
library);
- a working C linker (`cc`, `link.exe`, etc.) for system glue;
- network access during `cargo` to download crate dependencies;
- a non-trivial amount of CPU time (~30s on a modern laptop for a clean
release build).
For a typical Elixir application that adds `ex_data_sketch` as a
dependency, this means:
- developers cannot `mix deps.get && mix compile` and have it Just Work
unless they install Rust first;
- CI pipelines need to either bake Rust into the base image or pay the
toolchain install cost on every run;
- every dependency change that invalidates the Docker layer running
`mix deps.compile` pays for the full Rust build again.
The `RustlerPrecompiled` library, together with the
`rustler-precompiled-action` GitHub Action, solves this by:
1. Building the NIF on every supported platform at release time and
uploading each result as a GitHub Release asset.
2. At `mix deps.compile` time on the downstream side, downloading the
pre-built `.tar.gz` matching the host's `target-triple + nif-version`
from the GitHub Release URL.
3. Verifying the downloaded artifact's SHA-256 against a checksum file
shipped in the Hex package.
4. Building from source instead (via `EX_DATA_SKETCH_BUILD=true`) when
the user explicitly opts in, for example because no precompiled
artifact exists for the host.
The end result for downstream users: a Rust-toolchain-free
`mix deps.get && mix compile`, with a verifiable supply chain.
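
To make the "target triple + NIF version" key from step 2 concrete, here is a minimal shell sketch that maps `uname`-style values to a triple from the matrix. It is illustrative only: `RustlerPrecompiled` performs the real detection in Elixir, the function name `triple_for` is hypothetical, Windows is left out, and the libc flavour is passed in by hand rather than detected.

```sh
#!/bin/sh
# triple_for OS ARCH [LIBC]: print the Rust target triple for a host.
# OS and ARCH are the values `uname -s` and `uname -m` would report.
# LIBC defaults to "gnu"; pass "musl" for Alpine-style hosts (not auto-detected here).
triple_for() {
  os="$1"; arch="$2"; libc="${3:-gnu}"
  case "$arch" in
    arm64|aarch64) cpu="aarch64" ;;
    x86_64|amd64)  cpu="x86_64" ;;
    *) echo "unsupported arch: $arch" >&2; return 1 ;;
  esac
  case "$os" in
    Darwin) echo "${cpu}-apple-darwin" ;;
    Linux)  echo "${cpu}-unknown-linux-${libc}" ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
}

# Artifacts are keyed by triple plus NIF version.
echo "$(triple_for Darwin arm64) nif-2.17"
echo "$(triple_for Linux x86_64 musl) nif-2.16"
```

A host that maps to no triple in the matrix (FreeBSD, for instance) is exactly the case where the source-build opt-in is needed.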
## Platform matrix (v0.8.0)
| Target triple                  | OS / Architecture        | Runner            | Uses `cross`? |
|--------------------------------|--------------------------|-------------------|---------------|
| `aarch64-apple-darwin` | macOS 11+ (Apple Silicon)| `macos-14` | no |
| `x86_64-apple-darwin` | macOS 10.15+ (Intel) | `macos-14` | no |
| `x86_64-unknown-linux-gnu` | glibc Linux (x86_64) | `ubuntu-22.04` | no |
| `x86_64-unknown-linux-musl` | musl Linux (Alpine, etc.)| `ubuntu-22.04` | yes |
| `aarch64-unknown-linux-gnu` | glibc Linux (ARM64) | `ubuntu-22.04` | yes |
| `aarch64-unknown-linux-musl` | musl Linux (ARM64) | `ubuntu-22.04` | yes |
| `x86_64-pc-windows-msvc` | Windows 10+ (x86_64) | `windows-2022` | no |
| `aarch64-pc-windows-msvc` | Windows 11 (ARM64) | `windows-2022` | no |
Two NIF API versions are produced per target (2.16 and 2.17), giving
16 artifacts per release.
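
The arithmetic can be checked by enumerating the matrix. The sketch below copies the targets and NIF versions from the table; the variable and function names are hypothetical and not part of the release workflow.

```sh
#!/bin/sh
# Targets and NIF versions copied from the platform matrix.
TARGETS="
aarch64-apple-darwin
x86_64-apple-darwin
x86_64-unknown-linux-gnu
x86_64-unknown-linux-musl
aarch64-unknown-linux-gnu
aarch64-unknown-linux-musl
x86_64-pc-windows-msvc
aarch64-pc-windows-msvc
"
NIF_VERSIONS="2.16 2.17"

# list_matrix: print one "<triple> nif-<version>" line per artifact.
list_matrix() {
  for target in $TARGETS; do
    for nif in $NIF_VERSIONS; do
      echo "$target nif-$nif"
    done
  done
}

# Count the combinations (8 targets x 2 NIF versions).
list_matrix | wc -l
```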
### Targets explicitly NOT covered (with rationale)
- **FreeBSD / NetBSD / OpenBSD** — GitHub Actions does not provide BSD
runners; cross-compilation to BSD requires a libc shim that `cross-rs`
does not bundle by default. Users on BSD must build from source with
`EX_DATA_SKETCH_BUILD=1`. Demand is low enough that prebuilt BSD
artifacts are deferred to a later release.
- **`riscv64gc-unknown-linux-gnu`** — too small a user base to justify
the cross-build complexity. Users build from source.
- **`x86_64-pc-windows-gnu` (MinGW)** — superseded by MSVC. MSVC is the
Microsoft-supported default toolchain and aligns with what Erlang
itself ships.
- **Old macOS Intel (pre-10.15)** — `xxhash-rust` requires recent macOS
SDKs; pre-10.15 releases are no longer supported by Apple and are not
in our test matrix.
## Release pipeline
The release pipeline is `.github/workflows/release.yml`. It runs
whenever a tag matching `v*` is pushed and has three jobs that execute
in sequence:
### 1. `build_release`
Matrix-builds the NIF for all 8 targets × 2 NIF versions = 16 jobs.
Each job:
1. Checks out the repo at the tagged commit.
2. Installs the Rust toolchain via `dtolnay/rust-toolchain@stable`.
3. On non-native targets, installs the cross-compile target via
`rustup target add`.
4. Builds the NIF via `philss/rustler-precompiled-action@v1.1.4`,
   which under the hood:
   - runs `cargo build --release --target <triple>`;
   - optionally invokes `cross` for Linux musl / ARM64 targets;
   - packages the resulting library into
     `libex_data_sketch_nif-v<VERSION>-nif-<NIF>-<TRIPLE>.so.tar.gz`
     (on Windows, `ex_data_sketch_nif-v<VERSION>-nif-<NIF>-<TRIPLE>.dll.tar.gz`);
   - emits the file name and path as action outputs.
5. Uploads the tarball as a GitHub Actions artifact.
### 2. `release`
Downloads all 16 build artifacts, flattens them into a single `nifs/`
directory, and creates a GitHub Release with all tarballs attached.
The release notes are auto-generated by `softprops/action-gh-release`.
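
The flattening step might look roughly like the following. This is a sketch, not the workflow's actual script: it assumes the download step leaves each tarball in its own sub-directory, and the function name `flatten_artifacts` is hypothetical.

```sh
#!/bin/sh
# flatten_artifacts SRC DEST: move every .tar.gz found anywhere under SRC
# into DEST, discarding the per-artifact directory structure.
# DEST should live outside SRC so moved files are not visited again.
flatten_artifacts() {
  src="$1"; dest="$2"
  mkdir -p "$dest"
  find "$src" -type f -name '*.tar.gz' -exec mv {} "$dest"/ \;
}

# Example (not run here): flatten_artifacts artifacts nifs
```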
### 3. `publish_hex`
Once the release exists:
1. Checks out the repo.
2. Installs Elixir + Erlang + Rust. The job sets
`EX_DATA_SKETCH_BUILD=true` so that compiling the project does not try
to download artifacts and check them against the still-empty checksum
file. Because that flag forces a source build, the Rust toolchain must
be available, even though building the NIF is not the purpose of this
job.
3. Runs `mix rustler_precompiled.download ExDataSketch.Nif --all --print`
which fetches every artifact from the just-published GitHub Release
and writes the SHA-256 checksums to `checksum-Elixir.ExDataSketch.Nif.exs`.
4. Runs `mix hex.publish --yes`, which uploads the Hex package
including the now-populated checksum file.
The end result is a Hex package that, when installed by a downstream
project on any of the 16 supported `(target, NIF)` combinations, will:
- read the `checksum-Elixir.ExDataSketch.Nif.exs` map at compile time;
- look up the SHA-256 for the host's triple + NIF version;
- download the matching `.tar.gz` from the GitHub Release;
- verify the SHA-256 against the checksum file;
- extract the `.so` / `.dylib` / `.dll` into `priv/native/`;
- load it via `:erlang.load_nif/2` at module load time.
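
The verify-then-extract part of that list can be approximated in shell. This is only a sketch: `RustlerPrecompiled` does this work in Elixir at compile time, and the function names `sha256_of` and `verify_and_extract` are hypothetical.

```sh
#!/bin/sh
# sha256_of FILE: print the hex SHA-256 digest using whichever tool is installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# verify_and_extract TARBALL EXPECTED_SHA256 DEST_DIR
# Extracts only when the digest matches; on mismatch it fails
# without creating or touching DEST_DIR.
verify_and_extract() {
  tarball="$1"; expected="$2"; dest="$3"
  actual=$(sha256_of "$tarball") || return 1
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $tarball" >&2
    return 1
  fi
  mkdir -p "$dest"
  tar -xzf "$tarball" -C "$dest"
}
```

Checking the digest before extraction means a corrupted or tampered download never reaches `priv/native/`.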
## Source-compile fallback
The `RustlerPrecompiled` setup in `lib/ex_data_sketch/nif.ex` is gated
on `EX_DATA_SKETCH_SKIP_NIF` at compile time:
```elixir
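# `version` is assumed to be bound earlier in the module
# (commonly from Mix.Project.config()[:version]).
unless System.get_env("EX_DATA_SKETCH_SKIP_NIF") in ["1", "true"] do
  use RustlerPrecompiled,
    otp_app: :ex_data_sketch,
    crate: "ex_data_sketch_nif",
    base_url: "https://github.com/thanos/ex_data_sketch/releases/download/v#{version}",
    version: version,
    # One artifact is published per {target, NIF version} pair.
    nif_versions: ["2.16", "2.17"],
    # Target list elided here; see the platform matrix.
    targets: [...]
end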
```
Two compile-time env vars influence this:
- **`EX_DATA_SKETCH_SKIP_NIF=true`** — skips `use RustlerPrecompiled`
entirely. The NIF stubs (`def xxhash3_64_nif(...), do:
:erlang.nif_error(:not_loaded)`) are the only code that gets loaded,
so any direct call into the NIF fails with the `:not_loaded` error and
`ExDataSketch.Hash.nif_available?/0` returns `false`.
- **`EX_DATA_SKETCH_BUILD=true`** — see `config/config.exs`. Sets
`config :rustler_precompiled, :force_build, ex_data_sketch: true`.
This causes `RustlerPrecompiled` to invoke `rustler` and build the
NIF from source instead of downloading the precompiled artifact.
The two flags are independent and intentionally so:
- `SKIP_NIF` is for fast iterative test cycles where the user does not
need NIF-accelerated paths (CI's NIF-off matrix lane uses this).
- `BUILD` is for development on a target that has no precompiled
artifact (e.g. FreeBSD, NetBSD), or for checking that a source build
behaves the same as the precompiled artifact.
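
How the two flags combine can be summarised with a small decision function. This is a sketch under assumptions: the `"1"` / `"true"` values for `SKIP_NIF` come from the gate shown above, but treating `BUILD` the same way is a guess (`config/config.exs` is authoritative), and the names `is_truthy` and `nif_mode` are hypothetical.

```sh
#!/bin/sh
# is_truthy VALUE: succeed for "1" or "true", the values the SKIP_NIF gate accepts.
is_truthy() {
  case "$1" in
    1|true) return 0 ;;
    *) return 1 ;;
  esac
}

# nif_mode: print which path a compile would take for the current environment.
# SKIP_NIF is checked first because it removes `use RustlerPrecompiled`
# altogether, so the force_build setting has nothing left to act on.
nif_mode() {
  if is_truthy "${EX_DATA_SKETCH_SKIP_NIF:-}"; then
    echo "skip"            # stubs only, no NIF loaded
  elif is_truthy "${EX_DATA_SKETCH_BUILD:-}"; then
    echo "source-build"    # compile with the local Rust toolchain
  else
    echo "precompiled"     # download and verify the release artifact
  fi
}
```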
## Validating the contract
The contract between the precompiled setup and the user-facing API is
locked by `test/ex_data_sketch/nif_availability_test.exs`. It asserts:
1. `Hash.nif_available?/0` returns a stable boolean and is cached in
`:persistent_term`.
2. `Hash.default_algorithm/0` is `:xxhash3` when the NIF is loaded and
`:phash2` otherwise.
3. `Hash.algorithm_info/1` `:available?` flag reflects the NIF state
for `:xxhash3` and is `true` for `:murmur3` and `:phash2`.
4. `Backend.Rust.available?/0` mirrors `Hash.nif_available?/0`.
5. `Backend.default/0` is `Pure` unless the application has been
explicitly configured to use the Rust backend — the NIF is never
silently selected as the default.
6. The XXH3 wrapper raises `ArgumentError` when the NIF is unavailable
(rather than silently falling back).
7. The pure-Elixir Murmur3 path works without the NIF.
8. The checksum file exists and is a valid Elixir map.
9. The target list declared in `nif.ex` matches the expected matrix
(developer-facing alignment guard between `nif.ex` and
`release.yml`).
These tests run in both NIF-on and NIF-off CI lanes; the body of each
test branches on `Hash.nif_available?/0`.
## Reproducing the release locally
For maintainers verifying the pipeline:
```sh
# Build for the host's native target, source-compiled.
EX_DATA_SKETCH_BUILD=1 mix compile
# Verify the test suite under both modes. Use the dedicated aliases
# so the per-env rustler_precompiled state is reset automatically.
EX_DATA_SKETCH_BUILD=1 mix test.nif_on
EX_DATA_SKETCH_SKIP_NIF=true mix test.nif_off
# Try a cross-build locally (ARM64 Linux musl from macOS, using `cross`).
cd native/ex_data_sketch_nif
cross build --release --target aarch64-unknown-linux-musl
```
The full 16-artifact release matrix can only be exercised on GitHub
Actions because some targets (Apple Silicon, Windows ARM64) cannot be
cross-compiled from a Linux host.
### Why two aliases?
The `force_build: true / false` value in `config/config.exs` is read by
`rustler_precompiled` as a **compile-time** setting (it determines
whether to bake in the precompiled-download logic or the source-build
logic). When a maintainer flips `EX_DATA_SKETCH_BUILD` between local
runs, the runtime value disagrees with the previously-compiled
`_build/<env>/lib/rustler_precompiled/ebin/` state and the BEAM aborts
startup with:
> the application :rustler_precompiled has a different value set for
> path [:ex_data_sketch] inside key :force_build during runtime
> compared to compile time
The `test.nif_on` and `test.nif_off` aliases avoid this by running
`mix deps.clean rustler_precompiled --build` before `mix test`. CI sets
the env once per job and does not flip modes, so it does not need them.
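
For readers who prefer to see the reset spelled out, a rough shell equivalent of the two aliases follows. The real aliases live in the Mix project; the function names here are hypothetical, and `MIX` is an override added to this sketch so the commands can be printed instead of executed.

```sh
#!/bin/sh
# MIX can be overridden (for example MIX=echo) to print commands instead of running them.
MIX="${MIX:-mix}"

# Drop the compiled rustler_precompiled build so force_build is re-read, then run tests.
reset_and_test() {
  "$MIX" deps.clean rustler_precompiled --build && "$MIX" test "$@"
}

# Each mode runs in a subshell so the exported flag does not leak into the caller.
test_nif_on()  { ( export EX_DATA_SKETCH_BUILD=1; reset_and_test "$@" ); }
test_nif_off() { ( export EX_DATA_SKETCH_SKIP_NIF=true; reset_and_test "$@" ); }
```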
## Failure modes and recovery
### "Precompiled NIF download failed"
Caused by:
- a target that is not yet in the matrix (most likely a new platform);
- a checksum mismatch (a corrupted upload, very rare);
- the GitHub Release artifact being deleted or renamed;
- network failure during `mix deps.compile`.
User remedy:
```sh
# Force source compilation.
EX_DATA_SKETCH_BUILD=1 mix deps.compile ex_data_sketch
```
This requires the user to have a working Rust toolchain. If Rust is
not available, the user can also fall back to the pure backend:
```sh
EX_DATA_SKETCH_SKIP_NIF=true mix deps.compile ex_data_sketch
```
…and use the pure-Elixir paths (`:phash2` or `:murmur3` hash strategy
with `Backend.Pure`). The pure paths are ~15× slower than the NIF
paths (see `hll_performance.md`) but correct.
### "Hex publish failed: checksum file empty"
Caused by `mix rustler_precompiled.download ExDataSketch.Nif --all --print`
not finding the GitHub Release artifacts. The `release` job must complete
successfully before `publish_hex` runs; if a build target failed, the
release will still be created but the checksum-download step will see
a missing artifact for that target.
Maintainer remedy: re-run only the failed `build_release` matrix
entries, then re-run `publish_hex` manually.
### "Stale checksum file in git"
If a developer accidentally commits a populated
`checksum-Elixir.ExDataSketch.Nif.exs` from a local
`mix rustler_precompiled.download`, the next release will overwrite it
in the `publish_hex` step. Before a release, the file should remain
`%{}` in git; the release pipeline owns its content.
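
A guard for this could be as small as the check below, for example in a pre-commit hook or a CI lint step. It is a sketch, not something the repository is known to ship, and the function name `checksum_file_is_empty` is hypothetical.

```sh
#!/bin/sh
# checksum_file_is_empty FILE: succeed only if FILE, ignoring whitespace,
# contains nothing but the empty Elixir map `%{}`.
checksum_file_is_empty() {
  [ "$(tr -d '[:space:]' < "$1")" = "%{}" ]
}

# Example (not run here):
#   checksum_file_is_empty checksum-Elixir.ExDataSketch.Nif.exs ||
#     { echo "checksum file must stay empty in git" >&2; exit 1; }
```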
## Future work (out of scope for v0.8.0)
- **FreeBSD target.** Requires a FreeBSD GitHub Actions runner (which
GitHub does not provide) or a cross-compile pipeline using
`cross-rs` with a custom FreeBSD libc image. Deferred to v0.10+.
- **NIF 2.18+ support.** Currently we ship 2.16 and 2.17 only. The
next NIF API bump will require adding it to the matrix.
- **Reproducible builds.** The current pipeline does not guarantee
byte-identical artifacts across rebuilds. `cargo build` is mostly
deterministic, but the output can embed absolute paths and, with some
linkers, timestamps.
Closing this gap requires `--remap-path-prefix` and a frozen
build environment. Out of scope for v0.8.0.
- **SBOM / SLSA provenance.** Generating a Software Bill of Materials
and SLSA Level 3 provenance for each release artifact. The
`actions/attest-build-provenance` action makes this easy; deferred
to a v1.0 hardening pass.
- **Mirror artifacts to a CDN.** Currently all artifacts are served by
GitHub Releases. For high-volume downstream installs, a CDN mirror
(or Hex itself hosting the NIFs) would reduce latency.
## References
- `lib/ex_data_sketch/nif.ex` — the `use RustlerPrecompiled` block.
- `.github/workflows/release.yml` — the release pipeline.
- `checksum-Elixir.ExDataSketch.Nif.exs` — the SHA-256 catalog.
- `mix.exs` `package/0` — the Hex package file list.
- `config/config.exs` — the `force_build` toggle.
- `test/ex_data_sketch/nif_availability_test.exs` — the contract tests.
- [philss/rustler_precompiled](https://github.com/philss/rustler_precompiled)
- [philss/rustler-precompiled-action](https://github.com/philss/rustler-precompiled-action)