# Single Instruction Erlang Data (sied)
High-performance SIMD operations for Erlang through Rust NIFs.
[Hex.pm](https://hex.pm/packages/sied) · [Documentation](https://hexdocs.pm/sied) · [License](LICENSE)
## Motivation
Erlang excels at building concurrent, fault-tolerant systems, but numerical computations on large datasets can be a bottleneck. Modern CPUs offer SIMD (Single Instruction, Multiple Data) instructions that can process multiple data elements simultaneously, providing significant performance improvements for vectorized operations.
**sied** bridges this gap by exposing SIMD-accelerated mathematical operations to Erlang through a Rust NIF. The name combines **SI**MD with **E**rlang and **D**ata, representing the library's core purpose: bringing efficient data-parallel processing to the Erlang ecosystem.
## Features
- **Automatic SIMD optimization**: Compiler leverages AVX2, AVX-512, NEON, and other instruction sets via [simdeez](https://crates.io/crates/simdeez)
- **Flat-binary search primitives**: `hamming_topk_flat/4` and `dot_product_topk_flat/4` operate on concatenated binary buffers — zero per-element Erlang overhead, designed for ANN search
- **Simple API**: Consistent `{ok, Result} | {error, Reason}` interface throughout
- **Cross-platform**: Works across different CPU architectures with graceful scalar fallback
## Installation
Add to your `rebar.config`:
```erlang
{deps, [{sied, "0.2.0"}]}.
```
Or from GitHub:
```erlang
{deps, [
    {sied, {git, "https://github.com/roquess/sied.git", {branch, "main"}}}
]}.
```
## Building
```bash
rebar3 compile
```
**Requirements:**
- Erlang/OTP 24 or later
- Rust 1.70 or later (Cargo included)
The Rust toolchain must be available in your PATH. Visit [rustup.rs](https://rustup.rs) to install Rust.
For maximum performance (full AVX2/AVX-512), build with:
```bash
RUSTFLAGS="-C target-cpu=native" rebar3 as prod compile
```
## API Reference
All functions return `{ok, Result}` on success or `{error, Reason}` on failure.
### Basic Arithmetic
Element-wise operations on f32 or f64 vectors.
```erlang
{ok, Result} = sied:add_f32([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]).
%% Result = [5.0, 7.0, 9.0]
{ok, Result} = sied:multiply_f64([2.0, 3.0], [4.0, 5.0]).
%% Result = [8.0, 15.0]
```
Available: `add_f32/2`, `add_f64/2`, `subtract_f32/2`, `subtract_f64/2`,
`multiply_f32/2`, `multiply_f64/2`, `divide_f32/2`, `divide_f64/2`.
### Dot Product & Sum
```erlang
{ok, Dot} = sied:dot_product_f32([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]).
%% Dot = 32.0
{ok, Sum} = sied:sum_f32([1.0, 2.0, 3.0, 4.0, 5.0]).
%% Sum = 15.0
```
### Statistics
```erlang
{ok, Mean} = sied:mean_f32([1.0, 2.0, 3.0]).
{ok, Var} = sied:variance_f32([1.0, 2.0, 3.0]).
{ok, Std} = sied:std_dev_f32([1.0, 2.0, 3.0]).
```
Also available in f64 variants.
### Min / Max
```erlang
{ok, Min} = sied:min_f32([3.0, 1.0, 2.0]). %% 1.0
{ok, Max} = sied:max_f32([3.0, 1.0, 2.0]). %% 3.0
{ok, R} = sied:min_elementwise_f32([1.0, 4.0], [2.0, 3.0]). %% [1.0, 3.0]
```
### Unary Operations
```erlang
{ok, R} = sied:abs_f32([-1.0, 2.0, -3.0]). %% [1.0, 2.0, 3.0]
{ok, R} = sied:sqrt_f32([4.0, 9.0, 16.0]). %% [2.0, 3.0, 4.0]
{ok, R} = sied:negate_f32([1.0, -2.0]). %% [-1.0, 2.0]
```
### L2 Norm and Normalization
```erlang
{ok, Norm} = sied:l2_norm_f32([3.0, 4.0]). %% 5.0
{ok, Unit} = sied:l2_normalize_f32([3.0, 4.0]). %% [0.6, 0.8]
{ok, Vecs} = sied:l2_normalize_batch_f32([[3.0, 4.0], [0.0, 2.0]]).
```
### Cosine Similarity
```erlang
{ok, Sim} = sied:cosine_similarity_f32([1.0, 0.0], [0.0, 1.0]). %% 0.0
{ok, Sims} = sied:cosine_similarity_batch_f32(Query, [Vec1, Vec2, Vec3]).
```
### Batch Dot Product
One query against many vectors in a single NIF call.
```erlang
{ok, Scores} = sied:dot_product_batch_f32(Query, [Vec1, Vec2, Vec3]).
%% Binary variant — avoids float-list marshalling when vectors are stored
%% as little-endian f32 binaries (e.g. in ETS):
{ok, Scores} = sied:dot_product_batch_f32_bin(QBin, [Bin1, Bin2, Bin3]).
```
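As context for the `_bin` variant: a little-endian f32 binary is just the 4-byte groups of each float laid end to end. A plain-Rust sketch of the decoding the NIF performs (illustrative only, not the library's actual code, which reinterprets the bytes in place):

```rust
// Illustrative: how a little-endian f32 binary maps back to a float vector.
// The real NIF avoids this copy; chunks_exact silently drops a trailing
// partial group, which the library would instead report as an error.
fn decode_f32_le(bin: &[u8]) -> Vec<f32> {
    bin.chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

On the Erlang side, such a binary can be built with a binary comprehension: `<< <<X:32/float-little>> || X <- Vec >>`.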
### Binary Quantization & Flat-Buffer ANN Search (v0.2.0)
These primitives are designed for two-phase approximate nearest-neighbour search on large indexes where the entire vector set is stored as a single concatenated binary (flat buffer) in ETS.
#### `to_binary_f32/1` and `to_binary_f32_bin/1`
1-bit quantize a vector: each dimension becomes 1 if above the mean, else 0. 128 dims → 16 bytes.
```erlang
{ok, BinVec} = sied:to_binary_f32([0.1, 0.9, 0.4, 0.8]).
%% BinVec = <<2#0101:4, ...>> (packed bits)
%% Zero-copy variant when the vector is already a little-endian f32 binary:
{ok, BinVec} = sied:to_binary_f32_bin(F32Binary).
```
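The quantization rule itself is simple enough to state in a few lines. A plain-Rust sketch (not the library's actual code; the MSB-first bit order is an assumption for illustration):

```rust
// Hypothetical sketch of the 1-bit quantization rule: each dimension
// becomes 1 if above the vector's mean, else 0, packed 8 bits per byte.
fn quantize(v: &[f32]) -> Vec<u8> {
    let mean = v.iter().sum::<f32>() / v.len() as f32;
    let mut out = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x > mean {
            out[i / 8] |= 1 << (7 - i % 8); // MSB-first packing (assumed)
        }
    }
    out
}
```

For 128 dimensions this yields `(128 + 7) / 8 = 16` bytes, matching the compression ratio stated above.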
#### `hamming_topk_flat/4`
SIMD POPCNT over a flat binary buffer. Returns the indices of the `TopK` closest vectors, sorted ascending by Hamming distance. O(N log K) time worst case.
```erlang
%% BvecFlat = all binary-quantized vectors concatenated
%% VecLen = byte size of one quantized vector
{ok, Indices} = sied:hamming_topk_flat(QBinVec, BvecFlat, VecLen, TopK).
%% Indices = [3, 17, 42, ...] (0-based, sorted by distance)
```
Uses a max-heap of size K — O(K) memory regardless of corpus size.
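A scalar Rust sketch of the scan (the NIF uses SIMD POPCNT and operates on the raw NIF binary; this version only illustrates the size-K max-heap algorithm):

```rust
use std::collections::BinaryHeap;

// Scalar sketch of the top-K Hamming scan over a flat buffer.
// flat holds N quantized vectors of vec_len bytes each, concatenated.
fn hamming_topk(query: &[u8], flat: &[u8], vec_len: usize, k: usize) -> Vec<usize> {
    // Max-heap keyed on distance, so the current worst candidate is at the top.
    let mut heap: BinaryHeap<(u32, usize)> = BinaryHeap::new();
    for (idx, chunk) in flat.chunks_exact(vec_len).enumerate() {
        let dist: u32 = query
            .iter()
            .zip(chunk)
            .map(|(a, b)| (a ^ b).count_ones())
            .sum();
        if heap.len() < k {
            heap.push((dist, idx));
        } else if dist < heap.peek().unwrap().0 {
            heap.pop(); // evict the current worst candidate
            heap.push((dist, idx));
        }
    }
    let mut res: Vec<(u32, usize)> = heap.into_vec();
    res.sort(); // ascending by distance, ties broken by index
    res.into_iter().map(|(_, i)| i).collect()
}
```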
#### `dot_product_topk_flat/4`
SIMD dot-product scoring of a candidate set selected from a flat f32 buffer. Designed to follow `hamming_topk_flat/4` in the two-phase search pipeline.
```erlang
%% F32Flat = all f32 vectors concatenated
%% VecByteLen = dim * 4
%% Indices = output of hamming_topk_flat/4
{ok, Scored} = sied:dot_product_topk_flat(QF32Bin, F32Flat, VecByteLen, Indices).
%% Scored = [{Score, Idx}, ...] sorted by descending score
```
Uses zero-copy f32 reinterpretation — no heap allocation per candidate.
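The rerank step can be sketched in scalar Rust as follows (illustrative only; for clarity this takes the dimension in elements rather than the byte length the NIF API uses, and copies nothing zero-copy):

```rust
// Scalar sketch of the rerank: score each candidate index against the
// query by dot product, then sort descending by score.
fn dot_topk(query: &[f32], flat: &[f32], dim: usize, indices: &[usize]) -> Vec<(f32, usize)> {
    let mut scored: Vec<(f32, usize)> = indices
        .iter()
        .map(|&i| {
            let v = &flat[i * dim..(i + 1) * dim]; // candidate i in the flat buffer
            let score: f32 = query.iter().zip(v).map(|(a, b)| a * b).sum();
            (score, i)
        })
        .collect();
    // Descending by score (assumes no NaN scores).
    scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    scored
}
```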
#### Two-phase ANN example
```erlang
%% Phase 1 — fast Hamming filter
CandCount = K * 10,
{ok, Cands} = sied:hamming_topk_flat(QBin, BvecFlat, VecByteLen, CandCount),
%% Phase 2 — precise dot-product rerank
{ok, Scored} = sied:dot_product_topk_flat(QF32Bin, F32Flat, VecF32ByteLen, Cands),
TopK = lists:sublist(Scored, K).
```
See [kvex](https://hex.pm/packages/kvex) for a complete k-NN index built on these primitives.
## Error Handling
Binary operations require vectors of equal length:
```erlang
case sied:add_f32([1.0, 2.0], [3.0]) of
    {ok, Result} -> io:format("Success: ~p~n", [Result]);
    {error, length_mismatch} -> io:format("vectors must have equal length~n")
end.
```
## Testing
```bash
rebar3 eunit
```
## Performance
The flat-buffer primitives (`hamming_topk_flat`, `dot_product_topk_flat`) are designed for high-throughput ANN search:
- `hamming_topk_flat` on 10 000 × 128-dim vectors: ~10 μs (release, AVX2)
- `dot_product_topk_flat` on 100 candidates × 128-dim: ~5 μs (release, AVX2)
- Combined two-phase search with [kvex](https://hex.pm/packages/kvex): **21 000+ queries/s** at 10 000 vectors, dim=128, K=10
General characteristics by vector size:
- **Small** (< 100 elements): NIF call overhead dominates — use pure Erlang for tiny vectors
- **Medium** (100–10 000 elements): Good speedup over pure Erlang
- **Large** (> 10 000 elements): Maximum SIMD benefit
## Project Structure
```
sied/
├── src/
│   ├── sied.app.src       # OTP application metadata
│   └── sied.erl           # Erlang API module
├── native/
│   └── sied/
│       ├── Cargo.toml     # Rust dependencies
│       └── src/
│           └── lib.rs     # Rust NIF implementation
├── test/
│   └── sied_tests.erl     # EUnit test suite
├── rebar.config
└── README.md
```
## Links
- GitHub: [https://github.com/roquess/sied](https://github.com/roquess/sied)
- Hex.pm: [https://hex.pm/packages/sied](https://hex.pm/packages/sied)
- kvex (ANN index using sied): [https://hex.pm/packages/kvex](https://hex.pm/packages/kvex)
- Rustler: [https://crates.io/crates/rustler](https://crates.io/crates/rustler)
- Simdeez: [https://crates.io/crates/simdeez](https://crates.io/crates/simdeez)
## License
Apache 2.0 — see [LICENSE](LICENSE).