# murmur_nif
Erlang NIF wrapper around [MurmurHash3][murmur] (x64_128) with a
Cassandra-compatible signed-byte variant for token-aware routing
against Cassandra and Scylla.

[murmur]: https://github.com/aappleby/smhasher
## Why
Replaces git-ref dependencies on hand-rolled Murmur3 NIF forks with a
package published to hex.pm. The build handles current toolchains
(correct OTP 27+ `-eval` option order, macOS
`-undefined dynamic_lookup`), the NIF dispatches large inputs to a
dirty scheduler and accounts for inline work with
`enif_consume_timeslice`, and CI tests against OTP 25-28.
## Install
```erlang
{deps, [{murmur_nif, "0.1.0"}]}.
```
Requires a C compiler on the build host (`cc` by default; set `CC` to
use a different one).
## API
```erlang
-spec murmur_nif:murmur3_x64_128(binary()) -> binary().
-spec murmur_nif:murmur3_cassandra_x64_128(binary()) -> binary().
```
Both functions return a fixed 16-byte binary representing the 128-bit
hash, using seed 0.
```erlang
1> murmur_nif:murmur3_x64_128(<<"hello">>).
<<2,155,189,65,179,167,216,203,25,29,174,72,106,144,30,91>>
```
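The sample bytes above are consistent with the two 64-bit halves of the hash being stored one after the other in little-endian order. That layout is an inference from this one example, not a documented guarantee. A minimal C sketch that decodes the sample under that assumption (`load_le64` and `hello_hash` are illustrative names, not part of this library):

```c
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes as a little-endian unsigned 64-bit integer. */
static uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* The 16 bytes shown above for <<"hello">>. */
static const uint8_t hello_hash[16] = {
    2, 155, 189, 65, 179, 167, 216, 203,
    25, 29, 174, 72, 106, 144, 30, 91
};
```

Decoded this way, the first half is `0xcbd8a7b341bd9b02` and the second is `0x5b1e906a48ae1d19`.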
### Which variant to use
- `murmur3_x64_128/1` -- Austin Appleby's standard MurmurHash3 x64_128.
Use for general-purpose hashing.
- `murmur3_cassandra_x64_128/1` -- Cassandra/Scylla-compatible variant.
The input bytes are interpreted as signed (matching Java's signed
`byte` type), which changes the sign-extension of the tail-block
accumulator and produces hashes that match Cassandra's partitioner.
Use to compute partition tokens for token-aware routing.

For pure-ASCII inputs (all bytes < 128) the two variants produce
identical output. They can diverge only when the input contains bytes
with the high bit set (>= 128).
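
To illustrate why the sign of a byte matters, here is a minimal C sketch of folding tail bytes into a 64-bit accumulator, once with unsigned widening and once with Java-style signed widening. It is not this library's source; `tail_unsigned` and `tail_signed` are hypothetical names, and the real tail handling also mixes and multiplies the accumulator afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold up to 8 tail bytes into an accumulator, each byte treated as
 * unsigned (0..255), as in standard MurmurHash3. */
static uint64_t tail_unsigned(const uint8_t *tail, size_t n) {
    assert(n <= 8);
    uint64_t k = 0;
    for (size_t i = 0; i < n; i++)
        k ^= (uint64_t)tail[i] << (8 * i);
    return k;
}

/* Same fold, but each byte is first widened as a signed 8-bit value
 * (-128..127), the way Java's (long) cast of a byte behaves.
 * Assumes the usual two's-complement uint8_t -> int8_t conversion. */
static uint64_t tail_signed(const uint8_t *tail, size_t n) {
    assert(n <= 8);
    uint64_t k = 0;
    for (size_t i = 0; i < n; i++)
        k ^= (uint64_t)(int64_t)(int8_t)tail[i] << (8 * i);
    return k;
}
```

For the ASCII bytes `0x68 0x69` both functions return `0x6968`. For `0xC3 0xA9` (UTF-8 "é") the unsigned fold gives `0xA9C3` while the signed fold gives `0x56C3`, because the sign-extended bits of the first byte are XORed into the higher byte positions.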
## Behaviour notes
- **Dirty CPU scheduler** for inputs above 20 KB. In practice hash
inputs are small (partition keys are typically tens to hundreds of
bytes), but the threshold protects against scheduler hogs on large
inputs.
- **Inline path reduction accounting** via `enif_consume_timeslice`,
proportional to bytes processed. Cost model: ~500 bytes/reduction
(calibrated for ~5 GB/s hash throughput), 4000-reduction timeslice.
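
The numbers above can be turned into a small worked example. The sketch below is an assumption about how such accounting might be computed, not the library's actual code: it treats "20 KB" as 20 * 1024 bytes, and the function and constant names are hypothetical. The resulting percentage is the kind of value `enif_consume_timeslice` accepts (an integer from 1 to 100).

```c
#include <stddef.h>

enum {
    BYTES_PER_REDUCTION  = 500,       /* cost model from the text */
    REDUCTIONS_PER_SLICE = 4000,      /* timeslice size from the text */
    DIRTY_THRESHOLD      = 20 * 1024  /* assumption: 20 KB = 20480 bytes */
};

/* Returns 1 if an input of len bytes would go to a dirty CPU scheduler. */
static int use_dirty_scheduler(size_t len) {
    return len > DIRTY_THRESHOLD;
}

/* Percentage of a timeslice to report for len bytes hashed inline,
 * clamped to the 1..100 range. */
static int timeslice_percent(size_t len) {
    size_t reductions = len / BYTES_PER_REDUCTION;
    size_t percent = reductions * 100 / REDUCTIONS_PER_SLICE;
    if (percent < 1)
        percent = 1;
    if (percent > 100)
        percent = 100;
    return (int)percent;
}
```

Under this model an input right at the threshold costs 40 reductions, which is 1% of a timeslice, so every inline call reports the minimum percentage.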
## Build
`rebar3 compile` runs `c_src/build.sh`:
- Resolves `ERTS_INCLUDE_DIR` via
`erl -noshell -eval ... -s init stop` (option order is correct for
OTP 27+).
- Compiles `c_src/murmur_nif.c` + `c_src/murmur3/murmur3.c` with
`-O3 -march=native`.
- Outputs `priv/murmur_nif.so`.

Environment variables honored:

| Var | Effect |
|---|---|
| `ERTS_INCLUDE_DIR` | Skip the `erl` probe; use this path for `erl_nif.h`. |
| `CC` | Compiler (default `cc`). |
| `CFLAGS` | Extra flags appended after defaults. |
| `MURMUR_NIF_NO_NATIVE` | If set, omit `-march=native`/`-mtune=native` (use when the compiled `.so` must run on CPUs other than the build host's). |
## License
The wrapper code (Erlang in `src/`, C in `c_src/murmur_nif.c`) is **MIT**.
The MurmurHash3 algorithm in `c_src/murmur3/` was written by Austin
Appleby and placed in the public domain. The Cassandra-compatible
variant uses signed integer arithmetic to match Java's reference
implementation; the algorithmic modification is trivial enough to
remain in the public domain alongside the upstream code.