# murmur_nif

Erlang NIF wrapper around [MurmurHash3][murmur] (x64_128) with a
Cassandra-compatible signed-byte variant for token-aware routing
against Cassandra and Scylla.

![Build Status](https://github.com/lpgauth/murmur_nif/workflows/Erlang%20CI/badge.svg)

[murmur]: https://github.com/aappleby/smhasher

## Why

Replaces git-ref dependencies on hand-rolled Murmur3 NIF forks.
The build script works with current toolchains (correct `-eval`
option order for OTP 27+, `-undefined dynamic_lookup` on macOS), the
NIF is scheduler-friendly (dirty-scheduler dispatch for large inputs,
`enif_consume_timeslice` accounting on the inline path), and the
package is tested against OTP 25-28 in CI and published to hex.pm.

## Install

```erlang
{deps, [{murmur_nif, "0.1.0"}]}.
```

Requires a C compiler on the build host (`cc` by default, overridable
with `CC`). Most hosts that build Erlang software already have one;
minimal container images may not.

## API

```erlang
-spec murmur_nif:murmur3_x64_128(binary())           -> binary().
-spec murmur_nif:murmur3_cassandra_x64_128(binary()) -> binary().
```

Both functions return a fixed 16-byte binary representing the 128-bit
hash, using seed 0.

```erlang
1> murmur_nif:murmur3_x64_128(<<"hello">>).
<<2,155,189,65,179,167,216,203,25,29,174,72,106,144,30,91>>
```
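
To compare the result with tools that print the digest as hex, the binary can be split into its two 64-bit halves. The sketch below assumes the halves are stored little-endian, `h1` first, which is consistent with the `hello` output above. `hash_parts`, `halves/1` and `hex/1` are hypothetical names, not part of `murmur_nif`.

```erlang
-module(hash_parts).
-export([halves/1, hex/1]).

%% Split a 16-byte hash into {H1, H2}.
%% Assumption: each half is a little-endian unsigned 64-bit integer.
halves(<<H1:64/little-unsigned, H2:64/little-unsigned>>) ->
    {H1, H2}.

%% Render the hash as 32 lowercase hex digits, h1 followed by h2.
hex(Hash) ->
    {H1, H2} = halves(Hash),
    lists:flatten(io_lib:format("~16.16.0b~16.16.0b", [H1, H2])).
```

Applied to the example output above, `hash_parts:hex/1` yields `"cbd8a7b341bd9b025b1e906a48ae1d19"`.
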

### Which variant to use

- `murmur3_x64_128/1` -- Austin Appleby's standard MurmurHash3 x64_128.
  Use for general-purpose hashing.
- `murmur3_cassandra_x64_128/1` -- Cassandra/Scylla-compatible variant.
  The input bytes are interpreted as signed (matching Java's signed
  `byte` type), which changes the sign-extension of the tail-block
  accumulator and produces hashes that match Cassandra's partitioner.
  Use to compute partition tokens for token-aware routing.
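
A minimal sketch of turning the Cassandra-variant hash into a partition token. It assumes the first eight bytes of the result hold `h1` as a little-endian 64-bit integer (consistent with the example output in the API section) and follows Cassandra's `Murmur3Partitioner`, which uses `h1` as a signed 64-bit token and maps the minimum value to the maximum. `token_sketch` and its functions are hypothetical helpers, not part of `murmur_nif`.

```erlang
-module(token_sketch).
-export([token/1, token_from_hash/1]).

%% Hash a partition key with the Cassandra-compatible variant and
%% derive the token from the result.
token(Key) when is_binary(Key) ->
    token_from_hash(murmur_nif:murmur3_cassandra_x64_128(Key)).

%% Assumption: h1 occupies the first 8 bytes, little-endian.
%% The head also rejects anything that is not exactly 16 bytes.
token_from_hash(<<H1:64/signed-little, _H2:64>>) ->
    normalize(H1).

%% Murmur3Partitioner excludes the minimum 64-bit value as a token.
normalize(-16#8000000000000000) -> 16#7FFFFFFFFFFFFFFF;
normalize(Token) -> Token.
```
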

For pure-ASCII inputs (all bytes < 128) the two variants produce
identical output. They can diverge only when at least one input byte
has its high bit set (value >= 128).
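
The signed interpretation can be seen directly with Erlang's bit syntax: a byte below 128 reads the same either way, while a byte with the high bit set becomes negative, as it would in a Java `byte`. The sketch below lists bytes as Java would see them and compares the two variants on one input. `byte_view` and its functions are hypothetical helpers, not part of `murmur_nif`.

```erlang
-module(byte_view).
-export([signed_bytes/1, all_ascii/1, variants_agree/1]).

%% Bytes as signed 8-bit integers, the way a Java byte[] exposes them.
signed_bytes(Bin) ->
    [S || <<S:8/signed>> <= Bin].

%% True when every byte is below 128, so the signed and unsigned
%% interpretations coincide.
all_ascii(Bin) ->
    lists:all(fun(S) -> S >= 0 end, signed_bytes(Bin)).

%% Compare the two NIF variants on the same input.
variants_agree(Bin) ->
    murmur_nif:murmur3_x64_128(Bin) =:=
        murmur_nif:murmur3_cassandra_x64_128(Bin).
```

For example, `byte_view:signed_bytes(<<195,169>>)` (the UTF-8 encoding of "é") returns `[-61,-87]`, so keys that may contain non-ASCII bytes should go through `murmur3_cassandra_x64_128/1` whenever the result must match Cassandra.
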

## Behaviour notes

- **Dirty CPU scheduler** for inputs above 20 KB. In practice hash
  inputs are small (partition keys are typically tens to hundreds of
  bytes), but the threshold protects against scheduler hogs on large
  inputs.
- **Inline path reduction accounting** via `enif_consume_timeslice`,
  proportional to bytes processed. Cost model: ~500 bytes/reduction
  (calibrated for ~5 GB/s hash throughput), 4000-reduction timeslice.
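
The numbers above imply a simple mapping from input size to scheduler cost. The following is a sketch of that arithmetic only, not the NIF's actual C code: it assumes 20 KB means 20 * 1024 bytes, integer division, and clamping to the 1..100 percentage range that `enif_consume_timeslice` accepts. `cost_sketch` and its function names are hypothetical.

```erlang
-module(cost_sketch).
-export([path/1, reductions/1, timeslice_percent/1]).

-define(DIRTY_THRESHOLD, (20 * 1024)).   % assumed meaning of "20 KB"
-define(BYTES_PER_REDUCTION, 500).       % cost model from the text
-define(TIMESLICE_REDUCTIONS, 4000).     % timeslice size from the text

%% Which execution path an input of Size bytes would take.
path(Size) when Size > ?DIRTY_THRESHOLD -> dirty_cpu;
path(_Size) -> inline.

%% Reductions charged for hashing Size bytes on the inline path.
reductions(Size) ->
    Size div ?BYTES_PER_REDUCTION.

%% Share of a timeslice consumed, as a percentage clamped to 1..100
%% (the clamping is an assumption about how the value is reported).
timeslice_percent(Size) ->
    Pct = reductions(Size) * 100 div ?TIMESLICE_REDUCTIONS,
    max(1, min(100, Pct)).
```

Under these assumptions a 20,000-byte input costs 40 reductions, about 1% of a timeslice, so inline hashing below the threshold stays cheap for the scheduler.
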

## Build

`rebar3 compile` runs `c_src/build.sh`:

- Resolves `ERTS_INCLUDE_DIR` via
  `erl -noshell -eval ... -s init stop` (option order is correct for
  OTP 27+).
- Compiles `c_src/murmur_nif.c` + `c_src/murmur3/murmur3.c` with
  `-O3 -march=native`.
- Outputs `priv/murmur_nif.so`.

Env vars honored:

| Var | Effect |
|---|---|
| `ERTS_INCLUDE_DIR` | Skip the `erl` probe; use this path for `erl_nif.h`. |
| `CC` | Compiler (default `cc`). |
| `CFLAGS` | Extra flags appended after defaults. |
| `MURMUR_NIF_NO_NATIVE` | If set, omit `-march=native`/`-mtune=native` (use when the build output must run on CPUs other than the build host's). |

## License

The Erlang wrapper code (`src/`, `c_src/murmur_nif.c`) is **MIT**.

The MurmurHash3 algorithm in `c_src/murmur3/` was written by Austin
Appleby and placed in the public domain. The Cassandra-compatible
variant treats input bytes as signed to match Cassandra's Java
implementation; the algorithmic modification is trivial enough to
remain in the public domain alongside the upstream code.