# whisper_cpp usage rules
For agents and humans writing code against `whisper_cpp`. These rules are
shipped with the Hex package so downstream consumers can opt in to a
consistent set of conventions.
## Loading models
- Pass a path to a `.bin` or `.gguf` whisper.cpp checkpoint to
`WhisperCpp.load_model/2`. Download checkpoints from
<https://huggingface.co/ggerganov/whisper.cpp>.
- Cache the `%WhisperCpp.Model{}` for the process lifetime; loading is
expensive and the underlying NIF resource is safe to share across
BEAM processes - concurrent `transcribe/3` calls do not serialise.
- Prefer `device: :auto` (the default). Explicit device selection that
does not match the installed NIF artefact returns `:invalid_request`.
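
One possible way to follow the caching rule is to store the loaded model in `:persistent_term`. The sketch below is a hypothetical helper, not part of `whisper_cpp`; it takes the loader as a function so it makes no claim about the library's return values beyond the assumption that `WhisperCpp.load_model/2` returns `{:ok, model}` on success.

```elixir
defmodule ModelCache do
  @moduledoc """
  Hypothetical helper (not part of whisper_cpp): loads a model once and
  keeps it in :persistent_term so every process reuses the same handle.
  """

  @key {__MODULE__, :model}

  # `loader` is a zero-arity function returning {:ok, model} or an error tuple.
  def fetch(loader) when is_function(loader, 0) do
    case :persistent_term.get(@key, :not_loaded) do
      :not_loaded ->
        # A failed load is returned as-is and nothing is cached.
        with {:ok, model} <- loader.() do
          :persistent_term.put(@key, model)
          {:ok, model}
        end

      model ->
        {:ok, model}
    end
  end
end
```

A call site might look like `ModelCache.fetch(fn -> WhisperCpp.load_model("models/ggml-base.bin", device: :auto) end)`, where the path is a placeholder. Two processes racing on the very first call can both run the loader, so warming the cache once at application start avoids a duplicate load.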
## Audio input
- `transcribe/3` accepts exactly one shape: `{:pcm_f32, binary()}`,
where the binary is little-endian IEEE-754 `f32` samples, mono,
16 kHz, normalised to `[-1.0, 1.0]`.
- This library does **not** decode audio file formats. Decode WAV,
MP3, FLAC, M4A, Opus, etc. upstream and hand the PCM in. Standard
recipe with ffmpeg:
```bash
ffmpeg -i input.mp3 -f f32le -ac 1 -ar 16000 input.pcm
```
In Elixir: `pcm = File.read!("input.pcm")`, then
`WhisperCpp.transcribe(model, {:pcm_f32, pcm}, ...)`.
- Bare binaries (without the `{:pcm_f32, _}` wrapper) and file paths
are rejected with `:invalid_request`. Previously, a mistyped path could
be silently treated as PCM data and transcribed as garbage; the wrapper
surfaces the bug instead.
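
When the upstream decoder produces 16-bit PCM rather than `f32`, the conversion to the accepted shape is a few lines of binary pattern matching. The module below is a hypothetical helper, not part of `whisper_cpp`; it assumes the input is already mono at 16 kHz and only illustrates the sample format described above.

```elixir
defmodule PcmPrep do
  @moduledoc "Hypothetical helpers for preparing input; not part of whisper_cpp."

  @sample_rate 16_000
  @bytes_per_sample 4

  # Convert signed 16-bit little-endian PCM into normalised f32le samples.
  # Dividing by 32768 maps the s16 range onto [-1.0, 1.0).
  def s16le_to_f32le(pcm16) when is_binary(pcm16) and rem(byte_size(pcm16), 2) == 0 do
    for <<s::little-signed-integer-size(16) <- pcm16>>, into: <<>> do
      <<(s / 32_768)::little-float-size(32)>>
    end
  end

  # Wrap an f32le binary in the tagged tuple, rejecting buffers whose
  # length is not a whole number of 4-byte samples.
  def wrap(pcm) when is_binary(pcm) do
    if rem(byte_size(pcm), @bytes_per_sample) == 0 do
      {:ok, {:pcm_f32, pcm}}
    else
      {:error, :truncated_sample}
    end
  end

  # Duration in seconds of a mono 16 kHz f32 buffer.
  def duration_s(pcm) when is_binary(pcm) do
    byte_size(pcm) / (@bytes_per_sample * @sample_rate)
  end
end
```

The `:truncated_sample` atom is local to this sketch. Checking the duration before transcribing is a cheap way to catch a buffer decoded at the wrong sample rate or channel count.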
## Slicing PCM
- Use `WhisperCpp.transcribe_slice/4` to transcribe a `[start_s, end_s)`
window of an already-decoded master PCM buffer. It handles the byte
math, runs whisper.cpp on the slice, and shifts segment/word times
back into the absolute timeline.
- Slices shorter than 0.3 s return an empty transcription. whisper.cpp
pads short inputs and hallucinates into the padding, so filter VAD
output before slicing instead of passing it through unfiltered.
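
To make the byte math concrete, the sketch below shows how a `[start_s, end_s)` window maps onto a mono 16 kHz `f32` buffer, and how short VAD segments could be dropped up front. It is illustrative only: it is not the library's implementation, the module and function names are hypothetical, and VAD segments are assumed to be `{start_s, end_s}` tuples.

```elixir
defmodule SliceMath do
  @moduledoc "Illustrative byte math only; not the library's implementation."

  @sample_rate 16_000
  @bytes_per_sample 4
  @min_slice_s 0.3

  # Extract the [start_s, end_s) window from a mono 16 kHz f32le buffer.
  # Offsets are clamped to the end of the buffer.
  def slice(pcm, start_s, end_s)
      when is_binary(pcm) and start_s >= 0 and end_s >= start_s do
    total = div(byte_size(pcm), @bytes_per_sample)
    first = min(round(start_s * @sample_rate), total)
    last = min(round(end_s * @sample_rate), total)
    binary_part(pcm, first * @bytes_per_sample, (last - first) * @bytes_per_sample)
  end

  # Drop VAD segments shorter than the 0.3 s minimum.
  def drop_short(segments) do
    Enum.filter(segments, fn {start_s, end_s} -> end_s - start_s >= @min_slice_s end)
  end
end
```

In practice `WhisperCpp.transcribe_slice/4` does the slicing for you; a filter like `drop_short/1` is the part that stays on the caller's side.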
## Cancellation and progress
- For cancellable transcribes, mint a `%WhisperCpp.AbortHandle{}` via
`WhisperCpp.AbortHandle.new/0` and pass it via `:abort_handle`.
Signal cancellation from another process with
`WhisperCpp.AbortHandle.abort/1`. The in-flight transcribe call then
returns `{:ok, partial_transcription}` with whatever segments completed
before whisper.cpp's next abort poll.
- For progress, pass `:progress_pid` (commonly `self()` inside a
`Task`). The pid receives `{:whisper_progress, percent}` messages
(0..100) as work advances; duplicate percentages are coalesced.
- Both hooks are zero-cost when omitted.
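
On the receiving side, the progress messages can be drained with a plain `receive` loop. The module below is a hypothetical sketch that relies only on the `{:whisper_progress, percent}` message shape described above; whether a final `100` is always delivered is not stated here, so the loop also stops after an idle timeout.

```elixir
defmodule ProgressWatch do
  @moduledoc "Hypothetical receiver for {:whisper_progress, percent} messages."

  # Collect progress messages until 100 arrives or no message shows up
  # within `idle_ms`. Returns the percentages in arrival order.
  def collect(idle_ms \\ 5_000, acc \\ []) do
    receive do
      {:whisper_progress, 100} ->
        Enum.reverse([100 | acc])

      {:whisper_progress, pct} when pct in 0..99 ->
        collect(idle_ms, [pct | acc])
    after
      idle_ms -> Enum.reverse(acc)
    end
  end
end
```

One arrangement is to run the transcribe call in a `Task` with `:progress_pid` set to the caller's pid while the caller runs `ProgressWatch.collect/2`. A call cancelled through an `:abort_handle` may stop short of 100, which is the case the idle timeout covers.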
## Options and errors
- Pass options as keyword lists. Unknown keys and out-of-range values
fail with `{:error, %WhisperCpp.Error{reason: :invalid_request}}`
before reaching the NIF - rely on this for input validation.
- Match `%WhisperCpp.Error{}` (or its `:reason` field) rather than
inspecting message strings.
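
A minimal sketch of reason-based matching follows. The module is hypothetical, and `:invalid_request` is the only reason atom taken from these rules. The clauses match on the `__struct__` key so the sketch compiles without the dependency; with `whisper_cpp` installed, the equivalent pattern is `%WhisperCpp.Error{reason: reason}`.

```elixir
defmodule ResultTriage do
  @moduledoc "Hypothetical result handling; not part of whisper_cpp."

  # Successful results pass through unchanged.
  def triage({:ok, transcription}), do: {:ok, transcription}

  # :invalid_request means the caller passed bad options or input.
  def triage({:error, %{__struct__: WhisperCpp.Error, reason: :invalid_request}}),
    do: {:caller_bug, :invalid_request}

  # Any other reason is reported without inspecting message strings.
  def triage({:error, %{__struct__: WhisperCpp.Error, reason: reason}}),
    do: {:failed, reason}
end
```

Treating `:invalid_request` as a caller bug rather than a retryable failure keeps option typos from being masked by retry logic.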
## Performance
- `:n_threads` defaults to 4. On dedicated nodes, set it to the number
of physical cores.
- Word timestamps add one DTW pass; enable `:word_timestamps` only when
you need them.
- For latency-sensitive workloads, prefer `:single_segment` on short
clips to skip the segment-split pass.
- Beam search (`:beam_size` greater than 1) is roughly 2-3x slower than
greedy decoding. It is worth the cost when you need the lowest word
error rate (WER) on long-form audio; for short slices, greedy is
usually fine.
- A single loaded model handle is safe to share: parallel transcribe
calls do not serialise on the context lock, so saturating a GPU or
multi-core CPU from many BEAM processes is the expected pattern.
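
The trade-offs above can be folded into a small option preset. Everything in this sketch beyond the option keys is an assumption: the module name, the 30-second threshold, the beam size of 5, the use of `beam_size: 1` for greedy decoding, and boolean values for `:word_timestamps` and `:single_segment`.

```elixir
defmodule TranscribeOpts do
  @moduledoc "Hypothetical option presets; thresholds and values are illustrative."

  @short_clip_s 30.0

  # `physical_cores` is supplied by the caller. System.schedulers_online/0
  # reports logical processors, which can exceed the physical core count
  # on machines with simultaneous multithreading.
  def for_clip(duration_s, physical_cores, word_timestamps? \\ false)
      when is_number(duration_s) and is_integer(physical_cores) and physical_cores > 0 do
    base = [n_threads: physical_cores, word_timestamps: word_timestamps?]

    if duration_s <= @short_clip_s do
      # Short clip: greedy decoding and a single segment for low latency.
      base ++ [beam_size: 1, single_segment: true]
    else
      # Long-form audio: beam search trades speed for accuracy.
      base ++ [beam_size: 5]
    end
  end
end
```

Keeping the presets in one function makes it easier to adjust them after measuring on your own audio, since the right threshold depends on the model size and hardware.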