
# whisper_cpp

A thin Elixir wrapper around [`whisper-rs`](https://codeberg.org/tazz4843/whisper-rs),
the Rust bindings to [whisper.cpp](https://github.com/ggerganov/whisper.cpp).
It exposes whisper.cpp speech-to-text to the BEAM through a Rustler NIF: load a
model, hand it 16 kHz mono f32 PCM, get structured segments back. No subprocess,
no Python, no temporary files.

## Installation

```elixir
def deps do
  [{:whisper_cpp, "~> 0.1.0"}]
end
```

Installing the package downloads a precompiled NIF for your target from the
project's GitHub releases, so no Rust toolchain is needed. Requires Elixir 1.19+.

## Usage

```elixir
{:ok, model} = WhisperCpp.load_model("models/ggml-large-v3.bin")

# Decode upstream (ffmpeg, bumblebee, ...) into 16 kHz mono f32 PCM:
#   ffmpeg -i jfk.wav -f f32le -ac 1 -ar 16000 jfk.pcm
pcm = File.read!("jfk.pcm")

{:ok, %WhisperCpp.Transcription{text: text, segments: segs}} =
  WhisperCpp.transcribe(model, {:pcm_f32, pcm}, language: "en")

IO.puts(text)
for s <- segs, do: IO.puts("[#{s.start}-#{s.end}] #{s.text}")
```
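If you would rather not write `jfk.pcm` to disk, one option is to have ffmpeg write the raw samples to stdout and read them straight into a binary. The sketch below assumes `ffmpeg` is on your `PATH`. `FfmpegPcm` is a hypothetical helper module, not part of `whisper_cpp`; only `System.cmd/2` and the ffmpeg flags are real.

```elixir
defmodule FfmpegPcm do
  @moduledoc "Hypothetical helper for illustration; not part of whisper_cpp."

  # Same conversion as the ffmpeg command in the usage example, except the
  # output goes to stdout (pipe:1) instead of a file. `-nostdin` keeps ffmpeg
  # from reading the caller's stdin, and `-v error` silences the banner.
  def args(path) do
    ["-nostdin", "-v", "error", "-i", path,
     "-f", "f32le", "-ac", "1", "-ar", "16000", "pipe:1"]
  end

  # Returns {:ok, pcm} on success or {:error, {:ffmpeg_exit, status}} otherwise.
  # System.cmd/2 collects stdout only, so the binary is pure PCM.
  # It raises if the ffmpeg executable cannot be found.
  def decode(path) do
    case System.cmd("ffmpeg", args(path)) do
      {pcm, 0} -> {:ok, pcm}
      {_output, status} -> {:error, {:ffmpeg_exit, status}}
    end
  end
end

# Usage (runs ffmpeg, so it is left commented out here):
#   {:ok, pcm} = FfmpegPcm.decode("jfk.wav")
#   WhisperCpp.transcribe(model, {:pcm_f32, pcm}, language: "en")
```

The whole decoded file is held in memory, which at 16 kHz mono f32 is 64 000 bytes per second of audio.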

Audio is always `{:pcm_f32, binary}`: little-endian f32 samples, mono, 16 kHz,
normalised to `[-1.0, 1.0]`. The library does **not** decode WAV, MP3, or any
other container format; decode upstream. `transcribe_slice/4` runs a
`[start_s, end_s)` window of a master PCM buffer and shifts the returned times
back into the source timeline.
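To make the buffer layout concrete, here is a minimal sketch of the arithmetic behind that format: packing floats, converting 16-bit PCM, and cutting a half-open time window out of a buffer. `PcmSketch` and its function names are hypothetical and written for illustration only. This is not how `transcribe_slice/4` is implemented, just the byte math implied by 4-byte samples at 16 kHz.

```elixir
defmodule PcmSketch do
  @moduledoc "Illustrative helpers; not part of the whisper_cpp API."

  @sample_rate 16_000
  @bytes_per_sample 4

  # Pack a list of floats in [-1.0, 1.0] as little-endian f32 samples.
  def from_floats(samples) do
    for s <- samples, into: <<>>, do: <<s::float-32-little>>
  end

  # Convert signed 16-bit little-endian PCM to normalised f32.
  # Dividing by 32_768 maps the s16 range onto [-1.0, 1.0).
  def from_s16le(bin) do
    for <<s::signed-16-little <- bin>>, into: <<>> do
      <<(s / 32_768)::float-32-little>>
    end
  end

  # Length of a buffer in seconds.
  def duration_s(pcm), do: byte_size(pcm) / (@sample_rate * @bytes_per_sample)

  # Cut the half-open window [start_s, end_s) out of the buffer,
  # clamping both ends to the samples that actually exist.
  def slice(pcm, start_s, end_s) when start_s >= 0 and end_s >= start_s do
    total = div(byte_size(pcm), @bytes_per_sample)
    first = min(round(start_s * @sample_rate), total)
    last = max(min(round(end_s * @sample_rate), total), first)
    binary_part(pcm, first * @bytes_per_sample, (last - first) * @bytes_per_sample)
  end

  # A time measured inside a slice maps back to the source by adding the offset.
  def to_source_time(t, start_s), do: t + start_s
end
```

For example, a two-second buffer is 128 000 bytes, and the window `[0.5, 1.5)` covers samples 8 000 up to (but not including) 24 000, which is 64 000 bytes.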

See [the docs](https://hexdocs.pm/whisper_cpp) for the full option list
(`:translate`, `:initial_prompt`, `:word_timestamps`, `:beam_size`,
`:n_threads`, cancellation, progress messages, ...) and error handling.

## Backends

CPU is always available. Pick one accelerator per build. The precompiled Hex
package ships a CPU build, `cuda` and `hipblas` variants for Linux, and a Metal
variant for Apple Silicon; select one via `WHISPER_CPP_VARIANT`:

```bash
WHISPER_CPP_VARIANT=cuda mix deps.compile whisper_cpp
```

To build from source with any whisper-rs backend (`cuda`, `hipblas`, `vulkan`,
`metal`, `coreml`, `intel-sycl`, `openblas`, `openmp`):

```bash
WHISPER_CPP_BUILD=1 WHISPER_CPP_FEATURES=cuda mix compile
```

Source builds need a Rust toolchain, `cmake`, a C++17 compiler, and the
backend's own SDK (CUDA toolkit, ROCm, Vulkan SDK, ...).

## Testing

```bash
mix test                        # unit tests, no downloads
mix test --include integration  # downloads ggml-tiny.en + JFK sample, real inference
```

## License

MIT. whisper.cpp is MIT-licensed; `whisper-rs` is public domain (Unlicense)
and vendors whisper.cpp, linking it statically.