# OmnivoiceEx

[![Hex.pm](https://img.shields.io/hexpm/v/omnivoice_ex.svg)](https://hex.pm/packages/omnivoice_ex)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

Elixir wrapper for [OmniVoice](https://huggingface.co/k2-fsa/OmniVoice), a unified speech generation model from K2-FSA.

**Voice Cloning** · **Voice Design** · **Multilingual TTS** · **24kHz Output**

## Features

- 🎤 **Voice Cloning**: Clone any voice from a short reference audio clip
- 🎨 **Voice Design**: Describe a voice in natural language ("warm female broadcaster", "deep authoritative narrator")
- 🌍 **Multilingual**: Supports multiple languages with automatic detection
- ⚡ **GPU Optimized**: CUDA, Apple Silicon (MPS), or CPU fallback
- 🔊 **24kHz WAV**: Professional-grade audio output
- 📦 **MessagePack Protocol**: Zero-base64 binary transport over Erlang Ports

## Requirements

- Elixir ≥ 1.14
- Python ≥ 3.10
- CUDA GPU (recommended), Apple Silicon MPS, or CPU
- `omnivoice` pip package (auto-installed via `mix omnivoice_ex.setup`)

## Installation

Add to your `mix.exs`:

```elixir
def deps do
  [
    {:omnivoice_ex, "~> 0.1.0"}
  ]
end
```

Then install Python dependencies:

```bash
mix omnivoice_ex.setup
```

## Quick Start

```elixir
# Start the model server
{:ok, pid} = OmnivoiceEx.start_link(device: "cuda")

# Wait for model to load
:ok = OmnivoiceEx.await_ready(pid)

# Generate speech
{:ok, audio} = OmnivoiceEx.generate(pid, "Hello, world!")

# Save to file
:ok = OmnivoiceEx.save(audio, "output.wav")

# Clean shutdown
OmnivoiceEx.stop(pid)
```
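
The Quick Start steps can also be wrapped so that the server is always stopped, even when a step fails. The sketch below is illustrative only: `SpeechDemo` is a hypothetical module, and it assumes that a failing call returns something other than the success shapes shown above (for example `{:error, reason}`), which this README does not confirm. The TTS module is passed in as an argument so the same function works with `OmnivoiceEx` or with a stub in tests.

```elixir
defmodule SpeechDemo do
  # Hypothetical helper, not part of OmnivoiceEx.
  # `tts` is a module exposing the functions used in Quick Start:
  # start_link/1, await_ready/1, generate/2, save/2 and stop/1.
  def run(tts, text, path, start_opts \\ [device: "cuda"]) do
    {:ok, pid} = tts.start_link(start_opts)

    try do
      # Assumption: any non-matching result (e.g. {:error, reason})
      # falls through `with` and is returned to the caller unchanged.
      with :ok <- tts.await_ready(pid),
           {:ok, audio} <- tts.generate(pid, text),
           :ok <- tts.save(audio, path) do
        {:ok, path}
      end
    after
      # Runs on success, on an early return from `with`, and on a raise.
      tts.stop(pid)
    end
  end
end
```

With the package installed, this would be called as `SpeechDemo.run(OmnivoiceEx, "Hello, world!", "output.wav")`.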

## Voice Design

Describe a voice in natural language and OmniVoice generates it:

```elixir
{:ok, audio} = OmnivoiceEx.generate(pid,
  "Welcome to our luxury resort.",
  instruct: "A warm, professional female concierge with a British accent"
)
```

## Voice Cloning

Clone a voice from a reference audio file:

```elixir
{:ok, audio} = OmnivoiceEx.generate(pid,
  "This is a cloned voice speaking English.",
  ref_audio: "/path/to/reference.wav",
  ref_text: "Transcript of the reference audio"  # optional, improves quality
)
```
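
Before passing a reference clip, it can help to confirm that the file exists and looks like a WAV file. The following is a minimal sketch using only the Elixir standard library. `RefAudio` is a hypothetical helper, not part of OmnivoiceEx, and whether the model accepts formats other than WAV is not stated here; the check only inspects the standard 12-byte RIFF/WAVE header.

```elixir
defmodule RefAudio do
  # Hypothetical helper, not part of OmnivoiceEx.
  # Returns :ok when `path` begins with a RIFF/WAVE header,
  # {:error, :not_wav} when it does not, or {:error, posix} when
  # the file cannot be opened (e.g. :enoent).
  def check(path) do
    case File.open(path, [:read, :binary], &IO.binread(&1, 12)) do
      {:ok, <<"RIFF", _size::little-32, "WAVE">>} -> :ok
      {:ok, _other} -> {:error, :not_wav}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

A caller could run `RefAudio.check("/path/to/reference.wav")` and only invoke `OmnivoiceEx.generate/3` with `ref_audio:` when it returns `:ok`.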

## Generation Options

| Option | Type | Default | Description |
| ------ | ---- | ------- | ----------- |
| `ref_audio` | `String.t()` | - | Path to reference audio for cloning |
| `ref_text` | `String.t()` | - | Transcript of reference audio |
| `instruct` | `String.t()` | - | Voice instruction for design |
| `language` | `String.t()` | - | Language code (auto-detected if omitted) |
| `duration` | `float()` | - | Target duration in seconds |
| `speed` | `float()` | - | Playback speed factor |
| `num_step` | `pos_integer()` | `32` | Diffusion steps (more = higher quality) |
| `guidance_scale` | `float()` | `2.0` | CFG guidance scale |
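
The table above can be mirrored in a small caller-side check that rejects misspelled option names and fills in the two documented defaults before the options reach `generate`. This is a sketch under assumptions: `GenOpts` is a hypothetical module, and whether OmnivoiceEx performs similar validation internally is not documented here. It relies on `Keyword.validate!/2` from the Elixir standard library.

```elixir
defmodule GenOpts do
  # Hypothetical helper, not part of OmnivoiceEx.
  # Keys and defaults are copied from the Generation Options table.
  @spec build(keyword()) :: keyword()
  def build(opts) do
    opts =
      Keyword.validate!(opts, [
        :ref_audio,
        :ref_text,
        :instruct,
        :language,
        :duration,
        :speed,
        num_step: 32,
        guidance_scale: 2.0
      ])

    num_step = Keyword.fetch!(opts, :num_step)

    unless is_integer(num_step) and num_step > 0 do
      raise ArgumentError,
            "num_step must be a positive integer, got: #{inspect(num_step)}"
    end

    opts
  end
end
```

For example, `GenOpts.build(instruct: "calm narrator", num_step: 16)` returns a keyword list that also contains `guidance_scale: 2.0`, while `GenOpts.build(pitch: 1.0)` raises `ArgumentError` because `:pitch` is not a listed option.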

## Architecture

```
Elixir (GenServer) ←→ Erlang Port ←→ Python Bridge ←→ OmniVoice Model
                     (stdin/stdout)  (msgpack framed)
```

Uses **MessagePack** binary framing over Erlang Ports: audio is transmitted as raw WAV bytes inside msgpack, which avoids the roughly 33% size overhead that base64 encoding adds in JSON-based transports.
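
To make the overhead figure concrete, the sketch below measures base64 expansion on a buffer the size of one second of audio, assuming 24 kHz, 16-bit mono PCM (48,000 bytes; the bit depth and channel count are assumptions). It also shows one common way to frame binary messages on a byte stream, a 4-byte big-endian length prefix, which is what the Erlang Port option `{:packet, 4}` implements. The framing OmnivoiceEx actually uses is not specified in this README, and `TransportSketch` is a hypothetical module.

```elixir
defmodule TransportSketch do
  # Hypothetical module for illustration; not part of OmnivoiceEx.

  # Ratio of base64-encoded size to raw size (about 4/3).
  def base64_ratio(bin) when byte_size(bin) > 0 do
    byte_size(Base.encode64(bin)) / byte_size(bin)
  end

  # Prefix a payload with its length as a 4-byte big-endian integer.
  def frame(payload) when is_binary(payload) do
    <<byte_size(payload)::big-32, payload::binary>>
  end

  # Extract one complete frame from the front of a buffer, or
  # return :more when the buffer does not yet hold a whole frame.
  def unframe(<<len::big-32, payload::binary-size(len), rest::binary>>) do
    {:ok, payload, rest}
  end

  def unframe(_incomplete), do: :more
end

# One second of silence at the assumed format: 48_000 raw bytes.
raw = :binary.copy(<<0>>, 48_000)
IO.puts("raw: #{byte_size(raw)} bytes, base64: #{byte_size(Base.encode64(raw))} bytes")
# prints "raw: 48000 bytes, base64: 64000 bytes"
```

Sending the same buffer as a length-prefixed binary adds only the 4 header bytes, whereas base64 adds 16,000 bytes per second of audio under these assumptions.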

## License

Apache 2.0. See [LICENSE](LICENSE).

## Related

- [OmniVoice on HuggingFace](https://huggingface.co/k2-fsa/OmniVoice)
- [VoxCPMEx](https://hex.pm/packages/voxcpmex): Elixir wrapper for VoxCPM2