<p align="center">
  <img src="chatterbex_logo_text.png" alt="Chatterbex Logo" width="400">
</p>

<p align="center">
  <a href="https://github.com/holsee/chatterbex/actions/workflows/ci.yml"><img src="https://github.com/holsee/chatterbex/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://hex.pm/packages/chatterbex"><img src="https://img.shields.io/hexpm/v/chatterbex.svg" alt="Hex.pm"></a>
  <a href="https://hexdocs.pm/chatterbex"><img src="https://img.shields.io/badge/hex-docs-blue.svg" alt="Documentation"></a>
  <a href="https://github.com/holsee/chatterbex/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"></a>
</p>

# Chatterbex

Elixir wrapper for [Chatterbox TTS](https://github.com/resemble-ai/chatterbox) - state-of-the-art text-to-speech models from Resemble AI.

## Features

- Zero-shot voice cloning
- Three model variants:
  - **Turbo** (350M params) - Low-latency English TTS with paralinguistic tags
  - **English** (500M params) - High-quality English with CFG controls
  - **Multilingual** (500M params) - Support for 23+ languages
- Simple Elixir API via GenServer

## Requirements

- Elixir 1.14+
- Python 3.10 or 3.11 (3.12+ not supported)
- CUDA GPU, Apple Silicon (MPS), or CPU
- `chatterbox-tts` Python package

## Installation

1. Add `chatterbex` to your dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:chatterbex, "~> 0.1.0"}
  ]
end
```

2. Install the Python dependencies:

```bash
mix chatterbex.setup
```

Or manually:

```bash
pip install chatterbox-tts
```

### Setup Options

```bash
# CUDA (NVIDIA GPU)
mix chatterbex.setup --cuda

# Apple Silicon (M1/M2/M3/M4)
mix chatterbex.setup --mps

# CPU-only (no GPU required, smaller download)
mix chatterbex.setup --cpu

# Use a virtual environment
mix chatterbex.setup --mps --venv .venv
```

## Usage

### Basic Text-to-Speech

```elixir
# Start a Turbo model server
{:ok, pid} = Chatterbex.start_link(model: :turbo)

# Generate speech
{:ok, audio} = Chatterbex.generate(pid, "Hello, world!")

# Save to file
:ok = Chatterbex.save(audio, "output.wav")
```

### Voice Cloning

Provide a reference audio file (10 seconds recommended) for voice cloning:

```elixir
{:ok, audio} = Chatterbex.generate(pid, "Hello in your voice!",
  audio_prompt: "path/to/reference.wav"
)
```

### Paralinguistic Tags (Turbo Model)

The Turbo model supports paralinguistic tags embedded directly in the text:

```elixir
{:ok, audio} = Chatterbex.generate(pid, "That's hilarious [laugh] I can't believe it!")
```

Available tags: `[laugh]`, `[chuckle]`, `[cough]`, `[sigh]`, `[gasp]`, `[groan]`, `[yawn]`, `[sniff]`, `[clearing_throat]`

### Multilingual Support

```elixir
{:ok, pid} = Chatterbex.start_link(model: :multilingual)

# French
{:ok, audio} = Chatterbex.generate(pid, "Bonjour, comment allez-vous?", language: "fr")

# German
{:ok, audio} = Chatterbex.generate(pid, "Guten Tag, wie geht es Ihnen?", language: "de")

# Chinese
{:ok, audio} = Chatterbex.generate(pid, "你好,今天天气真不错", language: "zh")
```

### Model Options

```elixir
# Use Apple Silicon GPU (MPS)
{:ok, pid} = Chatterbex.start_link(model: :turbo, device: "mps")

# Use CPU instead of GPU
{:ok, pid} = Chatterbex.start_link(model: :english, device: "cpu")

# English model with exaggeration and CFG controls
{:ok, audio} = Chatterbex.generate(pid, "This is exciting!",
  exaggeration: 0.7,
  cfg_weight: 0.5
)
```

### Named Servers

```elixir
# Start with a name for easy access
{:ok, _pid} = Chatterbex.start_link(model: :turbo, name: MyApp.TTS)

# Use anywhere in your app
{:ok, audio} = Chatterbex.generate(MyApp.TTS, "Hello!")
```

## Configuration

| Option | Description | Default |
|--------|-------------|---------|
| `:model` | Model variant (`:turbo`, `:english`, `:multilingual`) | `:turbo` |
| `:device` | Compute device (`"cuda"`, `"mps"`, `"cpu"`) | `"cuda"` |
| `:name` | GenServer name | `nil` |
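
These options are passed to `Chatterbex.start_link/1` and can be combined. A minimal example (the chosen values are illustrative):

```elixir
# Named multilingual server running on the CPU
{:ok, _pid} =
  Chatterbex.start_link(
    model: :multilingual,
    device: "cpu",
    name: MyApp.TTS
  )
```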

## Generation Options

| Option | Model | Description |
|--------|-------|-------------|
| `:audio_prompt` | All | Path to reference audio for voice cloning |
| `:language` | Multilingual | Language code (e.g., "fr", "de", "zh") |
| `:exaggeration` | English | Expression intensity (0.0 - 1.0) |
| `:cfg_weight` | English | Classifier-free guidance weight |
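
Generation options can also be combined in a single call, for example voice cloning together with the English model's expressiveness controls (the values shown are illustrative):

```elixir
{:ok, audio} =
  Chatterbex.generate(pid, "What a surprise!",
    audio_prompt: "path/to/reference.wav",
    exaggeration: 0.6,
    cfg_weight: 0.4
  )
```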

## Supported Languages

The multilingual model supports: English, French, German, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, Czech, Slovak, Hungarian, Romanian, Bulgarian, Croatian, Slovenian, Serbian, Macedonian, Albanian, Turkish, Arabic, Hebrew, Chinese, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay.

## Architecture

Chatterbex uses Erlang ports to communicate with a Python process running the Chatterbox models. Each `Chatterbex.start_link/1` call spawns a dedicated Python process with the loaded model, allowing multiple models or instances to run concurrently.

```text
+---------------+      JSON/stdin       +-----------------+
|    Elixir     | --------------------> |     Python      |
|   GenServer   |                       | Chatterbox TTS  |
|               | <-------------------- |                 |
+---------------+    Base64 WAV/stdout  +-----------------+
```
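
For illustration only, the sketch below shows the general port-based pattern described above. It is not Chatterbex's internal implementation: the Python script path, the JSON field names, and the use of the Jason library are assumptions made for the example.

```elixir
# A minimal sketch of the port pattern (not the library's actual internals).
# Assumes Jason for JSON and a hypothetical priv/python/server.py script.
python = System.find_executable("python3")

port =
  Port.open({:spawn_executable, python}, [
    :binary,
    {:args, ["-u", "priv/python/server.py"]},
    {:line, 1_000_000}
  ])

# One JSON request per line on stdin (field names are illustrative)
request = Jason.encode!(%{"command" => "generate", "text" => "Hello, world!"})
true = Port.command(port, request <> "\n")

# One JSON response per line on stdout, carrying Base64-encoded WAV bytes
receive do
  {^port, {:data, {:eol, line}}} ->
    %{"audio" => b64} = Jason.decode!(line)
    File.write!("output.wav", Base.decode64!(b64))
end
```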

## Examples

See the [examples](examples/README.md) directory for runnable scripts:

- **[hello_world.exs](examples/hello_world.exs)** - Basic text-to-speech
- **[voice_cloning.exs](examples/voice_cloning.exs)** - Clone a voice from reference audio
- **[multilingual.exs](examples/multilingual.exs)** - Generate speech in 23+ languages

```bash
mix run examples/hello_world.exs --text "Hello" --device mps
mix run examples/voice_cloning.exs --reference voice.wav
mix run examples/multilingual.exs --text "Bonjour" --language fr
```

## Documentation

- [Examples README](examples/README.md) - Detailed usage for all examples
- [Architecture Decision Records](docs/adr/README.md) - Design decisions and rationale

## License

MIT License - See [LICENSE](LICENSE) for details.

## Acknowledgments

- [Resemble AI](https://www.resemble.ai/) for the Chatterbox TTS models
- The Elixir community