# babble

[![Package Version](https://img.shields.io/hexpm/v/babble)](https://hex.pm/packages/babble)
[![Hex Docs](https://img.shields.io/badge/hex-docs-ffaff3)](https://hexdocs.pm/babble/)

A Markov chain text generator for Gleam.

```sh
gleam add babble
```

## Usage

```gleam
import gleam/io
import babble

pub fn main() {
  let model =
    babble.new(order: 2, tokenization: babble.Words)
    |> babble.train("the cat sat on the mat.")
    |> babble.train("the dog sat on the log.")

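  let assert Ok(sentence) =
    babble.generate(model, babble.weighted, max_tokens: 200)
  // Prints one generated sentence, e.g. "the dog sat on the mat."
  // The output varies between runs because babble.weighted samples at random.
  io.println(sentence)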
}
```

`train` is incremental, so you can keep one model and feed it text as it arrives.
`generate` returns `Error(EmptyModel)` until the model has learned something.
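
If you would rather not crash on an empty model, match on the result instead of using `let assert`. The following is a minimal sketch: `sentence_or` is a hypothetical helper, not part of babble, and it matches `Error(_)` so that it does not assume how the error constructor is exported.

```gleam
import babble
import gleam/io

// Hypothetical helper, not part of babble: returns a generated sentence,
// or `fallback` when generation fails.
pub fn sentence_or(model, fallback: String) -> String {
  case babble.generate(model, babble.weighted, max_tokens: 200) {
    Ok(sentence) -> sentence
    // An untrained model is documented to return Error(EmptyModel).
    Error(_) -> fallback
  }
}

pub fn main() {
  let model = babble.new(order: 2, tokenization: babble.Words)
  // Nothing has been trained yet, so this prints the fallback.
  io.println(sentence_or(model, "(model is empty)"))

  // train is incremental: the same model can be fed text later.
  let model = babble.train(model, "the cat sat on the mat.")
  io.println(sentence_or(model, "(model is empty)"))
}
```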

## Configuration

`new` takes two settings, fixed at construction:

- `order`: how many previous tokens to condition on. Higher values produce more
  coherent text but reproduce the source more closely; lower values are more
  random. 2 is a reasonable default.
- `tokenization`: `Words` or `Characters`. With `Characters`, `order` counts characters.

The length cap is a `generate` argument (`max_tokens:`), not a model setting.
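
As an illustration of the settings above, here is a sketch of a character-level model. It uses only the calls shown in this README; the order of 3 and the cap of 50 are arbitrary example values, and it assumes `max_tokens` counts characters when the tokenization is `Characters`.

```gleam
import babble
import gleam/io

pub fn main() {
  // With Characters, order: 3 conditions each character on the previous three.
  let model =
    babble.new(order: 3, tokenization: babble.Characters)
    |> babble.train("the cat sat on the mat.")

  // The length cap is passed to generate, not stored on the model.
  case babble.generate(model, babble.most_likely, max_tokens: 50) {
    Ok(text) -> io.println(text)
    Error(_) -> io.println("nothing to generate yet")
  }
}
```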

## Sampling

`generate` takes a sampler: the function that chooses the next token from the
weighted candidates at each step. Two are built in:

- `babble.weighted`: picks at random, weighted by training frequency. Varies each call.
- `babble.most_likely`: always picks the most frequent successor. Deterministic.

A sampler is `fn(List(#(Step, Int))) -> Step`, where `Step` is `Continue(token)` or
`Stop` and the `Int` is that candidate's training count. Write your own for
temperature, top-k, blocklists, and so on:

```gleam
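import babble
import gleam/int
import gleam/list

// Ignores the training counts and picks every candidate with equal probability.
fn uniform(candidates: List(#(babble.Step, Int))) -> babble.Step {
  // Drop a random number of candidates (0 to length - 1) and take the next one.
  case list.drop(candidates, int.random(list.length(candidates))) {
    [#(step, _), ..] -> step
    // Only reached when there are no candidates at all.
    [] -> babble.Stop
  }
}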
```
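
Because samplers are plain functions, they also compose. As a rough sketch of the blocklist idea, the hypothetical `without` helper below (not part of babble) wraps any sampler and removes blocked tokens before delegating to it. It assumes `Continue` carries the token as a `String`.

```gleam
import babble
import gleam/list

// Hypothetical helper, not part of babble: wraps a sampler so that tokens in
// `blocked` are never chosen.
pub fn without(
  blocked: List(String),
  inner: fn(List(#(babble.Step, Int))) -> babble.Step,
) -> fn(List(#(babble.Step, Int))) -> babble.Step {
  fn(candidates: List(#(babble.Step, Int))) -> babble.Step {
    let allowed =
      list.filter(candidates, fn(candidate) {
        let #(step, _) = candidate
        case step {
          // Assumes Continue holds the token as a String.
          babble.Continue(token) -> !list.contains(blocked, token)
          babble.Stop -> True
        }
      })
    case allowed {
      // Every candidate was blocked, so end the sentence here.
      [] -> babble.Stop
      _ -> inner(allowed)
    }
  }
}
```

Pass the wrapped sampler wherever a sampler is expected, for example `babble.generate(model, without(["pizza"], babble.weighted), max_tokens: 200)`.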

Samplers are stateless: nothing is carried from one step to the next, so there is
no seed to thread through. For reproducible output, use `most_likely` rather than
seeding randomness yourself.

## Generation

```gleam
babble.generate(model, babble.weighted, max_tokens: 200) // one sentence
babble.generate_paragraph(model, 3, babble.weighted, max_tokens: 200) // three sentences
babble.generate_starting_with(model, "pizza", babble.weighted, max_tokens: 200) // from a prefix
```

A sentence ends at `.`, `!`, or `?` (sentence endings are learned during training)
or when it reaches `max_tokens`, whichever comes first.

## Development

```sh
gleam test
gleam format
```