# Usage Guide
The library exposes a single public module, `FastestTiktoken`.
Every tokenizer operation takes an explicit selector. Use `:model` when you want
OpenAI-compatible model resolution, or `:encoding` when you already know the
encoding family.
## Select by Model
```elixir
FastestTiktoken.count_tokens("hello world", model: "gpt-4o")
#=> {:ok, 2}
FastestTiktoken.encode("hello world", model: "gpt-4o")
#=> {:ok, [24912, 2375]}
```
Model names follow the official OpenAI `tiktoken` mapping for supported
encodings, including versioned model prefixes:
```elixir
FastestTiktoken.encoding_for_model("gpt-4o-2024-05-13")
#=> {:ok, "o200k_base"}
FastestTiktoken.encoding_for_model("gpt-3.5-turbo-0301")
#=> {:ok, "cl100k_base"}
FastestTiktoken.encoding_for_model("text-davinci-003")
#=> {:ok, "p50k_base"}
FastestTiktoken.encoding_for_model("gpt-oss-120b")
#=> {:ok, "o200k_harmony"}
```
## Select by Encoding
```elixir
FastestTiktoken.encode("hello world", encoding: :cl100k_base)
#=> {:ok, [15339, 1917]}
FastestTiktoken.decode([15339, 1917], encoding: :cl100k_base)
#=> {:ok, "hello world"}
FastestTiktoken.encode("<|start|>hello<|end|>",
  encoding: :o200k_harmony,
  allowed_special: :all
)
#=> {:ok, [200006, 24912, 200007]}
```
`FastestTiktoken.list_encodings/0` returns available encoding names.
```elixir
FastestTiktoken.list_encodings()
#=> {:ok, ["cl100k_base", "deepseek_v3", "gpt2", ..., "o200k_harmony"]}
```
`gpt2` is provided as an OpenAI-compatible alias for the equivalent `r50k_base`
encoding in the Rust crate.
## Count Tokens
Use `count_tokens/2` when you only need the number of tokens, for example to
enforce model limits or to drive chunking, routing, or billing logic.
```elixir
FastestTiktoken.count_tokens("表情符号是\n🦜🔗", model: "gpt-4o")
#=> {:ok, 11}
```
With default options, counting uses the native crate's zero-allocation count
path instead of building a token list and calling `length/1`.
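As a minimal sketch of the model-limit use case, the helper below checks whether a text fits a token budget. `TokenBudget` and `fits?/3` are hypothetical names chosen for illustration and are not part of the library; only `count_tokens/2` and its `{:ok, count}` return shape come from this guide.

```elixir
defmodule TokenBudget do
  @moduledoc "Hypothetical helper, not part of FastestTiktoken."

  # Returns {:ok, true | false}, or passes the library's error tuple through.
  def fits?(text, max_tokens, opts) when is_integer(max_tokens) do
    case FastestTiktoken.count_tokens(text, opts) do
      {:ok, count} -> {:ok, count <= max_tokens}
      {:error, _reason} = error -> error
    end
  end
end

TokenBudget.fits?("hello world", 8, model: "gpt-4o")
#=> {:ok, true}
```

Returning a tagged tuple rather than a bare boolean keeps selector errors visible to the caller instead of silently treating them as "does not fit".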
## Encode and Decode
```elixir
{:ok, tokens} =
  FastestTiktoken.encode("请考试我的软件!12345", encoding: :cl100k_base)

FastestTiktoken.decode(tokens, encoding: :cl100k_base)
#=> {:ok, "请考试我的软件!12345"}
```
Invalid token IDs return tagged errors:
```elixir
FastestTiktoken.decode([-1], encoding: :cl100k_base)
#=> {:error, :invalid_token_ids}
```
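Because failures arrive as tagged tuples, a caller can match on them directly. The wrapper below is one possible sketch; `SafeDecode` and `decode_or/3` are hypothetical names, and the fallback-string policy is an assumption about what a caller might want, not library behavior.

```elixir
defmodule SafeDecode do
  # Hypothetical wrapper: returns the decoded text, or `fallback` when
  # FastestTiktoken.decode/2 reports an error.
  def decode_or(tokens, opts, fallback \\ "") do
    case FastestTiktoken.decode(tokens, opts) do
      {:ok, text} -> text
      {:error, _reason} -> fallback
    end
  end
end

SafeDecode.decode_or([15339, 1917], encoding: :cl100k_base)
#=> "hello world"
SafeDecode.decode_or([-1], [encoding: :cl100k_base], "<invalid>")
#=> "<invalid>"
```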
## Ordinary Encoding
`encode_ordinary/2` treats special token strings as normal text.
```elixir
FastestTiktoken.encode_ordinary("hello <|endoftext|>", encoding: :cl100k_base)
#=> {:ok, [15339, 83739, 8862, 728, 428, 91, 29]}
```
This is the default behavior of `encode/2` unless `allowed_special` is set.
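The equivalence can be checked directly. This is a small sketch that relies only on the statement above: with no `allowed_special` option, both functions should produce the same token list.

```elixir
text = "hello <|endoftext|>"
opts = [encoding: :cl100k_base]

{:ok, ordinary} = FastestTiktoken.encode_ordinary(text, opts)
{:ok, default} = FastestTiktoken.encode(text, opts)

# Both calls treat "<|endoftext|>" as plain text here.
ordinary == default
#=> true
```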
## Special Tokens
Special token handling is explicit.
Use `allowed_special: :all` to recognize every special token for the selected
encoding:
```elixir
FastestTiktoken.encode("hello <|endoftext|>",
  encoding: :cl100k_base,
  allowed_special: :all
)
#=> {:ok, [15339, 220, 100257]}
```
Use a list to allow only specific special tokens:
```elixir
FastestTiktoken.encode(
  "<|endoftext|> hello <|fim_prefix|>",
  encoding: :cl100k_base,
  allowed_special: ["<|fim_prefix|>"]
)
#=> {:ok, [27, 91, 8862, 728, 428, 91, 29, 24748, 220, 100258]}
```
## Split Token Pieces
`split_tokens/2` returns the decoded string piece for each token.
```elixir
FastestTiktoken.split_tokens("hello world", model: "gpt-4o")
#=> {:ok, ["hello", " world"]}
```
Some valid token IDs are not valid UTF-8 when decoded one at a time. In that
case `split_tokens/2` returns `{:error, {:decode_failed, reason}}`.
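One way to cope with that error is to fall back to raw token IDs when the pieces cannot be rendered as strings. The sketch below assumes that fallback is acceptable for your use case; `TokenPieces` and `pieces_or_ids/2` are hypothetical names, not library functions.

```elixir
defmodule TokenPieces do
  # Hypothetical helper: returns {:pieces, strings} when every token decodes
  # to valid UTF-8, otherwise {:ids, token_ids} from a plain encode/2 call.
  def pieces_or_ids(text, opts) do
    case FastestTiktoken.split_tokens(text, opts) do
      {:ok, pieces} ->
        {:pieces, pieces}

      {:error, {:decode_failed, _reason}} ->
        # If encode/2 also fails, its error tuple is returned unchanged.
        with {:ok, ids} <- FastestTiktoken.encode(text, opts), do: {:ids, ids}

      {:error, _reason} = error ->
        error
    end
  end
end

TokenPieces.pieces_or_ids("hello world", model: "gpt-4o")
#=> {:pieces, ["hello", " world"]}
```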
## Batch Helpers
Batch helpers mirror the batch functions of the official `tiktoken` public API.
```elixir
{:ok, encoded} =
  FastestTiktoken.encode_batch(["hello world", "goodbye world"],
    encoding: :cl100k_base
  )

FastestTiktoken.decode_batch(encoded, encoding: :cl100k_base)
#=> {:ok, ["hello world", "goodbye world"]}
```
For ordinary encoding:
```elixir
FastestTiktoken.encode_ordinary_batch(
  ["hello <|endoftext|>", "goodbye <|fim_prefix|>"],
  encoding: :cl100k_base
)
```
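A batch result can also be turned into per-input token counts. The sketch below assumes, as the `decode_batch/2` round trip suggests, that `encode_batch/2` returns one token list per input, in input order.

```elixir
texts = ["hello world", "goodbye world"]

{:ok, encoded} = FastestTiktoken.encode_batch(texts, encoding: :cl100k_base)

# Pair each input text with the length of its token list.
counts = Enum.zip(texts, Enum.map(encoded, &length/1))
```

If you only need counts and never the token IDs, calling `count_tokens/2` per text avoids building the lists at all.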
## Error Handling
Runtime operations return `{:ok, value}` or `{:error, reason}`.
```elixir
FastestTiktoken.encode("hello", [])
#=> {:error, :missing_selector}
FastestTiktoken.encode("hello", model: "gpt-4o", encoding: :o200k_base)
#=> {:error, :ambiguous_selector}
FastestTiktoken.encode("hello", encoding: "missing")
#=> {:error, {:unsupported_encoding, "missing"}}
```
This makes the library convenient to use in pipelines, GenServers, background
jobs, and request handlers without rescuing exceptions for ordinary validation
failures.
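For instance, tagged tuples compose naturally with `with`. The following is a minimal sketch of truncating text to a token limit; `Truncate` and `to_limit/3` are hypothetical names, and cutting a token list at an arbitrary point may split a multi-byte character, in which case the result depends on how `decode/2` treats such input.

```elixir
defmodule Truncate do
  # Hypothetical helper: keep at most `max_tokens` tokens of `text`.
  # Any {:error, reason} from encode/2 or decode/2 is returned unchanged.
  def to_limit(text, max_tokens, opts)
      when is_integer(max_tokens) and max_tokens >= 0 do
    with {:ok, tokens} <- FastestTiktoken.encode(text, opts),
         {:ok, truncated} <-
           FastestTiktoken.decode(Enum.take(tokens, max_tokens), opts) do
      {:ok, truncated}
    end
  end
end

Truncate.to_limit("hello world", 1, encoding: :cl100k_base)
#=> {:ok, "hello"}
Truncate.to_limit("hello world", 1, [])
#=> {:error, :missing_selector}
```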