lib/tokenizers.ex
defmodule Tokenizers do
@moduledoc """
Elixir bindings to [Hugging Face Tokenizers](https://github.com/huggingface/tokenizers).

Hugging Face describes the Tokenizers library as:

> Fast State-of-the-art tokenizers, optimized for both research and production
>
> 🤗 Tokenizers provides an implementation of today’s most used tokenizers, with a focus on performance and versatility. These tokenizers are also used in 🤗 Transformers.

This library provides bindings for using pretrained tokenizers. Support for building and
training a tokenizer from scratch is forthcoming.

A tokenizer is effectively a pipeline of transforms that takes some input text and returns a
`Tokenizers.Encoding.t()`. The main entrypoint to this library is the `Tokenizers.Tokenizer`
module, which defines the `Tokenizers.Tokenizer.t()` struct, a container holding the constituent
parts of the pipeline. Most of the library's functionality lives in that module.
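
The pipeline described above can be sketched as follows. This is a minimal, hedged example
rather than a definitive reference: it assumes `Tokenizers.Tokenizer.from_pretrained/1`,
`Tokenizers.Tokenizer.encode/2`, `Tokenizers.Encoding.get_tokens/1` and
`Tokenizers.Encoding.get_ids/1` behave as shown in the project README, that this library is
available as a dependency, and that the model can be downloaded over the network. The model
name `"bert-base-cased"` is only an illustrative choice.

```elixir
# Load a pretrained tokenizer by name (assumed to be fetched from the Hugging Face Hub).
{:ok, tokenizer} = Tokenizers.Tokenizer.from_pretrained("bert-base-cased")

# Run the input text through the pipeline to obtain a Tokenizers.Encoding.t().
{:ok, encoding} = Tokenizers.Tokenizer.encode(tokenizer, "Hello there!")

# Read the results back out of the encoding: the tokens and their vocabulary ids.
tokens = Tokenizers.Encoding.get_tokens(encoding)
ids = Tokenizers.Encoding.get_ids(encoding)
```

Both calls on `Tokenizers.Tokenizer` are pattern matched on `{:ok, _}` here for brevity;
code that must handle a failed download or a failed encode should match on the error tuple
as well.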
"""
end