# Paasaa
[](https://github.com/minibikini/paasaa/actions/workflows/elixir.yml)
[](https://coveralls.io/github/minibikini/paasaa?branch=master)
[](https://hex.pm/packages/paasaa)
[](https://hex.pm/packages/paasaa)
Paasaa is an Elixir library for robust natural language and script detection. It achieves this through statistical analysis of character n-grams and Unicode script properties, without relying on AI. It helps in tasks like text processing, natural language understanding, or internationalization by accurately identifying the writing system and human language of a given text.
[API Documentation](https://hexdocs.pm/paasaa/) | [Hex Package](https://hex.pm/packages/paasaa)
## Installation
Add `paasaa` to your list of dependencies in `mix.exs`:
```elixir
def deps do
[{:paasaa, "~> 1.0.0"}]
end
```
After you are done, run `mix deps.get` in your shell to fetch and compile **Paasaa**.
## Usage
Detect a language:
```elixir
iex> Paasaa.detect("Detect this!")
"eng"
```
Detect language and return a scored list of languages:
```elixir
iex> Paasaa.all("Detect this!")
[
{"eng", 1.0},
{"sco", 0.8230731943771207},
{"nob", 0.6030053320407174},
{"nno", 0.5525933107125545},
...
]
```
Detect a script:
```elixir
iex> Paasaa.detect_script("Detect this!")
{"Latin", 0.8333333333333334}
```
### Advanced Usage with Options
The `detect/2` and `all/2` functions accept a keyword list of options to control their behavior.
**Whitelist and Blacklist Languages**
You can restrict the set of possible languages. This is useful if you already know the text must be one of a few languages, or you want to exclude a common false positive.
```elixir
# Exclude English to find the next most likely language
iex> Paasaa.detect("Detect this!", blacklist: ["eng"])
"sco"
# Only consider Polish and Serbian
iex> text = "Pošto je priznavanje urođenog dostojanstva i jednakih i neotuđivih prava..."
iex> Paasaa.detect(text, whitelist: ["pol", "srp"])
"srp"
```
**Set Minimum Text Length**
By default, Paasaa returns `"und"` for very short strings. You can adjust this threshold with `:min_length`.
```elixir
iex> Paasaa.detect("Привет", min_length: 10)
"und"
iex> Paasaa.detect("Привет", min_length: 6)
"rus"
```
## Supported Languages
For a full list of supported languages, please see [LANGUAGES.md](LANGUAGES.md).
## Contributing
Contributions are welcome! Please feel free to open an issue or submit a pull request on [GitHub](https://github.com/minibikini/paasaa).
If you are updating the language data, you can regenerate the necessary modules with the following command:
```shell
mix run script/generate_language_data.exs
```
## Derivation
**Paasaa** is a derivative work from [Franc](https://github.com/wooorm/franc/) (JavaScript, MIT) by Titus Wormer.
## License
MIT © Egor Kislitsyn