# motif
Pure Erlang keyword and topic extraction using the
[RAKE](https://www.researchgate.net/publication/227988510) algorithm.
Supports French, English, and German with built-in stop-word lists.
No external dependencies.
## Installation
```erlang
%% rebar.config
{deps, [{motif, "0.1.0"}]}.
```
## Quick start
```erlang
%% Extract from English text (language auto-detected)
Results = motif:extract(<<"Red roses are a symbol of love and beauty.">>),
%% [{<<"red roses">>, 4.0}, {<<"symbol">>, 1.0}, {<<"love">>, 1.0}, {<<"beauty">>, 1.0}]
%% Explicit language + max results (Text is any binary you supply)
Top3 = motif:extract(Text, #{lang => fr, max => 3}),
%% Auto-detect language (samples first 200 words)
Auto = motif:extract(Text, #{lang => auto}),
%% Get the stop-word list for a language
Stops = motif:stop_words(fr).
```
## API
```erlang
%% Extract keyword candidates. Returns [{Keyword, Score}] sorted by score desc.
-spec extract(binary()) -> [{binary(), float()}].
-spec extract(binary(), #{max => pos_integer(),
lang => fr | en | de | auto}) -> [{binary(), float()}].
%% Return the built-in stop-word list for a language.
-spec stop_words(fr | en | de) -> [binary()].
```
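
Since `extract/1,2` returns a plain list of `{Keyword, Score}` tuples, results can be post-processed with ordinary list comprehensions. The sketch below is only an illustration: the module `motif_results_sketch` and the function `above/2` are hypothetical names and are not part of motif.

```erlang
-module(motif_results_sketch).
-export([above/2]).

%% Hypothetical helper, not part of motif.
%% Keeps only the keywords whose score is at least Min, preserving the
%% order of the input list (motif returns it sorted by score, descending).
-spec above([{binary(), float()}], number()) -> [binary()].
above(Results, Min) ->
    [Keyword || {Keyword, Score} <- Results, Score >= Min].
```

A call such as `motif_results_sketch:above(motif:extract(Text, #{lang => en, max => 10}), 2.0)` would then keep only the higher-scoring phrases from the top ten.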
## Algorithm
RAKE (Rapid Automatic Keyword Extraction):
1. Split text into sentences on `. ! ?`
2. Within each sentence, split into candidate phrases on stop words
3. Score each word: `degree(word) / frequency(word)`,
   where `degree(w)` is the sum of the lengths (in words) of all phrases containing `w`
   and `frequency(w)` is the number of times `w` occurs
4. Score each candidate: sum of its word scores
5. Return sorted by score descending, deduplicated

Because a word's degree grows with the length of the phrases it appears in, phrases built from words that mostly occur inside multi-word phrases score highest.
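
The five steps can be sketched in a few dozen lines of Erlang. This is a minimal illustration under stated assumptions, not motif's actual implementation: the module name `rake_sketch`, the helper functions, the word separators, and the tie-breaking order are all choices made for this example, and the stop-word list is passed in by the caller.

```erlang
-module(rake_sketch).
-export([extract/2]).

%% Illustrative RAKE sketch (hypothetical module, not motif's source).
%% StopWords is a list of lowercase binaries supplied by the caller.
-spec extract(binary(), [binary()]) -> [{binary(), float()}].
extract(Text, StopWords) ->
    Stops = sets:from_list(StopWords),
    %% Step 1: split into sentences on . ! ?
    Sentences = binary:split(string:lowercase(Text),
                             [<<".">>, <<"!">>, <<"?">>],
                             [global, trim_all]),
    %% Step 2: split each sentence into candidate phrases at stop words
    Phrases = lists:flatmap(fun(Sentence) -> phrases(Sentence, Stops) end,
                            Sentences),
    %% Step 3: word score = degree / frequency
    {Deg, Freq} = lists:foldl(fun count/2, {#{}, #{}}, Phrases),
    WordScore = fun(W) -> maps:get(W, Deg) / maps:get(W, Freq) end,
    %% Step 4: phrase score = sum of its word scores; usort removes duplicates
    Scored = lists:usort(
               [{lists:sum([WordScore(W) || W <- P]),
                 iolist_to_binary(lists:join(<<" ">>, P))} || P <- Phrases]),
    %% Step 5: highest score first
    [{Phrase, Sc} || {Sc, Phrase} <- lists:reverse(Scored)].

%% Split one sentence into phrases: runs of consecutive non-stop words.
phrases(Sentence, Stops) ->
    Words = string:lexemes(Sentence, " ,;:\t\n"),
    {Cur, Done} = lists:foldl(
        fun(W, {C, Acc}) ->
            case sets:is_element(W, Stops) of
                true  -> {[], flush(C, Acc)};
                false -> {[W | C], Acc}
            end
        end, {[], []}, Words),
    lists:reverse(flush(Cur, Done)).

%% Close the current phrase (if any) and add it to the accumulator.
flush([], Acc) -> Acc;
flush(C, Acc)  -> [lists:reverse(C) | Acc].

%% For every word occurrence, add the phrase length to its degree
%% and 1 to its frequency.
count(Phrase, {Deg, Freq}) ->
    Len = length(Phrase),
    lists:foldl(
        fun(W, {D, F}) ->
            {maps:update_with(W, fun(V) -> V + Len end, Len, D),
             maps:update_with(W, fun(V) -> V + 1 end, 1, F)}
        end, {Deg, Freq}, Phrase).
```

With the stop words `are`, `a`, `of`, and `and`, the Quick start sentence splits into the phrases `red roses`, `symbol`, `love`, and `beauty`. Each word in `red roses` has degree 2 and frequency 1, so the phrase scores 2.0 + 2.0 = 4.0, while each single-word phrase scores 1.0.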
## Language detection
`lang => auto` samples the first 200 words, counts stop-word hits per
language, and picks the language with the most hits. Falls back to `en`
on a tie or empty input.
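
The detection rule described above can be sketched as follows. This is an assumption-laden illustration rather than motif's code: the module `lang_detect_sketch`, the function `detect/2`, and the set of word separators are hypothetical, and the stop-word lists are passed in as a map instead of being read from the library.

```erlang
-module(lang_detect_sketch).
-export([detect/2]).

%% Illustrative sketch (hypothetical module, not motif's source).
%% StopLists maps a language atom to its lowercase stop-word list,
%% e.g. #{en => [<<"the">>], fr => [<<"le">>], de => [<<"der">>]}.
-spec detect(binary(), #{atom() => [binary()]}) -> atom().
detect(Text, StopLists) ->
    %% Sample at most the first 200 words of the lowercased text.
    AllWords = string:lexemes(string:lowercase(Text), " .,;:!?\t\n"),
    Words = lists:sublist(AllWords, 200),
    %% Count stop-word hits per language.
    Hits = [{count_hits(Words, sets:from_list(Stops)), Lang}
            || {Lang, Stops} <- maps:to_list(StopLists)],
    %% Pick the strict winner; otherwise fall back to en.
    case lists:reverse(lists:sort(Hits)) of
        [{Best, Lang}, {Second, _} | _] when Best > Second -> Lang;
        [{Best, Lang}] when Best > 0 -> Lang;
        _ -> en   %% tie, empty input, or no hits at all
    end.

%% Number of sampled words that appear in the given stop-word set.
count_hits(Words, StopSet) ->
    length([W || W <- Words, sets:is_element(W, StopSet)]).
```

Using a strict `Best > Second` comparison means that any tie for first place, including the all-zero case of empty input, falls through to the `en` default.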
## License
Apache 2.0 — see [LICENSE](LICENSE).