-module(erli18n_plural).
-moduledoc """
Evaluator and validator for the gettext/CLDR plural rules used by
erli18n.
Compiles the C expression from a `.po` `Plural-Forms:` header
(`nplurals=N; plural=EXPR;`) into a small AST and evaluates it to choose
the plural form for a given N — this is what backs `ngettext`/`npgettext`.
## The problem it solves
gettext selects the correct plural translation by evaluating a C
EXPRESSION embedded in the `.po` header (e.g. the Russian 3-form rule
`n%10==1 && n%100!=11 ? 0 : ...`). Each locale ships its own. This module
replaces the legacy `gettexter`'s Yecc/Leex/`erl_eval` pipeline with a
hand-written recursive-descent parser + AST interpreter (no dynamic
generation of Erlang code, so dialyzer/eqwalizer can reason about
everything). It turns `EXPR` into a `t:ast/0` and evaluates it to a form
index in `[0, NPlurals)`.
## Mental model
- **Two phases.** `compile/1` (load-time, cold) parses + validates +
packs into a `t:plural_compiled/0`. `evaluate/2` (lookup-time, HOT
PATH) interprets that bundle per call. The catalog loader compiles
ONCE and keeps the bundle; each `ngettext`/`npgettext` calls only
`evaluate/2`.
- **Runtime source-of-truth is the `.po` header** (PSD-004). The embedded
CLDR table (`cldr_rule/1`, one row per GNU gettext / CLDR locale) does
NOT take part in the hot path: it is consulted only at load time to emit
divergence warnings (`validate_against_cldr/2`) and as a fallback when
the header is missing (`fallback_rule/0`).
- **Trusted vs untrusted.** The header expression comes from a tenant's
`.po` — UNTRUSTED input (ADR-0003, see `SECURITY.md`). The
`cldr_data/0` table is a static module literal — TRUSTED. That is why
`compile/1` is fail-closed and hardened, while `cldr_compiled_table/0`
assumes every row compiles.
- **Pure function, no per-process state.** Unlike the catalog server, this
module has no gen_server, no ETS and no process dictionary. The only
side effect is a global read-once cache in `persistent_term`
(`cldr_compiled_table/0`), memoising the compiled CLDR ASTs — a
module-scoped singleton under a fixed key, built once per node and never
invalidated (`cldr_data/0` is constant).
## Anti-DoS hardening (ADR-0003)
The attack surface is the `.po` expression. The defenses ALL live in
`compile/1` (cold), so that `evaluate/2` (hot) stays O(1)-bounded by
construction:
- `?PLURAL_EXPR_MAX_BYTES` (2048) — rejects a long expression before parse.
- `?PLURAL_EXPR_MAX_DEPTH` (64) — bounds nesting (and the walker's stack).
- `?AST_MAX_NODES` (256) — bounds the node count (a wide flat chain
`n*n*...*n` passes both caps above but would grow an `n^k` bignum per
lookup).
- `?MAX_INT_DIGITS` (7) — bounds the digits of `nplurals=` before
`binary_to_integer` materialises the bignum.
- Static rejection (`validate_safe/2`) — refuses rules provably faulty
for EVERY N (div/mod by a constant divisor of 0; constant outside
`[0, NPlurals)`).
`evaluate/2` is TOTAL: it never raises. Mirroring the GNU libintl runtime
(`dcigettext.c`), division/modulo by zero is coerced to 0 and a form
outside `[0, NPlurals)` is clamped to 0. Anyone who needs to OBSERVE the
anomaly (log/alert) uses `evaluate_checked/2`, which returns it as data.
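
To see why the node cap is needed on top of the byte and depth caps, the
sketch below builds the left-associative `n*n*...*n` chain directly as a
`t:ast/0` and counts its nodes. The module name `plural_nodes_sketch` and
both functions are hypothetical illustrations; this is not the module's own
complexity check, only a minimal model of it.

```erlang
-module(plural_nodes_sketch).
-export([chain/1, count_nodes/1]).

%% Build the AST a left-associative parser would produce for a chain of
%% `Ops` multiplications: n*n*...*n (Ops + 1 factors).
chain(Ops) when is_integer(Ops), Ops >= 0 ->
    lists:foldl(fun(_, Acc) -> {binop, '*', Acc, n} end, n, lists:seq(1, Ops)).

%% Count every node: leaves (integers and `n`) and operator nodes alike.
count_nodes(I) when is_integer(I) -> 1;
count_nodes(n) -> 1;
count_nodes({unop, '!', A}) -> 1 + count_nodes(A);
count_nodes({binop, _Op, L, R}) -> 1 + count_nodes(L) + count_nodes(R);
count_nodes({ternary, C, T, E}) ->
    1 + count_nodes(C) + count_nodes(T) + count_nodes(E).
```

With these definitions, `count_nodes(chain(1000))` is 2001. The matching
source text is also 2001 bytes (1001 `n` characters and 1000 `*`
characters), so it fits under the 2048-byte cap while the tree is far above
the 256-node cap.
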
## When you touch this module
- **Consumer:** almost never directly — you call `erli18n:ngettext/5` and
the catalog server takes care of `compile/1`/`evaluate/2`. For a quick
test outside the server, `plural_by_po_header/2` compiles and evaluates
in one step.
- **Loader maintainer:** calls `compile/1` at load, keeps the bundle, and
on the hot path calls `evaluate/2`. For CLDR divergence at load use
`validate_against_cldr_ast/2` (reuses the already-compiled AST).
- **CLDR table maintainer:** edits `cldr_data/0` when syncing a CLDR
release.
## Quickstart
```erlang
%% Compile the Russian 3-form (one/few/many) rule once...
1> Hdr = <<"nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : "
1> "n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2;">>.
2> {ok, C} = erli18n_plural:compile(Hdr).
{ok,#{raw => Hdr,expr => {ternary,_,_,_},nplurals => 3}}
%% ...and select the form for various N (hot path).
3> erli18n_plural:evaluate(C, 1).
0
4> erli18n_plural:evaluate(C, 2).
1
5> erli18n_plural:evaluate(C, 5).
2
%% One-off use: compile and evaluate at once.
6> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 1).
{ok,0}
```
## Key functions
- `compile/1` — parse + fail-closed validation → `t:plural_compiled/0`.
- `evaluate/2` — hot path, total, returns the form index.
- `evaluate_checked/2` — structured sibling that reports anomalies as data.
- `plural_by_po_header/2` — compile+evaluate shortcut for one-off use.
- `cldr_rule/1` / `validate_against_cldr/2` / `validate_against_cldr_ast/2`
— CLDR observability (off the hot path).
- `fallback_rule/0` — Germanic default when the header is missing.
""".
%% Evaluator for the GNU gettext `Plural-Forms:` header C-expression
%% (per https://www.gnu.org/software/gettext/manual/gettext.html#Translating-plural-forms)
%% and CLDR-canonical-rule validator
%% (per https://cldr.unicode.org/index/cldr-spec/plural-rules,
%% source data: cldr-json/cldr-core/supplemental/plurals.json in
%% https://github.com/unicode-org/cldr-json).
%%
%% Design source-of-truth:
%%
%% * PSD-004 (po_semantics_decisions.md) — the `.po` `Plural-Forms` header
%% is the runtime source-of-truth; CLDR is consulted only at load-time
%% for divergence warnings, and as fallback when the header is absent.
%% Therefore `evaluate/2` is the **hot path** and must never touch CLDR.
%%
%% * PSD-008 (po_semantics_decisions.md) — degenerate plural rules
%% (`nplurals=1; plural=0;`, used by ja/zh/ko/vi/th) must round-trip
%% through compile/evaluate as a literal integer expression. The
%% grammar therefore accepts integer literals as valid primary terms.
%%
%% * BR-DESCARTAR-003 (discard_log.md) — the GNU Plural-Forms evaluation
%% capability is preserved from the legacy `gettexter_plural` module.
%% The Yecc/Leex/erl_syntax/erl_eval pipeline (~231 LOC) is dropped,
%% but the C-truthy operator semantics and recursive walker shape are
%% refactored here into a single recursive-descent parser + interpreter.
%%
%% * paradigm_decision.md §E3 — hybrid wrapper: local recursive-descent
%% evaluator in the hot path; CLDR table only out of the hot path.
%%
%% Implementation notes:
%%
%% * No Yecc, no Leex, no dynamic Erlang code generation — the evaluator
%% interprets a small AST so dialyzer can reason about everything.
%% * Operators follow C precedence/associativity. Short-circuit semantics
%% are honoured for `&&` and `||` so that expressions guarded against
%% division by zero (e.g. `n != 0 && (10/n) > 1`) behave as in C.
%% * Modulo (`%`) uses Erlang `rem`, which matches C99 truncation toward
%% zero — the only behaviour `.po` plural rules ever rely on.
%% * Division by zero in untrusted `.po` input is handled, not
%% propagated (finding #1, plural-eval-throws-per-lookup-dos):
%% `evaluate/2` is TOTAL on the per-request hot path. A zero divisor
%% is pinned to 0 (`eval_div/2` / `eval_rem/2`) and an out-of-range
%% form is clamped to 0, matching GNU libintl's `dcigettext.c`
%% instead of raising `badarith`. Statically-faulty rules are
%% rejected up front by `compile/1`; `evaluate_checked/2` surfaces
%% the anomaly as data for callers that want to observe it.
%% * CLDR data ships inline (one row per GNU gettext / CLDR locale, see
%% `cldr_rule/1`), but the `cldr_data/0` rows are no longer maintained
%% by hand: they are generated between the
%% `BEGIN/END GENERATED CLDR TABLE` markers by
%% `bin/gen-plural-table.escript` from the committed seed table
%% `apps/erli18n/priv/gettext/plural_forms.eterm`. The inline literal
%% is still what the runtime reads (small, dependency-free, reviewable
%% data surface); the seed + generator give a single source of truth
%% and a diffable target for `bin/extract-gettext-table.sh`, which
%% produces the same `{Locale, NPlurals, PluralExpr}` shape from the
%% real GNU gettext toolchain so drift can be detected on a CLDR
%% release sync. The alternatives stay rejected:
%%
%% - Option B: external hex dep (e.g. ex_cldr). Heavyweight, pulls
%% Elixir interop, not justified for a single-table lookup.
%% - Option C: parsing upstream CLDR JSON at build time. The seed
%% eterm keeps the shipped data surface small and reviewable
%% without a JSON toolchain in the build.
%% Public API.
-export([
    compile/1,
    evaluate/2,
    evaluate_checked/2,
    plural_by_po_header/2,
    cldr_rule/1,
    validate_against_cldr/2,
    validate_against_cldr_ast/2,
    fallback_rule/0
]).
-export_type([
    plural_compiled/0,
    compile_error/0,
    plural_eval_error/0,
    ast/0,
    op/0
]).
%% =========================
%% Types
%% =========================
-doc """
Compiled plural-rule bundle — the output of `compile/1` and the input to
`evaluate/2`/`evaluate_checked/2`.
- `nplurals` — how many plural forms the locale has (validated in
`[1, ?NPLURALS_MAX]`); every returned index stays in `[0, nplurals)`.
- `expr` — the parsed `t:ast/0` of the `plural=` expression, evaluated on
the hot path.
- `raw` — the originating raw header, preserved for diagnostics and for
the divergence payload of `validate_against_cldr_ast/2`.
Compile once at load and reuse this map on every lookup; there is no
result cache inside `evaluate/2`.
""".
-type plural_compiled() :: #{
    nplurals := pos_integer(),
    expr := ast(),
    raw := binary()
}.
-doc """
Structural failure reason from `compile/1` — always fail-closed, never an
exception.
Groups header defects (`missing_nplurals`, `missing_plural_expr`,
`nplurals_out_of_range`, `syntax_error`) and the anti-DoS hardening
rejections: `expr_too_long`/`expr_too_deep`/`expr_too_complex`
(byte/depth/node caps), `nplurals_too_many_digits` (digit cap before the
bignum) and `unsafe_plural_rule` (rule statically faulty for every N).
See `compile/1` for what triggers each one.
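
The digit cap behind `nplurals_too_many_digits` can be pictured with the
sketch below: the length of the digit run is checked before
`binary_to_integer/1` is ever called, and the error carries only the length
and the cap. The module `plural_digits_sketch`, the function
`parse_capped/2` and the error atoms `empty`, `too_many_digits` and
`not_digits` are hypothetical names chosen for illustration; the module's
real extraction code is not shown here and may differ.

```erlang
-module(plural_digits_sketch).
-export([parse_capped/2]).

%% Parse a run of ASCII digits, refusing runs longer than MaxDigits
%% before binary_to_integer/1 is ever called.
parse_capped(Digits, MaxDigits) when is_binary(Digits), MaxDigits > 0 ->
    case byte_size(Digits) of
        0 ->
            {error, empty};
        Size when Size > MaxDigits ->
            %% Report only the length, never the rejected value.
            {error, {too_many_digits, Size, MaxDigits}};
        _ ->
            case all_digits(Digits) of
                true -> {ok, binary_to_integer(Digits)};
                false -> {error, not_digits}
            end
    end.

%% True when every byte is an ASCII digit.
all_digits(<<>>) -> true;
all_digits(<<C, Rest/binary>>) when C >= $0, C =< $9 -> all_digits(Rest);
all_digits(_) -> false.
```

Because `byte_size/1` is constant-time, an adversarial run of thousands of
digits is refused without building the integer at all.
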
""".
-type compile_error() ::
    {syntax_error, Reason :: term(), Position :: non_neg_integer()}
    | {missing_nplurals, binary()}
    | {missing_plural_expr, binary()}
    | {nplurals_out_of_range, integer()}
    %% Layer 3 (finding #1): a rule that is STATICALLY guaranteed to
    %% fault (a literal division/modulo by zero, or a constant form
    %% index provably outside [0, NPlurals)) is rejected at load time
    %% so the poisoned catalog is refused by `ensure_loaded` rather than
    %% loading as `{ok, _}` and crashing every later lookup.
    | {unsafe_plural_rule, plural_eval_error()}
    %% Finding #2 (plural-compile-superlinear-unbounded): the parser
    %% runs on untrusted `.po` input inside the catalog gen_server's
    %% `handle_call`. An expression longer than `?PLURAL_EXPR_MAX_BYTES`
    %% or nested deeper than `?PLURAL_EXPR_MAX_DEPTH` is rejected
    %% fail-closed so a pathological-but-valid rule cannot make compile
    %% superlinear/unbounded and freeze the server.
    | {expr_too_long, Size :: non_neg_integer(), Max :: pos_integer()}
    | {expr_too_deep, Depth :: pos_integer(), Position :: non_neg_integer()}
    %% Finding #9 (plural-bignum-cpu-dos-evaluate-hotpath): the byte and
    %% depth caps above do not bound the AST NODE COUNT, so a wide flat
    %% operator chain (`n*n*...*n`) can still compile to thousands of
    %% nodes that `evaluate/2` walks (growing an `n^k` bignum) on every
    %% lookup. An AST above `?AST_MAX_NODES` is rejected fail-closed so
    %% the per-lookup cost stays O(1)-bounded by construction.
    | {expr_too_complex, Nodes :: pos_integer(), Max :: pos_integer()}
    %% Finding #8 (po-plural-unbounded-binary-to-integer-bignum): the
    %% `nplurals=<digits>` run is capped by DIGIT COUNT before any
    %% `binary_to_integer` materialises the bignum. The rejected value is
    %% deliberately kept OUT of the payload (only the digit count and the
    %% cap are reported) so a thousands-digit adversarial run cannot
    %% amplify memory/logs, and the >=~1.3M-digit `system_limit` path is
    %% never reached.
    | {nplurals_too_many_digits, Digits :: pos_integer(), Max :: pos_integer()}.
-doc """
Anomaly observed while evaluating a compiled rule — returned as data,
never raised.
`{division_by_zero, '/' | '%'}` when an evaluated divisor is 0;
`{form_out_of_range, Form, NPlurals}` when the index falls outside
`[0, NPlurals)`. It appears as a return of `evaluate_checked/2` and as the
payload of an `{unsafe_plural_rule, _}` rejected by `compile/1`. The total
`evaluate/2` NEVER produces this — it clamps (parity with libintl).
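
As a rough picture of how a static check can produce these values at load
time, the sketch below rejects a literal zero divisor anywhere in the tree
and a bare integer rule outside `[0, NPlurals)`. It is a deliberately
simplified stand-in: the module `plural_static_sketch` and its functions are
hypothetical, and the real `validate_safe/2` may recognise more cases (for
example constant sub-expressions that fold to zero).

```erlang
-module(plural_static_sketch).
-export([validate/2]).

%% A bare integer rule must itself be a valid form index.
validate(Ast, NPlurals) when is_integer(Ast) ->
    case Ast >= 0 andalso Ast < NPlurals of
        true -> ok;
        false -> {error, {form_out_of_range, Ast, NPlurals}}
    end;
validate(Ast, _NPlurals) ->
    zero_divisor(Ast).

%% Find the first `/` or `%` whose right operand is the literal 0.
zero_divisor({binop, Op, _L, 0}) when Op =:= '/'; Op =:= '%' ->
    {error, {division_by_zero, Op}};
zero_divisor({binop, _Op, L, R}) -> first_error([L, R]);
zero_divisor({unop, '!', A}) -> zero_divisor(A);
zero_divisor({ternary, C, T, E}) -> first_error([C, T, E]);
zero_divisor(_Leaf) -> ok.

%% Check sub-trees left to right and stop at the first error.
first_error([]) -> ok;
first_error([A | Rest]) ->
    case zero_divisor(A) of
        ok -> first_error(Rest);
        {error, _} = Err -> Err
    end.
```

A divisor that depends on `n`, such as the `n-7` in `1/(n-7)`, is not a
literal zero, so this sketch accepts it; that case is the one left to the
runtime coercion.
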
""".
-type plural_eval_error() ::
    {division_by_zero, '/' | '%'}
    | {form_out_of_range, Form :: integer(), NPlurals :: pos_integer()}.
-doc """
AST of the plural expression — a literal integer, the variable `n`, a
binop (`{binop, t:op/0, Left, Right}`), the negation unop
(`{unop, '!', _}`) or a ternary (`{ternary, Cond, Then, Else}`).
It is the tree that `compile/1` builds and that `evaluate/2`/`eval_ast/2`
interpret. The depth is bounded by `?PLURAL_EXPR_MAX_DEPTH` and the node
count by `?AST_MAX_NODES`, so no valid instance is arbitrarily large.
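
To make the shape concrete, here is a minimal sketch of a total interpreter
over this AST, assuming C-style truthiness (comparisons and logical
operators yield 1 or 0) and a zero divisor pinned to 0. The module
`plural_eval_sketch` and its helpers are hypothetical; the module's own
`eval_ast/2` is not reproduced here and may represent intermediate values
differently.

```erlang
-module(plural_eval_sketch).
-export([eval/2]).

%% Interpret a plural AST for a count N.
eval(I, _N) when is_integer(I) -> I;
eval(n, N) -> N;
eval({unop, '!', A}, N) -> bool(eval(A, N) =:= 0);
eval({ternary, C, T, E}, N) ->
    case eval(C, N) =/= 0 of
        true -> eval(T, N);
        false -> eval(E, N)
    end;
%% Short-circuit: the right operand is evaluated only when needed.
eval({binop, '&&', L, R}, N) ->
    case eval(L, N) =/= 0 of
        true -> bool(eval(R, N) =/= 0);
        false -> 0
    end;
eval({binop, '||', L, R}, N) ->
    case eval(L, N) =/= 0 of
        true -> 1;
        false -> bool(eval(R, N) =/= 0)
    end;
eval({binop, Op, L, R}, N) ->
    apply_op(Op, eval(L, N), eval(R, N)).

apply_op('+', A, B) -> A + B;
apply_op('-', A, B) -> A - B;
apply_op('*', A, B) -> A * B;
%% A zero divisor is pinned to 0 instead of raising badarith.
apply_op('/', _A, 0) -> 0;
apply_op('/', A, B) -> A div B;
apply_op('%', _A, 0) -> 0;
apply_op('%', A, B) -> A rem B;
apply_op('==', A, B) -> bool(A =:= B);
apply_op('!=', A, B) -> bool(A =/= B);
apply_op('<', A, B) -> bool(A < B);
apply_op('>', A, B) -> bool(A > B);
apply_op('<=', A, B) -> bool(A =< B);
apply_op('>=', A, B) -> bool(A >= B).

bool(true) -> 1;
bool(false) -> 0.
```

In this sketch, `eval({binop, '!=', n, 1}, 5)` yields 1, and the guarded
expression `n != 0 && (10/n) > 1` yields 0 for `N = 0` without ever
evaluating the division.
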
""".
-type ast() ::
    integer()
    %% variable n
    | n
    | {binop, op(), ast(), ast()}
    | {unop, '!', ast()}
    | {ternary, ast(), ast(), ast()}.
-doc """
Binary operators accepted in a `t:ast/0`, with C precedence/associativity:
arithmetic (`+ - * / %`), relational (`< > <= >=`), equality (`== !=`) and
short-circuit logical (`&&` `||`). `%` uses `rem` (truncates toward zero,
like C99); `/` and `%` by zero are coerced to 0 on the hot path.
""".
-type op() ::
    '+'
    | '-'
    | '*'
    | '/'
    | '%'
    | '=='
    | '!='
    | '<'
    | '>'
    | '<='
    | '>='
    | '&&'
    | '||'.
%% Internal parser state — carries the remaining input and absolute byte
%% offset (for surfacing diagnostic positions in syntax errors).
-record(ps, {
    src :: binary(),
    pos = 0 :: non_neg_integer()
}).
%% Sanity bound for nplurals. Real-world locales top out at 6 (Arabic).
%% Any header declaring more than a thousand forms is malformed input.
-define(NPLURALS_MAX, 1000).
%% Maximum number of decimal digits accepted for the `nplurals=<digits>`
%% field (finding #8, po-plural-unbounded-binary-to-integer-bignum). The
%% range check is `[1, ?NPLURALS_MAX=1000]`, so 4 digits already covers
%% every legal value; 7 leaves generous headroom for realistic indices
%% while keeping the bignum tiny. Capping by digit COUNT *before*
%% `binary_to_integer` means a thousands-digit adversarial run is
%% rejected in O(1) without ever materialising an O(d^2) bignum or
%% reaching the >=~1.3M-digit `error:system_limit` path.
-define(MAX_INT_DIGITS, 7).
%% Bounds for the `Plural-Forms` expression itself (finding #2,
%% plural-compile-superlinear-unbounded). `?NPLURALS_MAX` bounds the
%% form COUNT, not the expression SIZE, so without these the parser is
%% unbounded in both byte-length and recursion depth on untrusted input.
%%
%% * `?PLURAL_EXPR_MAX_BYTES` — the real-world most-complex rule
%% (Arabic) is ~98 bytes; 2048 is ~20x headroom, so no legitimate
%% catalog is affected, while a multi-KB adversarial expression is
%% rejected before it can be parsed.
%% * `?PLURAL_EXPR_MAX_DEPTH` — Arabic's nesting depth is well under
%% 10; 64 is ~6x headroom and also bounds the recursion depth of the
%% hot-path `eval_ast/2` walker (stack growth per lookup).
-define(PLURAL_EXPR_MAX_BYTES, 2048).
-define(PLURAL_EXPR_MAX_DEPTH, 64).
%% Bound on the number of nodes in the compiled plural AST (finding #9,
%% plural-bignum-cpu-dos-evaluate-hotpath). Complements the byte/depth
%% caps above, which do NOT bound the node count: a wide, flat operator
%% chain (`n*n*...*n`) stays under both — it is left-associative, so it
%% does not nest the parser, and ~1000 factors fit inside 2048 bytes —
%% yet it compiles to ~2000 AST nodes. `evaluate/2` walks that whole tree
%% (and grows an `n^k` bignum) on EVERY ngettext lookup, with no result
%% cache, so the per-lookup cost is super-linear in the chain length and
%% grows with N. Bounding the node count at compile time keeps the
%% installed AST small, so `evaluate/2`'s cost is O(1)-bounded by
%% construction. The real-world most-complex rule (Russian/Arabic) has
%% ~39 nodes; 256 is ~6.5x headroom, so no legitimate catalog is
%% affected, while a pathological chain is rejected before it can poison
%% every later evaluation.
-define(AST_MAX_NODES, 256).
%% Identifier-character predicate, used to reject malformed bare words
%% like `nx`. Macro so the parser inlines the test in a guard.
-define(IS_IDENT(C),
    ((C >= $a andalso C =< $z) orelse
        (C >= $A andalso C =< $Z) orelse
        (C >= $0 andalso C =< $9) orelse
        C =:= $_)
).
%% =========================
%% Public API
%% =========================
-doc """
Compiles a `.po` `Plural-Forms:` header expression into a
`plural_compiled()` bundle (a `nplurals`/`expr`/`raw` map) reused by each
`evaluate/2`.
`Header` is the header string (`nplurals=N; plural=EXPR;`); the fields are
located in a whitespace-tolerant way. Returns `{ok, Compiled}` or
`{error, compile_error()}`, always fail-closed (never raises), since it
runs over untrusted `.po` inside the gen_server's `handle_call`.
Relevant structural rejections:
- `{expr_too_long, Size, Max}` — expression above `?PLURAL_EXPR_MAX_BYTES`
(2048), refused before parsing;
- `{expr_too_deep, Depth, Pos}` — nesting above `?PLURAL_EXPR_MAX_DEPTH`
(64);
- `{expr_too_complex, Nodes, Max}` — AST with more nodes than
`?AST_MAX_NODES` (256), barring wide flat chains (`n*n*...*n`) that
would grow a bignum per lookup;
- `{unsafe_plural_rule, Reason}` — STATICALLY faulty rule: division/modulo
by a constant divisor of 0, or a constant rule whose form falls outside
`[0, NPlurals)`. Cases that fail only for a specific N are left to the
dynamic clamp of `evaluate/2`;
- `{nplurals_too_many_digits, _, _}`, `{nplurals_out_of_range, _}`,
`{missing_nplurals, _}`, `{missing_plural_expr, _}` and `{syntax_error,
Reason, Pos}` for the remaining header defects.
Edge cases: redundant parentheses and whitespace are absorbed by the
parser; `n` is the ONLY allowed identifier (`nx` or `m` become a
`syntax_error`); degenerate rules `plural=0` (ja/zh/ko/vi/th) compile as
an integer literal (PSD-008). A rule that fails only for a specific N
(e.g. `n/(n-5)`) is NOT rejected here — that is left to the dynamic clamp
of `evaluate/2`.
```erlang
1> erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
{ok,#{raw => <<"nplurals=2; plural=n != 1;">>,expr => {binop,'!=',n,1},nplurals => 2}}
2> erli18n_plural:compile(<<"nplurals=1; plural=0;">>).
{ok,#{raw => <<"nplurals=1; plural=0;">>,expr => 0,nplurals => 1}}
3> erli18n_plural:compile(<<"nplurals=2; plural=n/0;">>).
{error,{unsafe_plural_rule,{division_by_zero,'/'}}}
4> erli18n_plural:compile(<<"nplurals=2; plural=nx;">>).
{error,{syntax_error,{unknown_identifier_after_n,$x},1}}
5> erli18n_plural:compile(<<"nplurals=2;">>).
{error,{missing_plural_expr,<<"nplurals=2;">>}}
```
See also `evaluate/2` (consume the bundle), `plural_by_po_header/2`
(compile+evaluate at once) and `t:compile_error/0`.
""".
-spec compile(binary()) -> {ok, plural_compiled()} | {error, compile_error()}.
compile(Header) when is_binary(Header) ->
    case extract_nplurals(Header) of
        {ok, NPlurals} ->
            case extract_plural_expr(Header) of
                {ok, ExprBin} ->
                    case parse_expr_bin(ExprBin) of
                        {ok, Ast} ->
                            compile_validated(Header, NPlurals, Ast);
                        {error, _} = Err ->
                            Err
                    end;
                {error, _} = Err ->
                    Err
            end;
        {error, _} = Err ->
            Err
    end.
%% Apply the two load-time validation barriers to a successfully parsed
%% AST and, on success, materialise the `plural_compiled()` bundle.
%%
%% * Layer 3 (finding #1): reject rules that are STATICALLY guaranteed
%% to fault (literal div/mod by zero, constant out-of-range form)
%% before they can be stored and crash every later lookup.
%% * Node-count cap (finding #9): reject an AST above `?AST_MAX_NODES`
%% so a wide flat chain cannot make `evaluate/2` walk thousands of
%% nodes (and grow a large bignum) on every lookup. Run once here at
%% load time, never on the hot path.
-spec compile_validated(binary(), pos_integer(), ast()) ->
    {ok, plural_compiled()} | {error, compile_error()}.
compile_validated(Header, NPlurals, Ast) ->
    case validate_safe(Ast, NPlurals) of
        ok ->
            case check_ast_complexity(Ast) of
                ok ->
                    {ok, #{
                        nplurals => NPlurals,
                        expr => Ast,
                        raw => Header
                    }};
                {error, _} = Err ->
                    Err
            end;
        {error, EvalErr} ->
            {error, {unsafe_plural_rule, EvalErr}}
    end.
%% Evaluate a compiled plural rule for a particular N. Pure function on
%% the hot path: no allocations beyond the return value. Negative N is
%% accepted; libintl's `ngettext` takes the count as `unsigned long int`,
%% and gettext .po rules are all defined over non-negative N. We pass N
%% through unchanged so the rule's own semantics decide.
%%
%% TOTALITY (finding #1, plural-eval-throws-per-lookup-dos). `.po` input
%% is untrusted (ADR-0003) and this function runs in the CALLER process
%% on every `ngettext`/`npgettext` lookup, so it MUST NOT raise. Two
%% failure modes that a malformed rule could otherwise trigger are
%% neutralised here, matching the GNU libintl runtime:
%%
%% * division / modulo by zero — `eval_div/2` and `eval_rem/2` coerce
%% a zero divisor to a defined value (C undefined behaviour pinned to
%% 0) instead of letting Erlang `div`/`rem` raise `badarith`.
%% * out-of-range form index — clamped to form 0, exactly as
%% `dcigettext.c` (`plural_lookup`) does: `if (index >= nplurals)
%% index = 0;` ("this should never happen" -> clamp, NOT crash).
%%
%% The `-spec` is therefore HONEST: the result is provably
%% `non_neg_integer()` for every N and every AST. Callers that want to
%% OBSERVE the anomaly as data use `evaluate_checked/2` instead.
-doc """
Evaluates a compiled plural rule for a given `N` and returns the plural
form index — the TOTAL hot-path function, used by every
`ngettext`/`npgettext`.
`Compiled` is the bundle from `compile/1`; `N` is the count (an integer,
may be negative — the rule decides the semantics). The return is always a
`non_neg_integer()` in `[0, NPlurals)`: the rule is interpreted and the
result coerced to an integer.
Never raises, even on a malformed rule (parity with GNU libintl):
division/modulo by zero is coerced to 0 (`eval_div/2`/`eval_rem/2` instead
of letting `div`/`rem` raise `badarith`) and a form outside
`[0, NPlurals)` is clamped to 0 (`if index >= nplurals -> index = 0`).
It allocates nothing beyond the return value and keeps no result cache:
every call re-interprets the AST, which is why the `compile/1` caps keep
the AST small. A negative `N` is passed through without `abs()`; the rule
decides the semantics, and the clamp keeps the result in range.
```erlang
1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:evaluate(C, 1).
0
3> erli18n_plural:evaluate(C, 5).
1
%% The divisor DEPENDS on n (so it passes compile/1's static check) but is
%% zero at runtime for N=7: the division is coerced to 0, no crash.
4> {ok, Bad} = erli18n_plural:compile(<<"nplurals=2; plural=1/(n-7);">>).
5> erli18n_plural:evaluate(Bad, 7).
0
```
Edge cases: the short-circuit of `&&`/`||` is honoured, so a zero divisor
behind a false branch is never reached. To OBSERVE the anomaly (instead of
silent clamping) use `evaluate_checked/2`. See also `compile/1` and
`plural_by_po_header/2`.
""".
-spec evaluate(plural_compiled(), integer()) -> non_neg_integer().
evaluate(#{nplurals := NPlurals, expr := Ast}, N) when is_integer(N) ->
    Form = to_integer(eval_ast(Ast, N)),
    clamp_form(Form, NPlurals).
-doc """
Structured sibling of `evaluate/2`: instead of clamping silently, it
reports a malformed rule as data so the consumer can log/alert.
`Compiled` and `N` are as in `evaluate/2`. Returns `{ok, Form}` with the
form in `[0, NPlurals)`, or `{error, plural_eval_error()}`:
`{division_by_zero, '/' | '%'}` when the evaluated divisor is 0, or
`{form_out_of_range, Form, NPlurals}` when the form leaves the range. It
keeps the short-circuit of `&&`/`||` (a zero divisor behind a false branch
is not reported) and, like `evaluate/2`, is total — never raises.
Use this off the hot path, when you want to log/alert the malformed rule;
on the hot path stay with `evaluate/2`, whose clamp is cheaper. Where
`evaluate/2` would return `0` by clamping, this function returns the
corresponding `{error, _}`.
```erlang
1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:evaluate_checked(C, 5).
{ok,1}
%% Same rule as the evaluate/2 example (divisor depends on n).
%% Where evaluate/2 would clamp to 0, here the anomaly comes back as data.
3> {ok, Bad} = erli18n_plural:compile(<<"nplurals=2; plural=1/(n-7);">>).
4> erli18n_plural:evaluate_checked(Bad, 7).
{error,{division_by_zero,'/'}}
5> erli18n_plural:evaluate_checked(Bad, 8).
{ok,1}
```
Edge cases: a form outside `[0, NPlurals)` (where `evaluate/2` would
clamp) becomes `{error, {form_out_of_range, Form, NPlurals}}`. See also
`evaluate/2` (the sibling that clamps) and `t:plural_eval_error/0`.
""".
-spec evaluate_checked(plural_compiled(), integer()) ->
    {ok, non_neg_integer()} | {error, plural_eval_error()}.
evaluate_checked(#{nplurals := NPlurals, expr := Ast}, N) when is_integer(N) ->
    case eval_ast_checked(Ast, N) of
        {error, _} = Err ->
            Err;
        {ok, Value} ->
            Form = to_integer(Value),
            case Form >= 0 andalso Form < NPlurals of
                true -> {ok, Form};
                false -> {error, {form_out_of_range, Form, NPlurals}}
            end
    end.
%% Clamp a candidate form index into [0, NPlurals) à la libintl. NPlurals
%% is `pos_integer()` (validated at compile), so 0 is always a valid
%% form.
-spec clamp_form(integer(), pos_integer()) -> non_neg_integer().
clamp_form(Form, NPlurals) when Form >= 0, Form < NPlurals ->
    Form;
clamp_form(_Form, _NPlurals) ->
    0.
-doc """
Convenience that compiles and evaluates in a single step: given the raw
header `Header` and the count `N`, returns `{ok, Form}` or propagates the
`{error, compile_error()}` from `compile/1`.
Recompiles on every call, so it is for one-off use; on the hot path, call
`compile/1` once at load and reuse the bundle with `evaluate/2`.
The internal evaluation uses `evaluate/2` (total), so an `{ok, _}` never
embeds an evaluation anomaly — the only part that can fail is `compile/1`,
whose error is propagated as-is.
```erlang
1> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 1).
{ok,0}
2> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 3).
{ok,1}
3> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=nx;">>, 1).
{error,{syntax_error,{unknown_identifier_after_n,$x},1}}
```
See also `compile/1` and `evaluate/2`.
""".
-spec plural_by_po_header(binary(), integer()) ->
    {ok, non_neg_integer()} | {error, compile_error()}.
plural_by_po_header(Header, N) when is_binary(Header), is_integer(N) ->
    case compile(Header) of
        {ok, Compiled} -> {ok, evaluate(Compiled, N)};
        {error, _} = E -> E
    end.
-doc """
Looks up the CLDR canonical plural expression for `Locale` in the embedded
table.
Returns `{ok, Expr}`, where `Expr` is the binary of the C plural
expression equivalent to that locale's CLDR rule, or `undefined` if
neither the locale nor its base language is in the table. The match is
case-sensitive; region tags fall back to the base language when the region
itself is not listed (e.g. `fr_BE` -> `fr`, since `fr_BE` has no row of
its own in the table).
A lookup/observability function — NOT on the hot path (PSD-004: the `.po`
header is the runtime source-of-truth). The embedded table (`cldr_data/0`)
carries one row per locale the GNU gettext / CLDR seed defines. Both `_` and
`-` separators are accepted in the fallback to the base language.
```erlang
%% Direct hit: the entry exists in the table.
1> erli18n_plural:cldr_rule(<<"ru">>).
{ok,<<"n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2">>}
%% Direct hit: `fr_CA` HAS its own row, so it resolves without falling back.
2> erli18n_plural:cldr_rule(<<"fr_CA">>).
{ok,<<"n > 1">>}
%% Fallback to the base language: `fr_BE` is not in the table, falls back to `fr`.
3> erli18n_plural:cldr_rule(<<"fr_BE">>).
{ok,<<"n > 1">>}
%% Neither the locale nor the base exists.
4> erli18n_plural:cldr_rule(<<"xx">>).
undefined
```
Edge cases: `pt_PT` (`n != 1`) diverges from the base `pt` (`n > 1`), so
the region entry exists separately. See also `validate_against_cldr/2`
(compare a header against the CLDR rule) and `fallback_rule/0`.
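
The lookup order described above (exact locale first, then the base
language with either `_` or `-` as separator) can be sketched as follows.
This is an illustration under stated assumptions, not the module's code:
the module `plural_locale_sketch`, the function `lookup/2` and the map-based
`Table` argument are hypothetical, whereas the real table lives in
`cldr_data/0`.

```erlang
-module(plural_locale_sketch).
-export([lookup/2]).

%% Look up Locale in Table (a map of locale binary to rule binary),
%% falling back to the base language when the region has no own row.
lookup(Locale, Table) when is_binary(Locale), is_map(Table) ->
    case maps:find(Locale, Table) of
        {ok, Rule} ->
            {ok, Rule};
        error ->
            case base_locale(Locale) of
                %% No region part to strip: nothing else to try.
                Locale -> undefined;
                Base -> find(Base, Table)
            end
    end.

%% Keep everything before the first `_` or `-` separator.
base_locale(Locale) ->
    [Base | _] = binary:split(Locale, [<<"_">>, <<"-">>]),
    Base.

find(Key, Table) ->
    case maps:find(Key, Table) of
        {ok, Rule} -> {ok, Rule};
        error -> undefined
    end.
```

Given a table that holds `fr`, `pt` and `pt_PT`, both `fr_BE` and `fr-BE`
resolve through `fr`, while `pt_PT` is a direct hit and never reaches the
fallback.
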
""".
-spec cldr_rule(binary()) -> {ok, binary()} | undefined.
cldr_rule(Locale) when is_binary(Locale) ->
    case lookup_locale(Locale) of
        {ok, _N, Expr} ->
            {ok, Expr};
        undefined ->
            case base_locale(Locale) of
                Locale ->
                    undefined;
                Base ->
                    case lookup_locale(Base) of
                        {ok, _N2, Expr2} -> {ok, Expr2};
                        undefined -> undefined
                    end
            end
    end.
%% Compare a `.po` header expression against the CLDR canonical rule for
%% the given locale. Returns `ok` if the parsed ASTs are structurally
%% identical (whitespace-insensitive) or `{warning, _}` if they diverge
%% in a way that would affect runtime form selection. Per PSD-004 the
%% header always wins at runtime — this only produces observability.
%%
-doc """
Compares the plural expression of header `HeaderRule` (raw form) against
the CLDR canonical rule of `Locale`, producing only observability — at
runtime the header always wins (PSD-004).
Compiles `HeaderRule` ONCE and delegates to `validate_against_cldr_ast/2`.
Returns `ok` when the `(nplurals, expr)` ASTs are structurally equal
(whitespace/paren-insensitive) or when the locale has no CLDR entry;
returns `{warning, {plural_divergence, Locale, HeaderRule, CldrRaw}}` when
they diverge — including when the header is invalid but the locale is
listed in CLDR.
A convenience entry point for callers that only have the raw header. The
catalog loader, which already keeps the compiled bundle, should use
`validate_against_cldr_ast/2` to avoid recompiling the header at load.
The comparison is STRUCTURAL over the `(nplurals, expr-AST)` pair, so it
is insensitive to whitespace and redundant parentheses: `(n != 1)` matches
`n != 1`. Nothing changes at runtime — the warning exists only for
telemetry.
```erlang
%% Header agrees with fr's CLDR (n > 1): no warning.
1> erli18n_plural:validate_against_cldr(<<"fr">>, <<"nplurals=2; plural=(n > 1);">>).
ok
%% Header diverges from fr's CLDR: warning (but the header would win at runtime).
2> erli18n_plural:validate_against_cldr(<<"fr">>, <<"nplurals=2; plural=n != 1;">>).
{warning,{plural_divergence,<<"fr">>,<<"nplurals=2; plural=n != 1;">>,<<"n > 1">>}}
%% Locale with no CLDR entry: nothing to validate.
3> erli18n_plural:validate_against_cldr(<<"xx">>, <<"nplurals=2; plural=n != 1;">>).
ok
```
Edge cases: an INVALID header against a locale that IS listed in CLDR
still produces `{warning, _}` (it cannot match the canonical rule);
against a locale with no CLDR entry it becomes `ok`. See also
`validate_against_cldr_ast/2` (variant without recompiling) and
`cldr_rule/1`.
""".
-spec validate_against_cldr(binary(), binary()) ->
    ok
    | {warning, {plural_divergence, binary(), binary(), binary()}}.
validate_against_cldr(Locale, HeaderRule) when
    is_binary(Locale), is_binary(HeaderRule)
->
    case compile(HeaderRule) of
        {ok, Compiled} ->
            validate_against_cldr_ast(Locale, Compiled);
        {error, _} ->
            %% An unparseable header has no AST to compare. Before
            %% finding #17 this still produced a `{warning, _}` for a
            %% CLDR-listed locale (the header could not match the
            %% canonical rule), so preserve that observable behaviour.
            case cldr_compiled(Locale) of
                undefined -> ok;
                #{raw := CldrRaw} -> {warning, {plural_divergence, Locale, HeaderRule, CldrRaw}}
            end
    end.
%% AST-based sibling of `validate_against_cldr/2`. Takes the ALREADY
%% compiled header bundle (`plural_compiled()`) and compares it against
%% the CLDR canonical rule for the locale without recompiling anything:
%%
%% * the header AST is reused as-is (the loader compiled it once via
%% `compile/1` and keeps it in the catalog map);
%% * the CLDR rule is taken from a one-time, memoised table of compiled
%% ASTs (`cldr_compiled/1`), so no CLDR rule is parsed/synthesised on
%% the load path either.
%%
%% Equivalence is structural on `(nplurals, expr-AST)` — exactly what the
%% old `ast_equivalent/2` computed, but with both sides already parsed.
%% The warning payload keeps the raw header string (the bundle's `raw`
%% field) and the raw CLDR expression, matching `validate_against_cldr/2`.
-doc """
AST-based variant of `validate_against_cldr/2`: takes the ALREADY compiled
bundle (`plural_compiled()`) and compares it against the CLDR rule of
`Locale` without recompiling anything (finding #17).
Reuses the header AST as-is and takes the CLDR side from a memoised table
of compiled bundles, so no rule is re-parsed at load. Returns `ok` if the
`(nplurals, expr)` pairs match or if the locale has no CLDR entry;
otherwise `{warning, {plural_divergence, Locale, HeaderRaw, CldrRaw}}`,
with the raw header (the bundle's `raw` field) and the raw CLDR
expression.
This is the PREFERRED form in the loader (finding #17): since the bundle
was already compiled by `compile/1` at load, it avoids the second
`compile/1` that `validate_against_cldr/2` would do, and the CLDR side
comes from the `persistent_term` cache (`cldr_compiled_table/0`), not
re-synthesised per load.
```erlang
1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:validate_against_cldr_ast(<<"fr">>, C).
{warning,{plural_divergence,<<"fr">>,<<"nplurals=2; plural=n != 1;">>,<<"n > 1">>}}
3> {ok, Cde} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
4> erli18n_plural:validate_against_cldr_ast(<<"de">>, Cde).
ok
```
Edge cases: a locale with no CLDR entry becomes `ok` (nothing to log). See
also `validate_against_cldr/2` (from the raw header) and `cldr_compiled/1`
(the memoisation of the CLDR side).
""".
-spec validate_against_cldr_ast(binary(), plural_compiled()) ->
ok
| {warning, {plural_divergence, binary(), binary(), binary()}}.
validate_against_cldr_ast(Locale, #{nplurals := NH, expr := EH, raw := HeaderRaw}) when
is_binary(Locale)
->
case cldr_compiled(Locale) of
undefined ->
%% Locale has no CLDR entry; we cannot validate. Treat as ok
%% — the loader has nothing meaningful to log.
ok;
#{nplurals := NC, expr := EC, raw := CldrRaw} ->
case NH =:= NC andalso EH =:= EC of
true -> ok;
false -> {warning, {plural_divergence, Locale, HeaderRaw, CldrRaw}}
end
end.
-doc """
Fallback plural rule used when a `.po` catalog ships no `Plural-Forms:`
header at all (a degenerate but tolerated input).
Returns `<<"nplurals=2; plural=n != 1;">>` — the Germanic C/English
default cited by the GNU gettext manual (§"Plural forms").
A pure constant, no side effects. The result is a raw header ready for
`compile/1`, so the loader's fallback path reuses exactly the same
pipeline as a legitimate header.
```erlang
1> erli18n_plural:fallback_rule().
<<"nplurals=2; plural=n != 1;">>
2> {ok, C} = erli18n_plural:compile(erli18n_plural:fallback_rule()),
2> erli18n_plural:evaluate(C, 1).
0
```
See also `compile/1` and `cldr_rule/1`.
""".
-spec fallback_rule() -> binary().
fallback_rule() ->
~"nplurals=2; plural=n != 1;".
%% =========================
%% Header tokenization
%% =========================
%% Extract the integer following `nplurals=`. Tolerant of surrounding
%% whitespace and the trailing semicolon. Returns
%% `{error, {missing_nplurals, _}}` when the field is absent or carries
%% no digits, `{error, {nplurals_too_many_digits, D, ?MAX_INT_DIGITS}}`
%% when the literal has more than ?MAX_INT_DIGITS digits, and
%% `{error, {nplurals_out_of_range, N}}` when N is outside the sanity
%% range [1, ?NPLURALS_MAX].
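%%
%% Worked outcomes, traced by hand through the clauses below (a sketch; it
%% assumes `?NPLURALS_MAX` is at least 3, and the headers are made-up
%% inputs):
%%
%%   extract_nplurals(<<"nplurals=3; plural=n;">>)   -> {ok, 3}
%%   extract_nplurals(<<"nplurals = 3; plural=n;">>) -> {ok, 3}
%%   extract_nplurals(<<"nplurals=0; plural=0;">>)   -> {error, {nplurals_out_of_range, 0}}
%%   extract_nplurals(<<"nplurals=; plural=n;">>)    -> {error, {missing_nplurals, Header}}
%%   extract_nplurals(<<"plural=n;">>)               -> {error, {missing_nplurals, Header}}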
-spec extract_nplurals(binary()) ->
{ok, pos_integer()} | {error, compile_error()}.
extract_nplurals(Header) ->
case locate_field(Header, ~"nplurals") of
{ok, Tail} ->
{Digits, _} = consume_integer(skip_ws(Tail)),
case byte_size(Digits) of
0 ->
{error, {missing_nplurals, Header}};
D when D > ?MAX_INT_DIGITS ->
%% Finding #8: cap by DIGIT COUNT before
%% `binary_to_integer` materialises the bignum, and
%% keep the rejected value OUT of the payload (only
%% the digit count + cap are reported) to avoid
%% memory/log amplification and the system_limit path.
{error, {nplurals_too_many_digits, D, ?MAX_INT_DIGITS}};
_ ->
N = binary_to_integer(Digits),
case N >= 1 andalso N =< ?NPLURALS_MAX of
true -> {ok, N};
false -> {error, {nplurals_out_of_range, N}}
end
end;
not_found ->
{error, {missing_nplurals, Header}}
end.
%% Extract the raw expression following `plural=`: everything up to the
%% first `;` or newline, with surrounding whitespace trimmed. The returned
%% binary still needs to be fed to the recursive-descent parser.
-spec extract_plural_expr(binary()) ->
{ok, binary()} | {error, compile_error()}.
extract_plural_expr(Header) ->
case locate_field(Header, ~"plural") of
{ok, Tail} ->
ExprRaw = take_until_semicolon_or_end(Tail),
case trim(ExprRaw) of
<<>> -> {error, {missing_plural_expr, Header}};
Trimmed -> {ok, Trimmed}
end;
not_found ->
{error, {missing_plural_expr, Header}}
end.
%% Locate `Field=` (case-sensitive — GNU gettext spec keeps these names
%% lower-case) in Header and return the bytes immediately after the `=`.
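%%
%% Hand trace for a typical header (a sketch of the clauses below, not a
%% recorded run), locating `plural` in
%% `<<"nplurals=2; plural=n != 1;">>`:
%%
%%   1. first match at byte 1, inside `nplurals`; the byte before it is
%%      `n`, not a delimiter, so the search resumes at byte 7;
%%   2. next match at byte 12; the byte before it is a space, and `=`
%%      follows, so the result is `{ok, <<"n != 1;">>}`.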
locate_field(Header, Field) ->
%% Walk the header looking for `Field` followed (after optional
%% whitespace) by `=`. We require either start-of-string or a
%% delimiter (whitespace or `;`) before `Field` so that `plural` does
%% not match inside `nplurals`, nor either field inside a longer name.
locate_field(Header, Field, 0).
locate_field(Bin, Field, Offset) ->
case binary:match(Bin, Field, [{scope, {Offset, byte_size(Bin) - Offset}}]) of
nomatch ->
not_found;
{Start, Len} ->
case is_field_boundary_left(Bin, Start) of
true ->
Tail0 = binary:part(
Bin,
Start + Len,
byte_size(Bin) - (Start + Len)
),
case skip_to_equals(Tail0) of
{ok, Tail1} -> {ok, Tail1};
not_found -> locate_field(Bin, Field, Start + Len)
end;
false ->
locate_field(Bin, Field, Start + Len)
end
end.
is_field_boundary_left(_Bin, 0) ->
true;
is_field_boundary_left(Bin, Start) ->
Prev = binary:at(Bin, Start - 1),
is_header_delim(Prev).
is_header_delim($\s) -> true;
is_header_delim($\t) -> true;
is_header_delim($\n) -> true;
is_header_delim($\r) -> true;
is_header_delim($;) -> true;
is_header_delim(_) -> false.
skip_to_equals(<<>>) ->
not_found;
skip_to_equals(<<$=, Rest/binary>>) ->
{ok, Rest};
skip_to_equals(<<C, Rest/binary>>) when C =:= $\s; C =:= $\t ->
skip_to_equals(Rest);
skip_to_equals(_) ->
not_found.
take_until_semicolon_or_end(Bin) ->
take_until_semicolon_or_end(Bin, 0).
take_until_semicolon_or_end(Bin, N) when N >= byte_size(Bin) ->
Bin;
take_until_semicolon_or_end(Bin, N) ->
case binary:at(Bin, N) of
$; -> binary:part(Bin, 0, N);
%% header line terminator
$\n -> binary:part(Bin, 0, N);
_ -> take_until_semicolon_or_end(Bin, N + 1)
end.
%% =========================
%% Recursive-descent parser
%% =========================
%%
%% Grammar (precedence low -> high), per GNU manual §"Plural forms":
%%
%% expr := ternary
%% ternary := lor ('?' expr ':' expr)? (right-assoc)
%% lor := land ('||' land)* (left-assoc)
%% land := equality ('&&' equality)* (left-assoc)
%% equality := relational (('==' | '!=') relational)*
%% relational := additive (('<' | '>' | '<=' | '>=') additive)*
%% additive := multiplicative (('+' | '-') multiplicative)*
%% multiplicative := unary (('*' | '/' | '%') unary)*
%% unary := '!' unary | primary
%% primary := INTEGER | 'n' | '(' expr ')'
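%%
%% Worked example (hand-derived from the grammar above, assuming the tuple
%% shapes built by the parse_* clauses below): the rule
%%
%%   n%10==1 && n%100!=11 ? 0 : 1
%%
%% parses to
%%
%%   {ternary,
%%     {binop, '&&',
%%       {binop, '==', {binop, '%', n, 10}, 1},
%%       {binop, '!=', {binop, '%', n, 100}, 11}},
%%     0,
%%     1}
%%
%% `%` binds tighter than `==`/`!=`, which bind tighter than `&&`, and the
%% ternary sits at the root because it has the lowest precedence.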
-spec parse_expr_bin(binary()) ->
{ok, ast()} | {error, compile_error()}.
%% Finding #2: reject an over-long expression BEFORE parsing it, so a
%% multi-KB adversarial rule never reaches the recursive descent.
parse_expr_bin(ExprBin) when byte_size(ExprBin) > ?PLURAL_EXPR_MAX_BYTES ->
{error, {expr_too_long, byte_size(ExprBin), ?PLURAL_EXPR_MAX_BYTES}};
parse_expr_bin(ExprBin) ->
try
{Ast, St} = parse_expr(#ps{src = ExprBin}, 0),
case skip_ws_st(St) of
#ps{src = <<>>} ->
{ok, Ast};
St2 ->
%% Trailing garbage (e.g. unbalanced `)` or stray token).
{error, {syntax_error, {trailing_input, St2#ps.src}, St2#ps.pos}}
end
catch
throw:{syntax_error, Reason, Pos} ->
{error, {syntax_error, Reason, Pos}};
%% Finding #2: recursion-depth guard tripped — fail closed with a
%% structured error rather than parsing an unbounded-depth tree.
throw:{expr_too_deep, Depth, Pos} ->
{error, {expr_too_deep, Depth, Pos}}
end.
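%% `Depth` is propagated through every recursive-descent clause and
%% checked on each entry to `parse_expr/2` and `parse_unary/2` (finding
%% #2), which bounds the parser's stack. It bounds the AST depth only for
%% nested constructs (parentheses, ternary branches, `!` chains): a
%% left-associative chain such as `n*n*...*n` is folded by the `*_tail`
%% loops at a constant `Depth` into a left-deep AST, so for those the
%% hot-path `eval_ast/2` stack is bounded by the `?AST_MAX_NODES` cap
%% (finding #9) instead.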
parse_expr(St, Depth) when Depth > ?PLURAL_EXPR_MAX_DEPTH ->
throw({expr_too_deep, Depth, St#ps.pos});
parse_expr(St, Depth) ->
parse_ternary(St, Depth).
parse_ternary(St0, Depth) ->
{Cond, St1} = parse_lor(St0, Depth),
St2 = skip_ws_st(St1),
case peek_byte(St2) of
{ok, $?} ->
St3 = advance(St2, 1),
{Then, St4} = parse_expr(St3, Depth + 1),
St5 = skip_ws_st(St4),
case peek_byte(St5) of
{ok, $:} ->
St6 = advance(St5, 1),
{Else, St7} = parse_expr(St6, Depth + 1),
{{ternary, Cond, Then, Else}, St7};
_ ->
throw({syntax_error, {expected, $:, peek_byte(St5)}, St5#ps.pos})
end;
_ ->
{Cond, St1}
end.
parse_lor(St0, Depth) ->
{Left, St1} = parse_land(St0, Depth),
parse_lor_tail(Left, St1, Depth).
parse_lor_tail(Left, St0, Depth) ->
St1 = skip_ws_st(St0),
case peek2(St1) of
{ok, $|, $|} ->
St2 = advance(St1, 2),
{Right, St3} = parse_land(St2, Depth + 1),
parse_lor_tail({binop, '||', Left, Right}, St3, Depth);
_ ->
{Left, St0}
end.
parse_land(St0, Depth) ->
{Left, St1} = parse_equality(St0, Depth),
parse_land_tail(Left, St1, Depth).
parse_land_tail(Left, St0, Depth) ->
St1 = skip_ws_st(St0),
case peek2(St1) of
{ok, $&, $&} ->
St2 = advance(St1, 2),
{Right, St3} = parse_equality(St2, Depth + 1),
parse_land_tail({binop, '&&', Left, Right}, St3, Depth);
_ ->
{Left, St0}
end.
parse_equality(St0, Depth) ->
{Left, St1} = parse_relational(St0, Depth),
parse_equality_tail(Left, St1, Depth).
parse_equality_tail(Left, St0, Depth) ->
St1 = skip_ws_st(St0),
case peek2(St1) of
{ok, $=, $=} ->
St2 = advance(St1, 2),
{Right, St3} = parse_relational(St2, Depth + 1),
parse_equality_tail({binop, '==', Left, Right}, St3, Depth);
{ok, $!, $=} ->
St2 = advance(St1, 2),
{Right, St3} = parse_relational(St2, Depth + 1),
parse_equality_tail({binop, '!=', Left, Right}, St3, Depth);
_ ->
{Left, St0}
end.
parse_relational(St0, Depth) ->
{Left, St1} = parse_additive(St0, Depth),
parse_relational_tail(Left, St1, Depth).
parse_relational_tail(Left, St0, Depth) ->
St1 = skip_ws_st(St0),
case peek2(St1) of
{ok, $<, $=} ->
St2 = advance(St1, 2),
{Right, St3} = parse_additive(St2, Depth + 1),
parse_relational_tail({binop, '<=', Left, Right}, St3, Depth);
{ok, $>, $=} ->
St2 = advance(St1, 2),
{Right, St3} = parse_additive(St2, Depth + 1),
parse_relational_tail({binop, '>=', Left, Right}, St3, Depth);
{ok, $<, _} ->
St2 = advance(St1, 1),
{Right, St3} = parse_additive(St2, Depth + 1),
parse_relational_tail({binop, '<', Left, Right}, St3, Depth);
{ok, $>, _} ->
St2 = advance(St1, 1),
{Right, St3} = parse_additive(St2, Depth + 1),
parse_relational_tail({binop, '>', Left, Right}, St3, Depth);
_ ->
{Left, St0}
end.
parse_additive(St0, Depth) ->
{Left, St1} = parse_multiplicative(St0, Depth),
parse_additive_tail(Left, St1, Depth).
parse_additive_tail(Left, St0, Depth) ->
St1 = skip_ws_st(St0),
case peek_byte(St1) of
{ok, $+} ->
St2 = advance(St1, 1),
{Right, St3} = parse_multiplicative(St2, Depth + 1),
parse_additive_tail({binop, '+', Left, Right}, St3, Depth);
{ok, $-} ->
St2 = advance(St1, 1),
{Right, St3} = parse_multiplicative(St2, Depth + 1),
parse_additive_tail({binop, '-', Left, Right}, St3, Depth);
_ ->
{Left, St0}
end.
parse_multiplicative(St0, Depth) ->
{Left, St1} = parse_unary(St0, Depth),
parse_multiplicative_tail(Left, St1, Depth).
parse_multiplicative_tail(Left, St0, Depth) ->
St1 = skip_ws_st(St0),
case peek_byte(St1) of
{ok, $*} ->
St2 = advance(St1, 1),
{Right, St3} = parse_unary(St2, Depth + 1),
parse_multiplicative_tail({binop, '*', Left, Right}, St3, Depth);
{ok, $/} ->
St2 = advance(St1, 1),
{Right, St3} = parse_unary(St2, Depth + 1),
parse_multiplicative_tail({binop, '/', Left, Right}, St3, Depth);
{ok, $%} ->
St2 = advance(St1, 1),
{Right, St3} = parse_unary(St2, Depth + 1),
parse_multiplicative_tail({binop, '%', Left, Right}, St3, Depth);
_ ->
{Left, St0}
end.
parse_unary(St0, Depth) when Depth > ?PLURAL_EXPR_MAX_DEPTH ->
throw({expr_too_deep, Depth, St0#ps.pos});
parse_unary(St0, Depth) ->
St1 = skip_ws_st(St0),
case peek_byte(St1) of
{ok, $!} ->
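%% `!=` is the inequality operator (consumed in parse_equality), not
%% a unary `!`. No operand may start with it, so hand it to
%% parse_primary/2, which rejects it as an unexpected character.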
case peek2(St1) of
{ok, $!, $=} ->
parse_primary(St1, Depth);
_ ->
St2 = advance(St1, 1),
{Inner, St3} = parse_unary(St2, Depth + 1),
{{unop, '!', Inner}, St3}
end;
_ ->
parse_primary(St1, Depth)
end.
parse_primary(St0, Depth) ->
St1 = skip_ws_st(St0),
case peek_byte(St1) of
{ok, $(} ->
St2 = advance(St1, 1),
{Inner, St3} = parse_expr(St2, Depth + 1),
St4 = skip_ws_st(St3),
case peek_byte(St4) of
{ok, $)} ->
{Inner, advance(St4, 1)};
_ ->
throw({syntax_error, {unclosed_paren, peek_byte(St4)}, St4#ps.pos})
end;
{ok, $n} ->
%% `n` is a single-character identifier; the GNU grammar
%% does not permit multi-character identifiers in plural
%% expressions.
St2 = advance(St1, 1),
case peek_byte(St2) of
{ok, C} when ?IS_IDENT(C) ->
throw({syntax_error, {unknown_identifier_after_n, C}, St2#ps.pos});
_ ->
{n, St2}
end;
{ok, D} when D >= $0, D =< $9 ->
{Digits, St2} = consume_integer_st(St1),
{binary_to_integer(Digits), St2};
{ok, C} ->
throw({syntax_error, {unexpected_char, C}, St1#ps.pos});
eof ->
throw({syntax_error, unexpected_eof, St1#ps.pos})
end.
%% =========================
%% Parser state helpers
%% =========================
%% Finding #2 (plural-compile-superlinear-unbounded): dispatch on a
%% `byte_size/1` comparison, NOT a full-binary `=:=` match. `skip_ws/1`
%% only ever strips a leading prefix, so when nothing is consumed the
%% result is byte-for-byte equal and `byte_size(Rest) =:= byte_size(Src)`
%% is exact. Matching `case skip_ws(Src) of Src -> ...` instead forced a
%% byte-by-byte comparison of the whole remaining input on every token
%% (the binaries are structurally equal but not identical), which made
%% the parser O(n^2). `byte_size/1` is O(1), collapsing it to O(n).
skip_ws_st(#ps{src = Src, pos = Pos} = St) ->
Rest = skip_ws(Src),
case byte_size(Rest) =:= byte_size(Src) of
true ->
St;
false ->
Consumed = byte_size(Src) - byte_size(Rest),
St#ps{src = Rest, pos = Pos + Consumed}
end.
skip_ws(<<C, Rest/binary>>) when
C =:= $\s;
C =:= $\t;
C =:= $\n;
C =:= $\r
->
skip_ws(Rest);
skip_ws(Bin) ->
Bin.
peek_byte(#ps{src = <<>>}) -> eof;
peek_byte(#ps{src = <<B, _/binary>>}) -> {ok, B}.
peek2(#ps{src = <<>>}) -> eof;
peek2(#ps{src = <<_>>}) -> eof;
peek2(#ps{src = <<A, B, _/binary>>}) -> {ok, A, B}.
advance(#ps{src = Src, pos = Pos} = St, N) ->
St#ps{
src = binary:part(Src, N, byte_size(Src) - N),
pos = Pos + N
}.
consume_integer_st(#ps{src = Src, pos = Pos}) ->
{Digits, Rest} = consume_integer(Src),
Len = byte_size(Digits),
{Digits, #ps{src = Rest, pos = Pos + Len}}.
consume_integer(Bin) -> consume_integer(Bin, 0).
consume_integer(Bin, N) when N >= byte_size(Bin) ->
{Bin, <<>>};
consume_integer(Bin, N) ->
case binary:at(Bin, N) of
D when D >= $0, D =< $9 -> consume_integer(Bin, N + 1);
_ -> {binary:part(Bin, 0, N), binary:part(Bin, N, byte_size(Bin) - N)}
end.
trim(Bin) -> trim_trailing(trim_leading(Bin)).
trim_leading(<<C, Rest/binary>>) when
C =:= $\s;
C =:= $\t;
C =:= $\n;
C =:= $\r
->
trim_leading(Rest);
trim_leading(Bin) ->
Bin.
trim_trailing(Bin) ->
Size = byte_size(Bin),
trim_trailing(Bin, Size).
trim_trailing(_Bin, 0) ->
<<>>;
trim_trailing(Bin, N) ->
case binary:at(Bin, N - 1) of
C when C =:= $\s; C =:= $\t; C =:= $\n; C =:= $\r ->
trim_trailing(Bin, N - 1);
_ ->
binary:part(Bin, 0, N)
end.
%% =========================
%% Interpreter (hot path)
%% =========================
%% C-truthy coercion (from legacy gettexter_plural to_boolean/1).
%% `0` is false, any other integer is true. Boolean inputs are passed
%% through to keep short-circuit interop in `eval_ast/2` clean.
-spec to_boolean(integer() | boolean()) -> boolean().
to_boolean(true) -> true;
to_boolean(false) -> false;
to_boolean(0) -> false;
to_boolean(N) when is_integer(N) -> true.
%% Reverse coercion: booleans returned by `&&`/`||`/`!`/comparison ops
%% must materialize as 0 or 1 on the way out (since plural form indices
%% are integers).
-spec to_integer(integer() | boolean()) -> integer().
to_integer(true) -> 1;
to_integer(false) -> 0;
to_integer(N) when is_integer(N) -> N.
-doc """
Hot-path interpreter: walks the `t:ast/0` for a given `N` and returns an
integer (arithmetic result) OR a boolean (comparison/logical result) — the
caller (`evaluate/2`) coerces via `to_integer/1`.
Invariants for the maintainer: `&&`/`||` short-circuit (mirroring C), so
the right branch is only evaluated when needed — this is what keeps
`evaluate/2` total for `n != 0 && 1/n`. Division/modulo by zero goes
through `eval_div/2`/`eval_rem/2` (coerced to 0), never raw `div`/`rem`.
The recursion depth is bounded at compile/load time: nesting by
`?PLURAL_EXPR_MAX_DEPTH`, and left-deep operator chains (which add no
parser depth) by the `?AST_MAX_NODES` node cap, so the stack per lookup is
bounded by construction. The sibling `eval_ast_checked/2` has the SAME
shape, but reports the anomaly as data.
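A hedged sketch of the two invariants through the public API (`eval_ast/2`
itself is internal). The headers are made-up inputs, and the results are
derived by hand from the clauses of this walker, assuming `evaluate/2` only
coerces and clamps the value:
```erlang
%% Guarded division: the right operand is skipped when n is 0.
1> {ok, G} = erli18n_plural:compile(<<"nplurals=2; plural=n != 0 && 1/n;">>),
1> [erli18n_plural:evaluate(G, N) || N <- [0, 1, 2]].
[0,1,0]
%% Unguarded division: a zero divisor is coerced to 0 instead of raising.
2> {ok, U} = erli18n_plural:compile(<<"nplurals=2; plural=1/n;">>),
2> erli18n_plural:evaluate(U, 0).
0
```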
""".
%% Walker. Returns either an integer (arithmetic result) or a boolean
%% (comparison / logical result) — the caller coerces as needed.
-spec eval_ast(ast(), integer()) -> integer() | boolean().
eval_ast(N, _N) when is_integer(N) ->
N;
eval_ast(n, N) ->
N;
eval_ast({unop, '!', E}, N) ->
not to_boolean(eval_ast(E, N));
eval_ast({binop, '&&', L, R}, N) ->
%% Short-circuit per C semantics: if L is false, do not evaluate R.
case to_boolean(eval_ast(L, N)) of
false -> false;
true -> to_boolean(eval_ast(R, N))
end;
eval_ast({binop, '||', L, R}, N) ->
case to_boolean(eval_ast(L, N)) of
true -> true;
false -> to_boolean(eval_ast(R, N))
end;
eval_ast({binop, Op, L, R}, N) ->
LV = to_integer(eval_ast(L, N)),
RV = to_integer(eval_ast(R, N)),
apply_binop(Op, LV, RV);
eval_ast({ternary, C, T, E}, N) ->
case to_boolean(eval_ast(C, N)) of
true -> eval_ast(T, N);
false -> eval_ast(E, N)
end.
-spec apply_binop(op(), integer(), integer()) -> integer() | boolean().
apply_binop('+', L, R) -> L + R;
apply_binop('-', L, R) -> L - R;
apply_binop('*', L, R) -> L * R;
apply_binop('/', L, R) -> eval_div(L, R);
apply_binop('%', L, R) -> eval_rem(L, R);
apply_binop('==', L, R) -> L =:= R;
apply_binop('!=', L, R) -> L =/= R;
apply_binop('<', L, R) -> L < R;
apply_binop('>', L, R) -> L > R;
apply_binop('<=', L, R) -> L =< R;
apply_binop('>=', L, R) -> L >= R.
%% Total division / modulo (finding #1, Layer 1). A zero divisor is C
%% undefined behaviour; rather than let Erlang `div`/`rem` raise
%% `badarith` in the caller process, we pin the result to 0 — the
%% expression result is still clamped into range afterwards. Real `.po`
%% rules never divide by a value that reaches zero (they guard with
%% short-circuit `&&`/`||`), so this only ever fires on malformed input.
-spec eval_div(integer(), integer()) -> integer().
eval_div(_L, 0) -> 0;
eval_div(L, R) -> L div R.
-spec eval_rem(integer(), integer()) -> integer().
eval_rem(_L, 0) -> 0;
eval_rem(L, R) -> L rem R.
%% =========================
%% Checked interpreter (finding #1, Layer 2)
%% =========================
%%
%% Mirror of `eval_ast/2` that surfaces the two unsafe conditions as
%% structured data instead of clamping: division/modulo by zero and
%% (handled by the caller `evaluate_checked/2`) an out-of-range form.
%% Short-circuit semantics for `&&`/`||` are preserved, so a zero
%% divisor guarded behind a false branch is never reported — matching
%% the dynamic evaluator. Total: never raises.
-spec eval_ast_checked(ast(), integer()) ->
{ok, integer() | boolean()} | {error, plural_eval_error()}.
eval_ast_checked(N, _N) when is_integer(N) ->
{ok, N};
eval_ast_checked(n, N) ->
{ok, N};
eval_ast_checked({unop, '!', E}, N) ->
case eval_ast_checked(E, N) of
{ok, V} -> {ok, not to_boolean(V)};
{error, _} = Err -> Err
end;
eval_ast_checked({binop, '&&', L, R}, N) ->
case eval_ast_checked(L, N) of
{error, _} = Err ->
Err;
{ok, LV} ->
case to_boolean(LV) of
false -> {ok, false};
true -> to_boolean_checked(eval_ast_checked(R, N))
end
end;
eval_ast_checked({binop, '||', L, R}, N) ->
case eval_ast_checked(L, N) of
{error, _} = Err ->
Err;
{ok, LV} ->
case to_boolean(LV) of
true -> {ok, true};
false -> to_boolean_checked(eval_ast_checked(R, N))
end
end;
eval_ast_checked({binop, Op, L, R}, N) ->
case eval_ast_checked(L, N) of
{error, _} = ErrL ->
ErrL;
{ok, LV0} ->
case eval_ast_checked(R, N) of
{error, _} = ErrR ->
ErrR;
{ok, RV0} ->
apply_binop_checked(Op, to_integer(LV0), to_integer(RV0))
end
end;
eval_ast_checked({ternary, C, T, E}, N) ->
case eval_ast_checked(C, N) of
{error, _} = Err ->
Err;
{ok, CV} ->
case to_boolean(CV) of
true -> eval_ast_checked(T, N);
false -> eval_ast_checked(E, N)
end
end.
%% Coerce the right operand of a short-circuit op to a boolean result,
%% propagating any error from the underlying evaluation.
-spec to_boolean_checked({ok, integer() | boolean()} | {error, plural_eval_error()}) ->
{ok, boolean()} | {error, plural_eval_error()}.
to_boolean_checked({error, _} = Err) -> Err;
to_boolean_checked({ok, V}) -> {ok, to_boolean(V)}.
-spec apply_binop_checked(op(), integer(), integer()) ->
{ok, integer() | boolean()} | {error, plural_eval_error()}.
apply_binop_checked('/', _L, 0) -> {error, {division_by_zero, '/'}};
apply_binop_checked('%', _L, 0) -> {error, {division_by_zero, '%'}};
apply_binop_checked(Op, L, R) -> {ok, apply_binop(Op, L, R)}.
%% =========================
%% Static safety validation (finding #1, Layer 3)
%% =========================
%%
%% Reject — at compile/load time — rules that are STATICALLY guaranteed
%% to fault for every N, so the poisoned catalog is refused by
%% `ensure_loaded` instead of loading as `{ok, _}` and crashing each
%% later lookup. We only reject what is *provably* faulty regardless of
%% input; conditions that fault only for a specific N (e.g. `n / (n-5)`)
%% are left to the dynamic clamp in Layer 1.
%%
%% Two static faults are detected:
%% * a `/` or `%` whose divisor is a constant subexpression (no `n`)
%% evaluating to 0 — fails for all N;
%% * a fully-constant rule (no `n` anywhere) whose value lands outside
%% [0, NPlurals) — selects a non-existent form for all N.
-doc """
Static safety barrier of `compile/1` (Layer 3, finding #1): rejects at
LOAD only what is provably faulty for EVERY N, so the poisoned catalog is
refused instead of loading as `{ok, _}` and crashing every lookup.
Detects two defects: (1) a `/` or `%` whose divisor is constant (no `n`)
and evaluates to 0 — via `static_div_zero/1`; (2) a FULLY constant rule
(no `n` anywhere) whose value falls outside `[0, NPlurals)` — probed with
`N=0` by `static_form_in_range/2`. Faults that depend on a specific N
(e.g. `n/(n-5)`) are NOT rejected here — they are left to the dynamic
clamp of `evaluate/2` (Layer 1). Total: never raises; the error becomes
the payload of `{unsafe_plural_rule, _}` in `compile/1`.
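A sketch of what the barrier refuses and accepts, written as pattern matches
rather than recorded printouts. The headers are made-up inputs, and the inner
error terms are the ones returned by `static_div_zero/1` and
`static_form_in_range/2`, assuming `compile/1` wraps them exactly as
described above:
```erlang
%% Constant zero divisor: faulty for every N, refused at load.
1> {error, {unsafe_plural_rule, {division_by_zero, '%'}}} =
1>     erli18n_plural:compile(<<"nplurals=2; plural=n % 0;">>).
%% Fully constant rule selecting form 5 out of 2: refused at load.
2> {error, {unsafe_plural_rule, {form_out_of_range, 5, 2}}} =
2>     erli18n_plural:compile(<<"nplurals=2; plural=5;">>).
%% Faults only when n is 5: accepted, left to the dynamic clamp.
3> {ok, _} = erli18n_plural:compile(<<"nplurals=2; plural=n / (n - 5);">>).
```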
""".
-spec validate_safe(ast(), pos_integer()) -> ok | {error, plural_eval_error()}.
validate_safe(Ast, NPlurals) ->
case static_div_zero(Ast) of
{error, _} = Err ->
Err;
ok ->
case is_constant(Ast) of
false ->
%% Depends on N — Layer 1 covers any dynamic fault.
ok;
true ->
%% N is irrelevant for a constant rule; 0 is a fine
%% probe. A constant out-of-range result is a static
%% fault. We evaluate the AST directly (not via
%% evaluate_checked/2) to avoid materialising a partial
%% plural_compiled() map just for the probe.
static_form_in_range(Ast, NPlurals)
end
end.
%% Probe a constant rule's form index and reject it if it lands outside
%% [0, NPlurals). `static_div_zero/1` has already cleared the AST of
%% division-by-zero, so the only fault reachable here is an out-of-range
%% form; there is deliberately no `{error, _}` arm (see below).
-spec static_form_in_range(ast(), pos_integer()) -> ok | {error, plural_eval_error()}.
static_form_in_range(Ast, NPlurals) ->
%% `Ast` is constant and already cleared of static div-by-zero upstream, so
%% `eval_ast_checked/2` returns `{ok, _}` here; an `{error, _}` would be a
%% contract violation and crashes explicitly (`case_clause`).
case eval_ast_checked(Ast, 0) of
{ok, Value} ->
Form = to_integer(Value),
case Form >= 0 andalso Form < NPlurals of
true -> ok;
false -> {error, {form_out_of_range, Form, NPlurals}}
end
end.
%% Walk the AST for a `/` or `%` whose divisor is statically zero. The
%% divisor counts as statically zero only when it is constant (contains
%% no `n`) and evaluates to 0 — a divisor that depends on N is not a
%% static fault.
-spec static_div_zero(ast()) -> ok | {error, plural_eval_error()}.
static_div_zero(N) when is_integer(N) ->
ok;
static_div_zero(n) ->
ok;
static_div_zero({unop, '!', E}) ->
static_div_zero(E);
static_div_zero({binop, Op, L, R}) when Op =:= '/'; Op =:= '%' ->
case static_div_zero(L) of
{error, _} = Err ->
Err;
ok ->
case static_div_zero(R) of
{error, _} = Err -> Err;
ok -> check_static_divisor(Op, R)
end
end;
static_div_zero({binop, _Op, L, R}) ->
case static_div_zero(L) of
{error, _} = Err -> Err;
ok -> static_div_zero(R)
end;
static_div_zero({ternary, C, T, E}) ->
case static_div_zero(C) of
{error, _} = Err ->
Err;
ok ->
case static_div_zero(T) of
{error, _} = Err -> Err;
ok -> static_div_zero(E)
end
end.
-spec check_static_divisor('/' | '%', ast()) -> ok | {error, plural_eval_error()}.
check_static_divisor(Op, Divisor) ->
case is_constant(Divisor) of
false ->
ok;
true ->
%% A nested static div-by-zero inside the divisor is already
%% reported by the recursive walk, so `eval_ast_checked/2` returns
%% `{ok, _}` here (an `{error, _}` would crash explicitly).
case eval_ast_checked(Divisor, 0) of
{ok, V} ->
case to_integer(V) of
0 -> {error, {division_by_zero, Op}};
_ -> ok
end
end
end.
%% True when the AST contains no reference to the variable `n`, so its
%% value is independent of the lookup count.
-spec is_constant(ast()) -> boolean().
is_constant(N) when is_integer(N) -> true;
is_constant(n) -> false;
is_constant({unop, '!', E}) -> is_constant(E);
is_constant({binop, _Op, L, R}) -> is_constant(L) andalso is_constant(R);
is_constant({ternary, C, T, E}) -> is_constant(C) andalso is_constant(T) andalso is_constant(E).
%% =========================
%% AST node-count cap (finding #9, plural-bignum-cpu-dos-evaluate-hotpath)
%% =========================
%%
%% Reject — at compile/load time — an AST whose node count exceeds
%% `?AST_MAX_NODES`. The byte and depth caps from finding #2 do not bound
%% the node count, so a wide flat operator chain (`n*n*...*n`, ~2000 nodes
%% inside the 2048-byte cap, at a single recursion level) would otherwise
%% be installed and walked by `evaluate/2` — growing an `n^k` bignum — on
%% every ngettext lookup. Capping the node count keeps the installed AST
%% small so `evaluate/2`'s cost is bounded by construction (largest
%% intermediate bignum is `n^?AST_MAX_NODES`). Runs once on the load path,
%% never on the hot path.
%% Short-circuiting node-count guard: stops descending as soon as the
%% budget is blown, so its cost is O(min(nodes, ?AST_MAX_NODES)) — never
%% proportional to a pathologically large AST.
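%%
%% Node counts for two small rules, counted by hand (a sketch of how the
%% budget is consumed; every operator, literal and `n` is one node):
%%
%%   n != 1  ->  {binop, '!=', n, 1}                  3 nodes
%%   n*n*n   ->  {binop, '*', {binop, '*', n, n}, n}  5 nodes
%%
%% In general a flat chain of k operands costs 2k - 1 nodes.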
-spec check_ast_complexity(ast()) ->
ok | {error, {expr_too_complex, pos_integer(), pos_integer()}}.
check_ast_complexity(Ast) ->
case count_nodes_bounded(Ast, 0) of
{ok, _Total} ->
ok;
over_limit ->
%% Only the (rare, off-hot-path) error branch pays for the
%% exact total — used purely for diagnostics.
{error, {expr_too_complex, ast_node_count(Ast), ?AST_MAX_NODES}}
end.
%% Budgeted counter. Returns `{ok, Total}` if the whole AST fits within
%% `?AST_MAX_NODES`, otherwise `over_limit` at the first node that blows
%% the budget.
-spec count_nodes_bounded(ast(), non_neg_integer()) ->
{ok, non_neg_integer()} | over_limit.
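%% `Acc` is the number of nodes counted BEFORE this one, so visiting one
%% more node once `Acc` has reached the cap would exceed it. (Testing
%% `Acc > ?AST_MAX_NODES` here would let an AST of ?AST_MAX_NODES + 1
%% nodes through.)
count_nodes_bounded(_Ast, Acc) when Acc >= ?AST_MAX_NODES ->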
over_limit;
count_nodes_bounded(N, Acc) when is_integer(N) ->
{ok, Acc + 1};
count_nodes_bounded(n, Acc) ->
{ok, Acc + 1};
count_nodes_bounded({unop, '!', E}, Acc) ->
count_nodes_bounded(E, Acc + 1);
count_nodes_bounded({binop, _Op, L, R}, Acc) ->
case count_nodes_bounded(L, Acc + 1) of
over_limit -> over_limit;
{ok, Acc1} -> count_nodes_bounded(R, Acc1)
end;
count_nodes_bounded({ternary, C, T, E}, Acc) ->
case count_nodes_bounded(C, Acc + 1) of
over_limit ->
over_limit;
{ok, Acc1} ->
case count_nodes_bounded(T, Acc1) of
over_limit -> over_limit;
{ok, Acc2} -> count_nodes_bounded(E, Acc2)
end
end.
%% Exact, total node count — used only on the error branch for diagnostic
%% reporting (`{expr_too_complex, Nodes, Max}`).
-spec ast_node_count(ast()) -> pos_integer().
ast_node_count(N) when is_integer(N) ->
1;
ast_node_count(n) ->
1;
ast_node_count({unop, '!', E}) ->
1 + ast_node_count(E);
ast_node_count({binop, _Op, L, R}) ->
1 + ast_node_count(L) + ast_node_count(R);
ast_node_count({ternary, C, T, E}) ->
1 + ast_node_count(C) + ast_node_count(T) + ast_node_count(E).
%% =========================
%% CLDR canonical rules
%% =========================
%%
%% Hard-coded subset of the CLDR `plurals.json` data
%% (cldr-json/cldr-core/supplemental/plurals.json in
%% https://github.com/unicode-org/cldr-json,
%% retrieved 2026-05 — see also https://cldr.unicode.org/index/cldr-spec/plural-rules
%% for the rule language). Each row is `{Locale, NPlurals, ExprBin}`
%% where `ExprBin` is the C-style plural expression that, when paired
%% with `nplurals=NPlurals`, produces the canonical CLDR rule.
%%
%% Region-tagged locales are listed explicitly, including the ones where
%% CLDR diverges from the base language (e.g. base `pt` and `pt_BR` use
%% `n > 1`, while European Portuguese `pt_PT` uses `n != 1`).
%% Locales not listed fall back to the base language tag via
%% `cldr_rule/1`.
%%
%% Strategy (Option A — hard-coded):
%% Pros: zero deps, byte-equal control over what ships, easy to audit.
%% Cons: requires manual sync on each CLDR release.
%%
%% Generating the table from upstream CLDR JSON (Option C) was considered
%% and not adopted: the inline literal keeps this module's data surface
%% small, dependency-free, and reviewable.
-doc """
Embedded CLDR table — the only TRUSTED data source (a static literal) of
this module.
The rows between the `BEGIN/END GENERATED CLDR TABLE` markers are
GENERATED by `bin/gen-plural-table.escript` from the committed seed
`apps/erli18n/priv/gettext/plural_forms.eterm`. On a CLDR release sync,
edit the seed and re-run the generator (`escript
bin/gen-plural-table.escript`) instead of editing the rows by hand.
Each row is `{Locale, NPlurals, ExprBin}`, where `ExprBin` paired with
`nplurals=NPlurals` reproduces that locale's CLDR canonical rule. Some
region rows repeat the base language's rule (e.g. `de_AT`), others diverge
from it (e.g. `pt_PT` = `n != 1` against the base `pt` = `n > 1`); regions
that are not listed fall back to the base via `cldr_rule/1`.
For the maintainer: each expression MUST be valid for `compile/1`. The
generator validates this when the compiled beam is reachable, and
`build_cldr_compiled_table/0` calls `compile/1` on each row when building the
cache and accepts only `{ok, _}`. A row that fails to compile is a defect in
this trusted static literal and crashes the build loudly with `case_clause`,
rather than silently degrading to "no CLDR entry"; review edits carefully.
""".
cldr_data() ->
%% BEGIN GENERATED CLDR TABLE
%% Generated by bin/gen-plural-table.escript from
%% apps/erli18n/priv/gettext/plural_forms.eterm. Do not edit by hand;
%% edit the seed table and re-run the generator.
[
{<<"ar">>, 6, <<
"n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5"
>>},
{<<"bg">>, 2, <<"n != 1">>},
{<<"cs">>, 3, <<"(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2">>},
{<<"da">>, 2, <<"n != 1">>},
{<<"de">>, 2, <<"n != 1">>},
{<<"de_AT">>, 2, <<"n != 1">>},
{<<"de_CH">>, 2, <<"n != 1">>},
{<<"el">>, 2, <<"n != 1">>},
{<<"en">>, 2, <<"n != 1">>},
{<<"en_GB">>, 2, <<"n != 1">>},
{<<"en_US">>, 2, <<"n != 1">>},
{<<"es">>, 2, <<"n != 1">>},
{<<"es_ES">>, 2, <<"n != 1">>},
{<<"es_MX">>, 2, <<"n != 1">>},
{<<"et">>, 2, <<"n != 1">>},
{<<"fa">>, 2, <<"n != 1">>},
{<<"fi">>, 2, <<"n != 1">>},
{<<"fr">>, 2, <<"n > 1">>},
{<<"fr_CA">>, 2, <<"n > 1">>},
{<<"fr_FR">>, 2, <<"n > 1">>},
{<<"he">>, 2, <<"n != 1">>},
{<<"hi">>, 2, <<"n != 1">>},
{<<"hr">>, 3, <<
"n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2"
>>},
{<<"hu">>, 2, <<"n != 1">>},
{<<"it">>, 2, <<"n != 1">>},
{<<"ja">>, 1, <<"0">>},
{<<"ko">>, 1, <<"0">>},
{<<"nb">>, 2, <<"n != 1">>},
{<<"nl">>, 2, <<"n != 1">>},
{<<"nn">>, 2, <<"n != 1">>},
{<<"no">>, 2, <<"n != 1">>},
{<<"pl">>, 3, <<"n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2">>},
{<<"pt">>, 2, <<"n > 1">>},
{<<"pt_BR">>, 2, <<"n > 1">>},
{<<"pt_PT">>, 2, <<"n != 1">>},
{<<"ro">>, 3, <<"n==1 ? 0 : (n==0 || (n%100>0 && n%100<20)) ? 1 : 2">>},
{<<"ru">>, 3, <<
"n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2"
>>},
{<<"sk">>, 3, <<"(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2">>},
{<<"sl">>, 4, <<"n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3">>},
{<<"sr">>, 3, <<
"n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2"
>>},
{<<"sv">>, 2, <<"n != 1">>},
{<<"th">>, 1, <<"0">>},
{<<"tr">>, 2, <<"n != 1">>},
{<<"uk">>, 3, <<
"n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2"
>>},
{<<"vi">>, 1, <<"0">>},
{<<"zh">>, 1, <<"0">>},
{<<"zh_CN">>, 1, <<"0">>},
{<<"zh_HK">>, 1, <<"0">>},
{<<"zh_TW">>, 1, <<"0">>}
].
%% END GENERATED CLDR TABLE
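%% Linear scan of `cldr_data/0` for an exact locale match. Returns
%% `{ok, NPlurals, ExprBin}` or `undefined`; no region fallback happens here.
lookup_locale(Locale) ->
    lookup_locale(Locale, cldr_data()).
lookup_locale(_Locale, []) ->
    undefined;
lookup_locale(Locale, [{Locale, N, Expr} | _]) ->
    {ok, N, Expr};
lookup_locale(Locale, [_ | Rest]) ->
    lookup_locale(Locale, Rest).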
%% Strip everything from the first separator onward, leaving the base
%% language (`pt_BR` -> `pt`, `zh-Hant` -> `zh`). The dropped tail may be a
%% region or a script subtag. Both `_` (POSIX style) and `-` (BCP 47 style)
%% are accepted as separators.
base_locale(Locale) ->
    case binary:match(Locale, [~"_", ~"-"]) of
        nomatch -> Locale;
        {Pos, _Len} -> binary:part(Locale, 0, Pos)
    end.
%% =========================
%% CLDR equivalence check (pre-compiled)
%% =========================
%%
%% Divergence is checked structurally on the parsed ASTs (nplurals + expr
%% AST), so it is whitespace and paren-noise insensitive — `(n != 1)`
%% matches `n != 1`. Finding #17: the loader already compiled the header
%% AST, so `validate_against_cldr_ast/2` reuses it; the CLDR side is taken
%% from a one-time MEMOISED table of compiled bundles
%% (`cldr_compiled_table/0`, read through `cldr_compiled/1`) instead of
%% re-parsing the canonical rule on every load. This removes
%% the second header compile (and the per-load CLDR synthesise+compile +
%% linear scans) that the old `ast_equivalent/split_rule` path incurred,
%% and decouples divergence from the plural-compile O(n^2) bug (a
%% pathological header is compiled once, not twice).
%% persistent_term key for the memoised CLDR AST table. The table is a
%% constant (a fixed set of static-literal rows), so a single global cache is
%% sound: it is stored under a fixed, module-scoped key (not a content hash)
%% and never invalidated.
-define(CLDR_COMPILED_KEY, {?MODULE, cldr_compiled_table}).
%% Compiled CLDR bundle for a locale, with region fallback identical to
%% `cldr_rule/1` (`fr_BE` -> `fr`). Returns the same `plural_compiled()`
%% shape as `compile/1` so the AST can be compared directly; `raw` carries
%% the CLDR canonical EXPRESSION binary (matching the old
%% `validate_against_cldr/2` warning payload, which used `cldr_rule/1`'s
%% expr). `undefined` when neither the locale nor its base is in the table.
-spec cldr_compiled(binary()) -> plural_compiled() | undefined.
cldr_compiled(Locale) when is_binary(Locale) ->
    Table = cldr_compiled_table(),
    case Table of
        #{Locale := Bundle} ->
            Bundle;
        #{} ->
            case base_locale(Locale) of
                Locale ->
                    undefined;
                Base ->
                    maps:get(Base, Table, undefined)
            end
    end.
-doc """
Memoised `locale => compiled bundle` map. This is the ONLY side effect of
the module: the map is built once per node and cached in `persistent_term`.
For the maintainer: the cache is a module-scoped singleton, written under
the fixed key `?CLDR_COMPILED_KEY` (the tuple
`{?MODULE, cldr_compiled_table}`, NOT a content hash) and NEVER
invalidated. `cldr_data/0` is a constant literal, so a global cache is
safe. The first call builds it via `build_cldr_compiled_table/0` and
writes it; later calls hit the cache. On the hit branch the `is_map/1`
guard only narrows the `term()` returned by `persistent_term:get/2`; the
miss branch is the only writer, so no cast is needed. The cache serves only
the cold divergence path (`cldr_compiled/1`) and is never touched by
`evaluate/2`.
""".
%% Return the memoised locale -> compiled-bundle map, building it once per
%% node and caching it in `persistent_term`. `cldr_data/0` is a static,
%% trusted constant whose every expression is a canonical CLDR rule, so each
%% `compile/1` here is expected to succeed. A malformed row would be a defect
%% in this module and is surfaced immediately: `build_cldr_compiled_table/0`
%% crashes with `case_clause` instead of dropping the row and silently
%% falling back to "no CLDR entry".
-spec cldr_compiled_table() -> #{binary() => plural_compiled()}.
cldr_compiled_table() ->
    case persistent_term:get(?CLDR_COMPILED_KEY, undefined) of
        undefined ->
            Table = build_cldr_compiled_table(),
            persistent_term:put(?CLDR_COMPILED_KEY, Table),
            Table;
        Table when is_map(Table) ->
            %% `persistent_term:get/2` is typed `term()`; the `is_map/1` guard
            %% narrows it to a map (the only writer is the clause above, so
            %% this is the cache hit), which eqwalizer accepts here, so no
            %% cast is needed.
            Table
    end.
-spec build_cldr_compiled_table() -> #{binary() => plural_compiled()}.
build_cldr_compiled_table() ->
    lists:foldl(
        fun({Locale, N, Expr}, Acc) ->
            Header = <<
                "nplurals=",
                (integer_to_binary(N))/binary,
                "; plural=",
                Expr/binary,
                ";"
            >>,
            %% `cldr_data/0` is a trusted static literal whose every row
            %% compiles; a row that failed would be a defect and crashes
            %% explicitly at table build (`case_clause`) rather than silently
            %% degrading to "no CLDR entry".
            case compile(Header) of
                {ok, #{nplurals := NC, expr := Ast}} ->
                    %% Store the raw CLDR EXPRESSION (not the synthesised
                    %% header) as `raw`, to match the legacy warning
                    %% payload that surfaced `cldr_rule/1`'s expr.
                    Acc#{Locale => #{nplurals => NC, expr => Ast, raw => Expr}}
            end
        end,
        #{},
        cldr_data()
    ).