src/erli18n_plural.erl

-module(erli18n_plural).

-moduledoc """
Evaluator and validator for the gettext/CLDR plural rules used by
erli18n.

Compiles the C expression from a `.po` `Plural-Forms:` header
(`nplurals=N; plural=EXPR;`) into a small AST and evaluates it to choose
the plural form for a given N — this is what backs `ngettext`/`npgettext`.

## The problem it solves

gettext selects the correct plural translation by evaluating a C
EXPRESSION embedded in the `.po` header (e.g. the Russian 3-form rule
`n%10==1 && n%100!=11 ? 0 : ...`). Each locale ships its own. This module
replaces the legacy `gettexter`'s Yecc/Leex/`erl_eval` pipeline with a
hand-written recursive-descent parser + AST interpreter (no dynamic
generation of Erlang code, so dialyzer/eqwalizer can reason about
everything). It turns `EXPR` into a `t:ast/0` and evaluates it to a form
index in `[0, NPlurals)`.

## Mental model

- **Two phases.** `compile/1` (load-time, cold) parses + validates +
  packs into a `t:plural_compiled/0`. `evaluate/2` (lookup-time, HOT
  PATH) interprets that bundle per call. The catalog loader compiles
  ONCE and keeps the bundle; each `ngettext`/`npgettext` calls only
  `evaluate/2`.
- **Runtime source-of-truth is the `.po` header** (PSD-004). The embedded
  CLDR table (`cldr_rule/1`, one row per GNU gettext / CLDR locale) does
  NOT take part in the hot path: it is consulted only at load time to emit
  divergence warnings (`validate_against_cldr/2`) and as a fallback when
  the header is missing (`fallback_rule/0`).
- **Trusted vs untrusted.** The header expression comes from a tenant's
  `.po` — UNTRUSTED input (ADR-0003, see `SECURITY.md`). The
  `cldr_data/0` table is a static module literal — TRUSTED. That is why
  `compile/1` is fail-closed and hardened, while `cldr_compiled_table/0`
  assumes every row compiles.
- **Pure function, no per-process state.** Unlike the catalog server, this
  module has no gen_server, no ETS and no process dictionary. The only
  side effect is a global read-once cache in `persistent_term`
  (`cldr_compiled_table/0`), memoising the compiled CLDR ASTs — a
  module-scoped singleton under a fixed key, built once per node and never
  invalidated (`cldr_data/0` is constant).
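
The read-once `persistent_term` cache described in the last bullet can be
sketched as follows. This is a minimal illustration, not this module's
code: the module name, the key and the `build/0` stand-in are all
hypothetical.

```erlang
-module(read_once_cache_sketch).
-export([table/0]).

%% Hypothetical fixed key; the real module's key is not shown here.
-define(KEY, {?MODULE, table}).

%% Return the cached table, building and storing it on first use.
table() ->
    case persistent_term:get(?KEY, undefined) of
        undefined ->
            Table = build(),
            %% Racing first callers store equal terms; persistent_term:put/2
            %% returns quickly when the stored value is already equal.
            ok = persistent_term:put(?KEY, Table),
            Table;
        Table ->
            Table
    end.

%% Stand-in for compiling a constant, trusted data table.
build() ->
    #{<<"en">> => {2, <<"n != 1">>}}.
```

Because the stored value never changes, later reads are plain lookups and
no invalidation logic is needed.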

## Anti-DoS hardening (ADR-0003)

The attack surface is the `.po` expression. The defenses ALL live in
`compile/1` (cold), so that `evaluate/2` (hot) stays O(1)-bounded by
construction:

- `?PLURAL_EXPR_MAX_BYTES` (2048) — rejects a long expression before parse.
- `?PLURAL_EXPR_MAX_DEPTH` (64) — bounds nesting (and the walker's stack).
- `?AST_MAX_NODES` (256) — bounds the node count (a wide flat chain
  `n*n*...*n` passes both caps above but would grow an `n^k` bignum per
  lookup).
- `?MAX_INT_DIGITS` (7) — bounds the digits of `nplurals=` before
  `binary_to_integer` materialises the bignum.
- Static rejection (`validate_safe/2`) — refuses rules provably faulty
  for EVERY N (div/mod by a constant divisor of 0; constant outside
  `[0, NPlurals)`).
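
To see why the node cap is needed on top of the byte and depth caps, here
is a rough sketch of a node counter over the `t:ast/0` shape. The module
and function names are hypothetical, and the real check may well stop
counting early instead of walking the whole tree.

```erlang
-module(ast_nodes_sketch).
-export([count_nodes/1, chain/1, check/2]).

%% Count every node of a plural AST, leaves included.
count_nodes(Leaf) when is_integer(Leaf); Leaf =:= n -> 1;
count_nodes({unop, '!', A}) -> 1 + count_nodes(A);
count_nodes({binop, _Op, L, R}) -> 1 + count_nodes(L) + count_nodes(R);
count_nodes({ternary, C, T, E}) ->
    1 + count_nodes(C) + count_nodes(T) + count_nodes(E).

%% Build the left-associative chain n*n*...*n with K factors (K >= 1).
chain(1) -> n;
chain(K) when K > 1 -> {binop, '*', chain(K - 1), n}.

%% Reject an AST whose node count exceeds Max.
check(Ast, Max) ->
    case count_nodes(Ast) of
        Nodes when Nodes > Max -> {error, {expr_too_complex, Nodes, Max}};
        _ -> ok
    end.
```

A chain of 1000 factors has 999 operators and 1000 leaves, so
`count_nodes(chain(1000))` is 1999 and `check(chain(1000), 256)` returns
an error, while `check({binop, '!=', n, 1}, 256)` returns `ok`.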

`evaluate/2` is TOTAL: it never raises. Mirroring the GNU libintl runtime
(`dcigettext.c`), division/modulo by zero is coerced to 0 and a form
outside `[0, NPlurals)` is clamped to 0. Anyone who needs to OBSERVE the
anomaly (log/alert) uses `evaluate_checked/2`, which returns it as data.
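
The two coercions that make evaluation total can be illustrated with a
small sketch. The helper names below are hypothetical stand-ins, not the
module's internal functions.

```erlang
-module(total_eval_sketch).
-export([safe_div/2, safe_rem/2, clamp/2]).

%% Integer division that yields 0 instead of raising on a zero divisor.
safe_div(_A, 0) -> 0;
safe_div(A, B) -> A div B.

%% Remainder that yields 0 instead of raising on a zero divisor.
safe_rem(_A, 0) -> 0;
safe_rem(A, B) -> A rem B.

%% Keep an index inside [0, NPlurals); anything else becomes form 0.
clamp(Form, NPlurals) when Form >= 0, Form < NPlurals -> Form;
clamp(_Form, _NPlurals) -> 0.
```

With these helpers, `safe_div(1, 0)` is `0` and `clamp(5, 3)` is `0`, so
no input can make the caller crash.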

## When you touch this module

- **Consumer:** almost never directly — you call `erli18n:ngettext/5` and
  the catalog server takes care of `compile/1`/`evaluate/2`. For a quick
  test outside the server, `plural_by_po_header/2` compiles and evaluates
  in one step.
- **Loader maintainer:** calls `compile/1` at load, keeps the bundle, and
  on the hot path calls `evaluate/2`. For CLDR divergence at load use
  `validate_against_cldr_ast/2` (reuses the already-compiled AST).
- **CLDR table maintainer:** edits `cldr_data/0` when syncing a CLDR
  release.

## Quickstart

```erlang
%% Compile the Russian 3-form (one/few/many) rule once...
1> Hdr = <<"nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : "
1>        "n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2;">>.
2> {ok, C} = erli18n_plural:compile(Hdr).
{ok,#{raw => Hdr,expr => {ternary,_,_,_},nplurals => 3}}
%% ...and select the form for various N (hot path).
3> erli18n_plural:evaluate(C, 1).
0
4> erli18n_plural:evaluate(C, 2).
1
5> erli18n_plural:evaluate(C, 5).
2
%% One-off use: compile and evaluate at once.
6> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 1).
{ok,0}
```

## Key functions

- `compile/1` — parse + fail-closed validation → `t:plural_compiled/0`.
- `evaluate/2` — hot path, total, returns the form index.
- `evaluate_checked/2` — structured sibling that reports anomalies as data.
- `plural_by_po_header/2` — compile+evaluate shortcut for one-off use.
- `cldr_rule/1` / `validate_against_cldr/2` / `validate_against_cldr_ast/2`
  — CLDR observability (off the hot path).
- `fallback_rule/0` — Germanic default when the header is missing.
""".

%% Evaluator for the GNU gettext `Plural-Forms:` header C-expression
%% (per https://www.gnu.org/software/gettext/manual/gettext.html#Translating-plural-forms)
%% and CLDR-canonical-rule validator
%% (per https://cldr.unicode.org/index/cldr-spec/plural-rules,
%% source data: cldr-json/cldr-core/supplemental/plurals.json in
%% https://github.com/unicode-org/cldr-json).
%%
%% Design source-of-truth:
%%
%%   * PSD-004 (po_semantics_decisions.md) — the `.po` `Plural-Forms` header
%%     is the runtime source-of-truth; CLDR is consulted only at load-time
%%     for divergence warnings, and as fallback when the header is absent.
%%     Therefore `evaluate/2` is the **hot path** and must never touch CLDR.
%%
%%   * PSD-008 (po_semantics_decisions.md) — degenerate plural rules
%%     (`nplurals=1; plural=0;`, used by ja/zh/ko/vi/th) must round-trip
%%     through compile/evaluate as a literal integer expression. The
%%     grammar therefore accepts integer literals as valid primary terms.
%%
%%   * BR-DESCARTAR-003 (discard_log.md) — the GNU Plural-Forms evaluation
%%     capability is preserved from the legacy `gettexter_plural` module.
%%     The Yecc/Leex/erl_syntax/erl_eval pipeline (~231 LOC) is dropped,
%%     but the C-truthy operator semantics and recursive walker shape are
%%     refactored here into a single recursive-descent parser + interpreter.
%%
%%   * paradigm_decision.md §E3 — hybrid wrapper: local recursive-descent
%%     evaluator in the hot path; CLDR table only out of the hot path.
%%
%% Implementation notes:
%%
%%   * No Yecc, no Leex, no dynamic Erlang code generation — the evaluator
%%     interprets a small AST so dialyzer can reason about everything.
%%   * Operators follow C precedence/associativity. Short-circuit semantics
%%     are honoured for `&&` and `||` so that expressions guarded against
%%     division by zero (e.g. `n != 0 && (10/n) > 1`) behave as in C.
%%   * Modulo (`%`) uses Erlang `rem`, which matches C99 truncation toward
%%     zero — the only behaviour `.po` plural rules ever rely on.
%%   * Division by zero in untrusted `.po` input is handled, not
%%     propagated (finding #1, plural-eval-throws-per-lookup-dos):
%%     `evaluate/2` is TOTAL on the per-request hot path. A zero divisor
%%     is pinned to 0 (`eval_div/2` / `eval_rem/2`) and an out-of-range
%%     form is clamped to 0, matching GNU libintl's `dcigettext.c`
%%     instead of raising `badarith`. Statically-faulty rules are
%%     rejected up front by `compile/1`; `evaluate_checked/2` surfaces
%%     the anomaly as data for callers that want to observe it.
%%   * CLDR data ships inline (one row per GNU gettext / CLDR locale, see
%%     `cldr_rule/1`), but the `cldr_data/0` rows are no longer maintained
%%     by hand: they are generated between the
%%     `BEGIN/END GENERATED CLDR TABLE` markers by
%%     `bin/gen-plural-table.escript` from the committed seed table
%%     `apps/erli18n/priv/gettext/plural_forms.eterm`. The inline literal
%%     is still what the runtime reads (small, dependency-free, reviewable
%%     data surface); the seed + generator give a single source of truth
%%     and a diffable target for `bin/extract-gettext-table.sh`, which
%%     produces the same `{Locale, NPlurals, PluralExpr}` shape from the
%%     real GNU gettext toolchain so drift can be detected on a CLDR
%%     release sync. The alternatives stay rejected:
%%
%%       - Option B: external hex dep (e.g. ex_cldr). Heavyweight, pulls
%%         Elixir interop, not justified for a single-table lookup.
%%       - Option C: parsing upstream CLDR JSON at build time. The seed
%%         eterm keeps the shipped data surface small and reviewable
%%         without a JSON toolchain in the build.

%% Public API.
-export([
    compile/1,
    evaluate/2,
    evaluate_checked/2,
    plural_by_po_header/2,
    cldr_rule/1,
    validate_against_cldr/2,
    validate_against_cldr_ast/2,
    fallback_rule/0
]).

-export_type([
    plural_compiled/0,
    compile_error/0,
    plural_eval_error/0,
    ast/0,
    op/0
]).

%% =========================
%% Types
%% =========================

-doc """
Compiled plural-rule bundle — the output of `compile/1` and the input to
`evaluate/2`/`evaluate_checked/2`.

- `nplurals` — how many plural forms the locale has (validated in
  `[1, ?NPLURALS_MAX]`); every returned index stays in `[0, nplurals)`.
- `expr` — the parsed `t:ast/0` of the `plural=` expression, evaluated on
  the hot path.
- `raw` — the originating raw header, preserved for diagnostics and for
  the divergence payload of `validate_against_cldr_ast/2`.

Compile once at load and reuse this map on every lookup; there is no
result cache inside `evaluate/2`.
""".
-type plural_compiled() :: #{
    nplurals := pos_integer(),
    expr := ast(),
    raw := binary()
}.

-doc """
Structural failure reason from `compile/1` — always fail-closed, never an
exception.

Groups header defects (`missing_nplurals`, `missing_plural_expr`,
`nplurals_out_of_range`, `syntax_error`) and the anti-DoS hardening
rejections: `expr_too_long`/`expr_too_deep`/`expr_too_complex`
(byte/depth/node caps), `nplurals_too_many_digits` (digit cap before the
bignum) and `unsafe_plural_rule` (rule statically faulty for every N).
See `compile/1` for what triggers each one.
""".
-type compile_error() ::
    {syntax_error, Reason :: term(), Position :: non_neg_integer()}
    | {missing_nplurals, binary()}
    | {missing_plural_expr, binary()}
    | {nplurals_out_of_range, integer()}
    %% Layer 3 (finding #1): a rule that is STATICALLY guaranteed to
    %% fault — a literal division/modulo by zero, or a constant form
    %% index provably outside [0, NPlurals) — is rejected at load time
    %% so the poisoned catalog is refused by `ensure_loaded` rather than
    %% loading as `{ok, _}` and crashing every later lookup.
    | {unsafe_plural_rule, plural_eval_error()}
    %% Finding #2 (plural-compile-superlinear-unbounded): the parser
    %% runs on untrusted `.po` input inside the catalog gen_server's
    %% `handle_call`. An expression longer than `?PLURAL_EXPR_MAX_BYTES`
    %% or nested deeper than `?PLURAL_EXPR_MAX_DEPTH` is rejected
    %% fail-closed so a pathological-but-valid rule cannot make compile
    %% superlinear/unbounded and freeze the server.
    | {expr_too_long, Size :: non_neg_integer(), Max :: pos_integer()}
    | {expr_too_deep, Depth :: pos_integer(), Position :: non_neg_integer()}
    %% Finding #9 (plural-bignum-cpu-dos-evaluate-hotpath): the byte and
    %% depth caps above do not bound the AST NODE COUNT, so a wide flat
    %% operator chain (`n*n*...*n`) can still compile to thousands of
    %% nodes that `evaluate/2` walks — growing an `n^k` bignum — on every
    %% lookup. An AST above `?AST_MAX_NODES` is rejected fail-closed so
    %% the per-lookup cost stays O(1)-bounded by construction.
    | {expr_too_complex, Nodes :: pos_integer(), Max :: pos_integer()}
    %% Finding #8 (po-plural-unbounded-binary-to-integer-bignum): the
    %% `nplurals=<digits>` run is capped by DIGIT COUNT before any
    %% `binary_to_integer` materialises the bignum. The rejected value is
    %% deliberately kept OUT of the payload (only the digit count and the
    %% cap are reported) so a thousands-digit adversarial run cannot
    %% amplify memory/logs, and the >=~1.3M-digit `system_limit` path is
    %% never reached.
    | {nplurals_too_many_digits, Digits :: pos_integer(), Max :: pos_integer()}.

-doc """
Anomaly observed while evaluating a compiled rule — returned as data,
never raised.

`{division_by_zero, '/' | '%'}` when an evaluated divisor is 0;
`{form_out_of_range, Form, NPlurals}` when the index falls outside
`[0, NPlurals)`. It appears as a return of `evaluate_checked/2` and as the
payload of an `{unsafe_plural_rule, _}` rejected by `compile/1`. The total
`evaluate/2` NEVER produces this — it clamps (parity with libintl).
""".
-type plural_eval_error() ::
    {division_by_zero, '/' | '%'}
    | {form_out_of_range, Form :: integer(), NPlurals :: pos_integer()}.

-doc """
AST of the plural expression — a literal integer, the variable `n`, a
binop (`{binop, t:op/0, Left, Right}`), the negation unop
(`{unop, '!', _}`) or a ternary (`{ternary, Cond, Then, Else}`).

It is the tree that `compile/1` builds and that `evaluate/2`/`eval_ast/2`
interpret. The depth is bounded by `?PLURAL_EXPR_MAX_DEPTH` and the node
count by `?AST_MAX_NODES`, so no valid instance is arbitrarily large.
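
As an illustration of how such a tree is interpreted, here is a minimal
walker over a subset of the operators. It is a sketch under assumptions:
the module name is hypothetical, comparisons are modelled as C-style
`1`/`0`, and `rem` is left unguarded, so unlike `evaluate/2` this version
would raise on a zero divisor.

```erlang
-module(ast_walk_sketch).
-export([eval/2]).

%% Interpret a small subset of the AST for a given N.
eval(I, _N) when is_integer(I) -> I;
eval(n, N) -> N;
eval({binop, '%', L, R}, N) -> eval(L, N) rem eval(R, N);
eval({binop, '==', L, R}, N) -> bool(eval(L, N) =:= eval(R, N));
eval({binop, '!=', L, R}, N) -> bool(eval(L, N) =/= eval(R, N));
eval({unop, '!', A}, N) -> bool(eval(A, N) =:= 0);
eval({ternary, C, T, E}, N) ->
    %% Any non-zero condition selects the Then branch, as in C.
    case eval(C, N) of
        0 -> eval(E, N);
        _ -> eval(T, N)
    end.

bool(true) -> 1;
bool(false) -> 0.
```

For the rule `n%10==1 ? 0 : 1`, written as
`{ternary, {binop, '==', {binop, '%', n, 10}, 1}, 0, 1}`, this walker
returns `0` for N = 21 and `1` for N = 5.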
""".
-type ast() ::
    integer()
    %% variable n
    | n
    | {binop, op(), ast(), ast()}
    | {unop, '!', ast()}
    | {ternary, ast(), ast(), ast()}.

-doc """
Binary operators accepted in a `t:ast/0`, with C precedence/associativity:
arithmetic (`+ - * / %`), relational (`< > <= >=`), equality (`== !=`) and
short-circuit logical (`&&` `||`). `%` uses `rem` (truncates toward zero,
like C99); `/` and `%` by zero are coerced to 0 on the hot path.
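
A quick way to see the truncation-toward-zero behaviour of `rem` is the
sketch below (hypothetical module name). The sign of the result follows
the dividend, which is the same convention C99 uses for `%`.

```erlang
-module(rem_sketch).
-export([demo/0]).

%% Remainders for every sign combination of dividend and divisor.
demo() ->
    [7 rem 3, -7 rem 3, 7 rem -3, -7 rem -3].
```

`rem_sketch:demo()` returns `[1,-1,1,-1]`.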
""".
-type op() ::
    '+'
    | '-'
    | '*'
    | '/'
    | '%'
    | '=='
    | '!='
    | '<'
    | '>'
    | '<='
    | '>='
    | '&&'
    | '||'.

%% Internal parser state — carries the remaining input and absolute byte
%% offset (for surfacing diagnostic positions in syntax errors).
-record(ps, {
    src :: binary(),
    pos = 0 :: non_neg_integer()
}).

%% Sanity bound for nplurals. Real-world locales top out at 6 (Arabic).
%% Any header declaring more than a thousand forms is malformed input.
-define(NPLURALS_MAX, 1000).

%% Maximum number of decimal digits accepted for the `nplurals=<digits>`
%% field (finding #8, po-plural-unbounded-binary-to-integer-bignum). The
%% range check is `[1, ?NPLURALS_MAX=1000]`, so 4 digits already covers
%% every legal value; 7 leaves generous headroom for realistic indices
%% while keeping the bignum tiny. Capping by digit COUNT *before*
%% `binary_to_integer` means a thousands-digit adversarial run is
%% rejected in O(1) without ever materialising an O(d^2) bignum or
%% reaching the >=~1.3M-digit `error:system_limit` path.
-define(MAX_INT_DIGITS, 7).

%% Bounds for the `Plural-Forms` expression itself (finding #2,
%% plural-compile-superlinear-unbounded). `?NPLURALS_MAX` bounds the
%% form COUNT, not the expression SIZE, so without these the parser is
%% unbounded in both byte-length and recursion depth on untrusted input.
%%
%%   * `?PLURAL_EXPR_MAX_BYTES` — the real-world most-complex rule
%%     (Arabic) is ~98 bytes; 2048 is ~20x headroom, so no legitimate
%%     catalog is affected, while a multi-KB adversarial expression is
%%     rejected before it can be parsed.
%%   * `?PLURAL_EXPR_MAX_DEPTH` — Arabic's nesting depth is well under
%%     10; 64 is ~6x headroom and also bounds the recursion depth of the
%%     hot-path `eval_ast/2` walker (stack growth per lookup).
-define(PLURAL_EXPR_MAX_BYTES, 2048).
-define(PLURAL_EXPR_MAX_DEPTH, 64).

%% Bound on the number of nodes in the compiled plural AST (finding #9,
%% plural-bignum-cpu-dos-evaluate-hotpath). Complements the byte/depth
%% caps above, which do NOT bound the node count: a wide, flat operator
%% chain (`n*n*...*n`) stays under both — it is left-associative, so it
%% does not nest the parser, and ~1000 factors fit inside 2048 bytes —
%% yet it compiles to ~2000 AST nodes. `evaluate/2` walks that whole tree
%% (and grows an `n^k` bignum) on EVERY ngettext lookup, with no result
%% cache, so the per-lookup cost is super-linear in the chain length and
%% grows with N. Bounding the node count at compile time keeps the
%% installed AST small, so `evaluate/2`'s cost is O(1)-bounded by
%% construction. The real-world most-complex rule (Russian/Arabic) has
%% ~39 nodes; 256 is ~6.5x headroom, so no legitimate catalog is
%% affected, while a pathological chain is rejected before it can poison
%% every later evaluation.
-define(AST_MAX_NODES, 256).

%% Identifier-character predicate, used to reject malformed bare words
%% like `nx`. Macro so the parser inlines the test in a guard.
-define(IS_IDENT(C),
    ((C >= $a andalso C =< $z) orelse
        (C >= $A andalso C =< $Z) orelse
        (C >= $0 andalso C =< $9) orelse
        C =:= $_)
).

%% =========================
%% Public API
%% =========================

-doc """
Compiles a `.po` `Plural-Forms:` header expression into a
`plural_compiled()` bundle (a `nplurals`/`expr`/`raw` map) reused by each
`evaluate/2`.

`Header` is the header string (`nplurals=N; plural=EXPR;`); the fields are
located in a whitespace-tolerant way. Returns `{ok, Compiled}` or
`{error, compile_error()}`, always fail-closed (never raises), since it
runs over untrusted `.po` inside the gen_server's `handle_call`.

Relevant structural rejections:
- `{expr_too_long, Size, Max}` — expression above `?PLURAL_EXPR_MAX_BYTES`
  (2048), refused before parsing;
- `{expr_too_deep, Depth, Pos}` — nesting above `?PLURAL_EXPR_MAX_DEPTH`
  (64);
- `{expr_too_complex, Nodes, Max}` — AST with more nodes than
  `?AST_MAX_NODES` (256); this blocks wide flat chains (`n*n*...*n`) that
  would grow a bignum per lookup;
- `{unsafe_plural_rule, Reason}` — STATICALLY faulty rule: division/modulo
  by a constant divisor of 0, or a constant rule whose form falls outside
  `[0, NPlurals)`. Cases that fail only for a specific N are left to the
  dynamic clamp of `evaluate/2`;
- `{nplurals_too_many_digits, _, _}`, `{nplurals_out_of_range, _}`,
  `{missing_nplurals, _}`, `{missing_plural_expr, _}` and `{syntax_error,
  Reason, Pos}` for the remaining header defects.

Edge cases: redundant parentheses and whitespace are absorbed by the
parser; `n` is the ONLY allowed identifier (`nx` or `m` become a
`syntax_error`); degenerate rules `plural=0` (ja/zh/ko/vi/th) compile as
an integer literal (PSD-008). A rule that faults only for a specific N is
NOT rejected here: `n/(n-5)` divides by zero only at N = 5, so it compiles
and the fault is neutralised at lookup time by `evaluate/2`.
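
The digit cap behind `nplurals_too_many_digits` can be pictured with the
following sketch. It assumes the input has already been narrowed to a
non-empty run of ASCII digits; the module and function names are
hypothetical.

```erlang
-module(digit_cap_sketch).
-export([parse_capped/2]).

%% Convert a digit run to an integer only if it has at most Max digits.
%% byte_size/1 is constant time, so an oversized run is refused before
%% binary_to_integer/1 ever builds a large integer.
parse_capped(Digits, Max) when is_binary(Digits), byte_size(Digits) > Max ->
    {error, {nplurals_too_many_digits, byte_size(Digits), Max}};
parse_capped(Digits, _Max) when is_binary(Digits), byte_size(Digits) > 0 ->
    {ok, binary_to_integer(Digits)}.
```

Note that the error carries only the digit count and the cap, never the
rejected value itself.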

```erlang
1> erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
{ok,#{raw => <<"nplurals=2; plural=n != 1;">>,expr => {binop,'!=',n,1},nplurals => 2}}
2> erli18n_plural:compile(<<"nplurals=1; plural=0;">>).
{ok,#{raw => <<"nplurals=1; plural=0;">>,expr => 0,nplurals => 1}}
3> erli18n_plural:compile(<<"nplurals=2; plural=n/0;">>).
{error,{unsafe_plural_rule,{division_by_zero,'/'}}}
4> erli18n_plural:compile(<<"nplurals=2; plural=nx;">>).
{error,{syntax_error,{unknown_identifier_after_n,$x},1}}
5> erli18n_plural:compile(<<"nplurals=2;">>).
{error,{missing_plural_expr,<<"nplurals=2;">>}}
```

See also `evaluate/2` (consume the bundle), `plural_by_po_header/2`
(compile+evaluate at once) and `t:compile_error/0`.
""".
-spec compile(binary()) -> {ok, plural_compiled()} | {error, compile_error()}.
compile(Header) when is_binary(Header) ->
    case extract_nplurals(Header) of
        {ok, NPlurals} ->
            case extract_plural_expr(Header) of
                {ok, ExprBin} ->
                    case parse_expr_bin(ExprBin) of
                        {ok, Ast} ->
                            compile_validated(Header, NPlurals, Ast);
                        {error, _} = Err ->
                            Err
                    end;
                {error, _} = Err ->
                    Err
            end;
        {error, _} = Err ->
            Err
    end.

%% Apply the two load-time validation barriers to a successfully parsed
%% AST and, on success, materialise the `plural_compiled()` bundle.
%%
%%   * Layer 3 (finding #1): reject rules that are STATICALLY guaranteed
%%     to fault (literal div/mod by zero, constant out-of-range form)
%%     before they can be stored and crash every later lookup.
%%   * Node-count cap (finding #9): reject an AST above `?AST_MAX_NODES`
%%     so a wide flat chain cannot make `evaluate/2` walk thousands of
%%     nodes (and grow a large bignum) on every lookup. Run once here at
%%     load time, never on the hot path.
-spec compile_validated(binary(), pos_integer(), ast()) ->
    {ok, plural_compiled()} | {error, compile_error()}.
compile_validated(Header, NPlurals, Ast) ->
    case validate_safe(Ast, NPlurals) of
        ok ->
            case check_ast_complexity(Ast) of
                ok ->
                    {ok, #{
                        nplurals => NPlurals,
                        expr => Ast,
                        raw => Header
                    }};
                {error, _} = Err ->
                    Err
            end;
        {error, EvalErr} ->
            {error, {unsafe_plural_rule, EvalErr}}
    end.

%% Evaluate a compiled plural rule for a particular N. Pure function on
%% the hot path, with no allocations beyond the return value. Negative N
%% is accepted; libintl's C API takes the count as `unsigned long int`,
%% and gettext .po rules are all defined over non-negative N. We pass N
%% through unchanged so the rule's own semantics decide.
%%
%% TOTALITY (finding #1, plural-eval-throws-per-lookup-dos). `.po` input
%% is untrusted (ADR-0003) and this function runs in the CALLER process
%% on every `ngettext`/`npgettext` lookup, so it MUST NOT raise. Two
%% failure modes that a malformed rule could otherwise trigger are
%% neutralised here, matching the GNU libintl runtime:
%%
%%   * division / modulo by zero — `eval_div/2` and `eval_rem/2` coerce
%%     a zero divisor to a defined value (C undefined behaviour pinned to
%%     0) instead of letting Erlang `div`/`rem` raise `badarith`.
%%   * out-of-range form index — clamped to form 0, exactly as
%%     `dcigettext.c` (`plural_lookup`) does: `if (index >= nplurals)
%%     index = 0;` ("this should never happen" -> clamp, NOT crash).
%%
%% The `-spec` is therefore HONEST: the result is provably
%% `non_neg_integer()` for every N and every AST. Callers that want to
%% OBSERVE the anomaly as data use `evaluate_checked/2` instead.
-doc """
Evaluates a compiled plural rule for a given `N` and returns the plural
form index — the TOTAL hot-path function, used by every
`ngettext`/`npgettext`.

`Compiled` is the bundle from `compile/1`; `N` is the count (an integer,
may be negative — the rule decides the semantics). The return is always a
`non_neg_integer()` in `[0, NPlurals)`: the rule is interpreted and the
result coerced to an integer.

Never raises, even on a malformed rule (parity with GNU libintl):
division/modulo by zero is coerced to 0 (`eval_div/2`/`eval_rem/2` instead
of letting `div`/`rem` raise `badarith`) and a form outside
`[0, NPlurals)` is clamped to 0 (`if index >= nplurals -> index = 0`).

No allocations beyond the return value and no result cache: every call
re-interprets the AST, which is why the `compile/1` caps keep the AST
small. A negative `N` is passed through without `abs()`; the final clamp
keeps the returned index in `[0, NPlurals)` whatever the rule computes.

```erlang
1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:evaluate(C, 1).
0
3> erli18n_plural:evaluate(C, 5).
1
%% Divisor DEPENDS on n (passes compile/1's static check),
%% but evaluates to zero at runtime for N=7: coerced to 0, no crash.
4> {ok, Bad} = erli18n_plural:compile(<<"nplurals=2; plural=1/(n-7);">>).
5> erli18n_plural:evaluate(Bad, 7).
0
```

Edge cases: the short-circuit of `&&`/`||` is honoured, so a zero divisor
behind a false branch is never reached. To OBSERVE the anomaly (instead of
silent clamping) use `evaluate_checked/2`. See also `compile/1` and
`plural_by_po_header/2`.
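
The short-circuit behaviour can be sketched with operands passed as
zero-arity funs, so the right-hand side runs only when it is needed. This
is an illustration with hypothetical names, not the module's evaluator.

```erlang
-module(short_circuit_sketch).
-export([c_and/2, c_or/2]).

%% C-style `&&`: the right operand is evaluated only if the left is non-zero.
c_and(L, R) ->
    case L() of
        0 -> 0;
        _ -> truthy(R())
    end.

%% C-style `||`: the right operand is evaluated only if the left is zero.
c_or(L, R) ->
    case L() of
        0 -> truthy(R());
        _ -> 1
    end.

%% Normalise any integer to C truthiness (0 or 1).
truthy(0) -> 0;
truthy(_) -> 1.
```

With `N = 0`, `c_and(fun() -> N end, fun() -> 10 div N end)` returns `0`
without ever running the division.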
""".
-spec evaluate(plural_compiled(), integer()) -> non_neg_integer().
evaluate(#{nplurals := NPlurals, expr := Ast}, N) when is_integer(N) ->
    Form = to_integer(eval_ast(Ast, N)),
    clamp_form(Form, NPlurals).

-doc """
Structured sibling of `evaluate/2`: instead of clamping silently, it
reports a malformed rule as data so the consumer can log/alert.

`Compiled` and `N` are as in `evaluate/2`. Returns `{ok, Form}` with the
form in `[0, NPlurals)`, or `{error, plural_eval_error()}`:
`{division_by_zero, '/' | '%'}` when the evaluated divisor is 0, or
`{form_out_of_range, Form, NPlurals}` when the form leaves the range. It
keeps the short-circuit of `&&`/`||` (a zero divisor behind a false branch
is not reported) and, like `evaluate/2`, is total — never raises.

Use this off the hot path, when you want to log/alert the malformed rule;
on the hot path stay with `evaluate/2`, whose clamp is cheaper. Where
`evaluate/2` would return `0` by clamping, this function returns the
corresponding `{error, _}`.

```erlang
1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:evaluate_checked(C, 5).
{ok,1}
%% Same rule as the evaluate/2 example (divisor depends on n).
%% Where evaluate/2 would clamp to 0, here the anomaly comes back as data.
3> {ok, Bad} = erli18n_plural:compile(<<"nplurals=2; plural=1/(n-7);">>).
4> erli18n_plural:evaluate_checked(Bad, 7).
{error,{division_by_zero,'/'}}
5> erli18n_plural:evaluate_checked(Bad, 8).
{ok,1}
```

Edge cases: a form outside `[0, NPlurals)` (where `evaluate/2` would
clamp) becomes `{error, {form_out_of_range, Form, NPlurals}}`. See also
`evaluate/2` (the sibling that clamps) and `t:plural_eval_error/0`.
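
One possible way to consume the checked result is sketched below: log the
anomaly, then fall back to form 0 as `evaluate/2` would. The module and
function names are hypothetical, and the log message is only an example.

```erlang
-module(checked_observe_sketch).
-export([form_or_zero/1]).

%% Turn an evaluate_checked/2-style result into a form index, logging
%% any anomaly before falling back to form 0.
form_or_zero({ok, Form}) ->
    Form;
form_or_zero({error, Reason}) ->
    logger:warning("plural rule anomaly: ~p", [Reason]),
    0.
```

A caller would pass in the value returned by `evaluate_checked/2`, for
example `form_or_zero(erli18n_plural:evaluate_checked(Compiled, N))`.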
""".
-spec evaluate_checked(plural_compiled(), integer()) ->
    {ok, non_neg_integer()} | {error, plural_eval_error()}.
evaluate_checked(#{nplurals := NPlurals, expr := Ast}, N) when is_integer(N) ->
    case eval_ast_checked(Ast, N) of
        {error, _} = Err ->
            Err;
        {ok, Value} ->
            Form = to_integer(Value),
            case Form >= 0 andalso Form < NPlurals of
                true -> {ok, Form};
                false -> {error, {form_out_of_range, Form, NPlurals}}
            end
    end.

%% Clamp a candidate form index into [0, NPlurals) à la libintl. NPlurals
%% is `pos_integer()` (validated at compile), so 0 is always a valid
%% form.
-spec clamp_form(integer(), pos_integer()) -> non_neg_integer().
clamp_form(Form, NPlurals) when Form >= 0, Form < NPlurals ->
    Form;
clamp_form(_Form, _NPlurals) ->
    0.

-doc """
Convenience that compiles and evaluates in a single step: given the raw
header `Header` and the count `N`, returns `{ok, Form}` or propagates the
`{error, compile_error()}` from `compile/1`.

Recompiles on every call, so it is for one-off use; on the hot path, call
`compile/1` once at load and reuse the bundle with `evaluate/2`.

The internal evaluation uses `evaluate/2` (total), so an `{ok, _}` never
embeds an evaluation anomaly — the only part that can fail is `compile/1`,
whose error is propagated as-is.

```erlang
1> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 1).
{ok,0}
2> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=n != 1;">>, 3).
{ok,1}
3> erli18n_plural:plural_by_po_header(<<"nplurals=2; plural=nx;">>, 1).
{error,{syntax_error,{unknown_identifier_after_n,$x},1}}
```

See also `compile/1` and `evaluate/2`.
""".
-spec plural_by_po_header(binary(), integer()) ->
    {ok, non_neg_integer()} | {error, compile_error()}.
plural_by_po_header(Header, N) when is_binary(Header), is_integer(N) ->
    case compile(Header) of
        {ok, Compiled} -> {ok, evaluate(Compiled, N)};
        {error, _} = E -> E
    end.

-doc """
Looks up the CLDR canonical plural expression for `Locale` in the embedded
table.

Returns `{ok, Expr}`, where `Expr` is the binary of the C plural
expression equivalent to that locale's CLDR rule, or `undefined` if
neither the locale nor its base language is in the table. The match is
case-sensitive; region tags fall back to the base language when the region
itself is not listed (e.g. `fr_BE` -> `fr`, since `fr_BE` has no row of
its own in the table).

A lookup/observability function — NOT on the hot path (PSD-004: the `.po`
header is the runtime source-of-truth). The embedded table (`cldr_data/0`)
carries one row per locale the GNU gettext / CLDR seed defines. Both `_`
and `-` separators are accepted in the fallback to the base language.

```erlang
%% Direct hit: the entry exists in the table.
1> erli18n_plural:cldr_rule(<<"ru">>).
{ok,<<"n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2">>}
%% Direct hit: `fr_CA` HAS its own row, so it resolves without falling back.
2> erli18n_plural:cldr_rule(<<"fr_CA">>).
{ok,<<"n > 1">>}
%% Fallback to the base language: `fr_BE` is not in the table, falls back to `fr`.
3> erli18n_plural:cldr_rule(<<"fr_BE">>).
{ok,<<"n > 1">>}
%% Neither the locale nor the base exists.
4> erli18n_plural:cldr_rule(<<"xx">>).
undefined
```

Edge cases: `pt_PT` (`n != 1`) diverges from the base `pt` (`n > 1`), so
the region entry exists separately. See also `validate_against_cldr/2`
(compare a header against the CLDR rule) and `fallback_rule/0`.
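
The base-language fallback can be pictured with this sketch, which strips
everything from the first `_` or `-` onward. It is an assumption about
the approach, with hypothetical names; the module's own helper may differ.

```erlang
-module(base_locale_sketch).
-export([base/1]).

%% Return the language part of a locale tag, or the tag itself when it
%% has no region/script separator.
base(Locale) when is_binary(Locale) ->
    case binary:split(Locale, [<<"_">>, <<"-">>]) of
        [Base, _Rest] when Base =/= <<>> -> Base;
        _ -> Locale
    end.
```

Here `base(<<"fr_BE">>)` and `base(<<"fr-BE">>)` both return `<<"fr">>`,
while `base(<<"fr">>)` returns its argument unchanged.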
""".
-spec cldr_rule(binary()) -> {ok, binary()} | undefined.
cldr_rule(Locale) when is_binary(Locale) ->
    case lookup_locale(Locale) of
        {ok, _N, Expr} ->
            {ok, Expr};
        undefined ->
            case base_locale(Locale) of
                Locale ->
                    undefined;
                Base ->
                    case lookup_locale(Base) of
                        {ok, _N2, Expr2} -> {ok, Expr2};
                        undefined -> undefined
                    end
            end
    end.

%% Compare a `.po` header expression against the CLDR canonical rule for
%% the given locale. Returns `ok` if the parsed ASTs are structurally
%% identical (whitespace-insensitive) or `{warning, _}` if they diverge
%% in a way that would affect runtime form selection. Per PSD-004 the
%% header always wins at runtime — this only produces observability.
-doc """
Compares the plural expression of header `HeaderRule` (raw form) against
the CLDR canonical rule of `Locale`, producing only observability — at
runtime the header always wins (PSD-004).

Compiles `HeaderRule` ONCE and delegates to `validate_against_cldr_ast/2`.
Returns `ok` when the `(nplurals, expr)` ASTs are structurally equal
(whitespace/paren-insensitive) or when the locale has no CLDR entry;
returns `{warning, {plural_divergence, Locale, HeaderRule, CldrRaw}}` when
they diverge — including when the header is invalid but the locale is
listed in CLDR.

A convenience entry point for callers that only have the raw header. The
catalog loader, which already keeps the compiled bundle, should use
`validate_against_cldr_ast/2` to avoid recompiling the header at load.

The comparison is STRUCTURAL over the `(nplurals, expr-AST)` pair, so it
is insensitive to whitespace and redundant parentheses: `(n != 1)` matches
`n != 1`. Nothing changes at runtime — the warning exists only for
telemetry.
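
Structural comparison of two compiled bundles can be sketched as plain
term equality on the `(nplurals, expr)` pair, ignoring `raw`. The module
and function names are hypothetical.

```erlang
-module(structural_eq_sketch).
-export([same_rule/2]).

%% Two bundles agree when both the form count and the parsed AST match.
%% The raw header text is deliberately not compared.
same_rule(#{nplurals := N1, expr := E1}, #{nplurals := N2, expr := E2}) ->
    N1 =:= N2 andalso E1 =:= E2.
```

Since parentheses and whitespace leave no trace in the AST, bundles built
from `(n > 1)` and `n > 1` compare equal under this check.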

```erlang
%% Header agrees with fr's CLDR (n > 1): no warning.
1> erli18n_plural:validate_against_cldr(<<"fr">>, <<"nplurals=2; plural=(n > 1);">>).
ok
%% Header diverges from fr's CLDR: warning (but the header would win at runtime).
2> erli18n_plural:validate_against_cldr(<<"fr">>, <<"nplurals=2; plural=n != 1;">>).
{warning,{plural_divergence,<<"fr">>,<<"nplurals=2; plural=n != 1;">>,<<"n > 1">>}}
%% Locale with no CLDR entry: nothing to validate.
3> erli18n_plural:validate_against_cldr(<<"xx">>, <<"nplurals=2; plural=n != 1;">>).
ok
```

Edge cases: an INVALID header against a locale that IS listed in CLDR
still produces `{warning, _}` (it cannot match the canonical rule);
against a locale with no CLDR entry it yields `ok`. See also
`validate_against_cldr_ast/2` (the variant that does not recompile) and
`cldr_rule/1`.
""".
-spec validate_against_cldr(binary(), binary()) ->
    ok
    | {warning, {plural_divergence, binary(), binary(), binary()}}.
validate_against_cldr(Locale, HeaderRule) when
    is_binary(Locale), is_binary(HeaderRule)
->
    case compile(HeaderRule) of
        {ok, Compiled} ->
            validate_against_cldr_ast(Locale, Compiled);
        {error, _} ->
            %% An unparseable header has no AST to compare. Before
            %% finding #17 this still produced a `{warning, _}` for a
            %% CLDR-listed locale (the header could not match the
            %% canonical rule), so preserve that observable behaviour.
            case cldr_compiled(Locale) of
                undefined -> ok;
                #{raw := CldrRaw} -> {warning, {plural_divergence, Locale, HeaderRule, CldrRaw}}
            end
    end.

%% AST-based sibling of `validate_against_cldr/2`. Takes the ALREADY
%% compiled header bundle (`plural_compiled()`) and compares it against
%% the CLDR canonical rule for the locale without recompiling anything:
%%
%%   * the header AST is reused as-is (the loader compiled it once via
%%     `compile/1` and keeps it in the catalog map);
%%   * the CLDR rule is taken from a one-time, memoised table of compiled
%%     ASTs (`cldr_compiled/1`), so no CLDR rule is parsed/synthesised on
%%     the load path either.
%%
%% Equivalence is structural on `(nplurals, expr-AST)` — exactly what the
%% old `ast_equivalent/2` computed, but with both sides already parsed.
%% The warning payload keeps the raw header string (the bundle's `raw`
%% field) and the raw CLDR expression, matching `validate_against_cldr/2`.
-doc """
AST-based variant of `validate_against_cldr/2`: takes the ALREADY compiled
bundle (`plural_compiled()`) and compares it against the CLDR rule of
`Locale` without recompiling anything (finding #17).

Reuses the header AST as-is and takes the CLDR side from a memoised table
of compiled bundles, so no rule is re-parsed at load. Returns `ok` if the
`(nplurals, expr)` pairs match or if the locale has no CLDR entry;
otherwise `{warning, {plural_divergence, Locale, HeaderRaw, CldrRaw}}`,
with the raw header (the bundle's `raw` field) and the raw CLDR
expression.

This is the PREFERRED form in the loader (finding #17): the bundle was
already compiled by `compile/1` at load, so it avoids the second
`compile/1` that `validate_against_cldr/2` would perform, and the CLDR side
comes from the `persistent_term` cache (`cldr_compiled_table/0`) rather
than being re-synthesised on every load.

```erlang
1> {ok, C} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
2> erli18n_plural:validate_against_cldr_ast(<<"fr">>, C).
{warning,{plural_divergence,<<"fr">>,<<"nplurals=2; plural=n != 1;">>,<<"n > 1">>}}
3> {ok, Cde} = erli18n_plural:compile(<<"nplurals=2; plural=n != 1;">>).
4> erli18n_plural:validate_against_cldr_ast(<<"de">>, Cde).
ok
```

Edge cases: a locale with no CLDR entry yields `ok` (nothing to log). See
also `validate_against_cldr/2` (which starts from the raw header) and
`cldr_compiled/1` (the memoisation of the CLDR side).
""".
-spec validate_against_cldr_ast(binary(), plural_compiled()) ->
    ok
    | {warning, {plural_divergence, binary(), binary(), binary()}}.
validate_against_cldr_ast(Locale, #{nplurals := NH, expr := EH, raw := HeaderRaw}) when
    is_binary(Locale)
->
    case cldr_compiled(Locale) of
        undefined ->
            %% Locale has no CLDR entry; we cannot validate. Treat as ok
            %% — the loader has nothing meaningful to log.
            ok;
        #{nplurals := NC, expr := EC, raw := CldrRaw} ->
            case NH =:= NC andalso EH =:= EC of
                true -> ok;
                false -> {warning, {plural_divergence, Locale, HeaderRaw, CldrRaw}}
            end
    end.

-doc """
Fallback plural rule used when a `.po` catalog ships no `Plural-Forms:`
header at all (a degenerate but tolerated input).

Returns `<<"nplurals=2; plural=n != 1;">>`, the two-form Germanic-family
rule (the C/English default) given in the GNU gettext manual
(§"Plural forms").

A pure constant, no side effects. The result is a raw header ready for
`compile/1`, so the loader's fallback path reuses exactly the same
pipeline as a legitimate header.

```erlang
1> erli18n_plural:fallback_rule().
<<"nplurals=2; plural=n != 1;">>
2> {ok, C} = erli18n_plural:compile(erli18n_plural:fallback_rule()),
2> erli18n_plural:evaluate(C, 1).
0
```

See also `compile/1` and `cldr_rule/1`.
""".
-spec fallback_rule() -> binary().
fallback_rule() ->
    ~"nplurals=2; plural=n != 1;".

%% =========================
%% Header tokenization
%% =========================

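%% Extract the integer following `nplurals=`. Tolerant of surrounding
%% whitespace and the trailing semicolon. Returns
%% `{error, {missing_nplurals, _}}` when the field is absent or carries no
%% digits, `{error, {nplurals_too_many_digits, D, ?MAX_INT_DIGITS}}` when
%% the digit count D exceeds the cap (checked before the integer is
%% materialised), and `{error, {nplurals_out_of_range, N}}` when N is
%% outside the sanity range [1, ?NPLURALS_MAX].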
-spec extract_nplurals(binary()) ->
    {ok, pos_integer()} | {error, compile_error()}.
extract_nplurals(Header) ->
    case locate_field(Header, ~"nplurals") of
        {ok, Tail} ->
            {Digits, _} = consume_integer(skip_ws(Tail)),
            case byte_size(Digits) of
                0 ->
                    {error, {missing_nplurals, Header}};
                D when D > ?MAX_INT_DIGITS ->
                    %% Finding #8: cap by DIGIT COUNT before
                    %% `binary_to_integer` materialises the bignum, and
                    %% keep the rejected value OUT of the payload (only
                    %% the digit count + cap are reported) to avoid
                    %% memory/log amplification and the system_limit path.
                    {error, {nplurals_too_many_digits, D, ?MAX_INT_DIGITS}};
                _ ->
                    N = binary_to_integer(Digits),
                    case N >= 1 andalso N =< ?NPLURALS_MAX of
                        true -> {ok, N};
                        false -> {error, {nplurals_out_of_range, N}}
                    end
            end;
        not_found ->
            {error, {missing_nplurals, Header}}
    end.

%% Extract the raw expression following `plural=`, stripping the trailing
%% semicolon and surrounding whitespace. The returned binary still needs
%% to be fed to the recursive-descent parser.
-spec extract_plural_expr(binary()) ->
    {ok, binary()} | {error, compile_error()}.
extract_plural_expr(Header) ->
    case locate_field(Header, ~"plural") of
        {ok, Tail} ->
            ExprRaw = take_until_semicolon_or_end(Tail),
            case trim(ExprRaw) of
                <<>> -> {error, {missing_plural_expr, Header}};
                Trimmed -> {ok, Trimmed}
            end;
        not_found ->
            {error, {missing_plural_expr, Header}}
    end.

%% Locate `Field=` (case-sensitive — GNU gettext spec keeps these names
%% lower-case) in Header and return the bytes immediately after the `=`.
locate_field(Header, Field) ->
    %% Walk the header looking for `Field` followed (after optional
    %% whitespace) by `=`. We require either start-of-string or a
    %% delimiter (whitespace or `;`) before `Field` so that we do not
    %% match `nplurals` inside `intplurals` or similar.
    locate_field(Header, Field, 0).

locate_field(Bin, Field, Offset) ->
    case binary:match(Bin, Field, [{scope, {Offset, byte_size(Bin) - Offset}}]) of
        nomatch ->
            not_found;
        {Start, Len} ->
            case is_field_boundary_left(Bin, Start) of
                true ->
                    Tail0 = binary:part(
                        Bin,
                        Start + Len,
                        byte_size(Bin) - (Start + Len)
                    ),
                    case skip_to_equals(Tail0) of
                        {ok, Tail1} -> {ok, Tail1};
                        not_found -> locate_field(Bin, Field, Start + Len)
                    end;
                false ->
                    locate_field(Bin, Field, Start + Len)
            end
    end.

is_field_boundary_left(_Bin, 0) ->
    true;
is_field_boundary_left(Bin, Start) ->
    Prev = binary:at(Bin, Start - 1),
    is_header_delim(Prev).

is_header_delim($\s) -> true;
is_header_delim($\t) -> true;
is_header_delim($\n) -> true;
is_header_delim($\r) -> true;
is_header_delim($;) -> true;
is_header_delim(_) -> false.

skip_to_equals(<<>>) ->
    not_found;
skip_to_equals(<<$=, Rest/binary>>) ->
    {ok, Rest};
skip_to_equals(<<C, Rest/binary>>) when C =:= $\s; C =:= $\t ->
    skip_to_equals(Rest);
skip_to_equals(_) ->
    not_found.

take_until_semicolon_or_end(Bin) ->
    take_until_semicolon_or_end(Bin, 0).

take_until_semicolon_or_end(Bin, N) when N >= byte_size(Bin) ->
    Bin;
take_until_semicolon_or_end(Bin, N) ->
    case binary:at(Bin, N) of
        $; -> binary:part(Bin, 0, N);
        %% header line terminator
        $\n -> binary:part(Bin, 0, N);
        _ -> take_until_semicolon_or_end(Bin, N + 1)
    end.

%% =========================
%% Recursive-descent parser
%% =========================
%%
%% Grammar (precedence low -> high), per GNU manual §"Plural forms":
%%
%%   expr        := ternary
%%   ternary     := lor ('?' expr ':' expr)?      (right-assoc)
%%   lor         := land ('||' land)*             (left-assoc)
%%   land        := equality ('&&' equality)*     (left-assoc)
%%   equality    := relational (('==' | '!=') relational)*
%%   relational  := additive (('<' | '>' | '<=' | '>=') additive)*
%%   additive    := multiplicative (('+' | '-') multiplicative)*
%%   multiplicative := unary (('*' | '/' | '%') unary)*
%%   unary       := '!' unary | primary
%%   primary     := INTEGER | 'n' | '(' expr ')'

-spec parse_expr_bin(binary()) ->
    {ok, ast()} | {error, compile_error()}.
%% Finding #2: reject an over-long expression BEFORE parsing it, so a
%% multi-KB adversarial rule never reaches the recursive descent.
parse_expr_bin(ExprBin) when byte_size(ExprBin) > ?PLURAL_EXPR_MAX_BYTES ->
    {error, {expr_too_long, byte_size(ExprBin), ?PLURAL_EXPR_MAX_BYTES}};
parse_expr_bin(ExprBin) ->
    try
        {Ast, St} = parse_expr(#ps{src = ExprBin}, 0),
        case skip_ws_st(St) of
            #ps{src = <<>>} ->
                {ok, Ast};
            St2 ->
                %% Trailing garbage (e.g. unbalanced `)` or stray token).
                {error, {syntax_error, {trailing_input, St2#ps.src}, St2#ps.pos}}
        end
    catch
        throw:{syntax_error, Reason, Pos} ->
            {error, {syntax_error, Reason, Pos}};
        %% Finding #2: recursion-depth guard tripped — fail closed with a
        %% structured error rather than parsing an unbounded-depth tree.
        throw:{expr_too_deep, Depth, Pos} ->
            {error, {expr_too_deep, Depth, Pos}}
    end.

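%% `Depth` is propagated through every recursive-descent clause and
%% checked at each new nesting level (finding #2), which bounds the
%% parser's stack. Left-associative chains (`n*n*...*n`) are folded by the
%% `*_tail` loops without increasing `Depth`, yet they build a left-nested
%% AST, so the hot-path `eval_ast/2` walker's stack is bounded by this
%% guard only together with the `?AST_MAX_NODES` node cap (finding #9).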
parse_expr(St, Depth) when Depth > ?PLURAL_EXPR_MAX_DEPTH ->
    throw({expr_too_deep, Depth, St#ps.pos});
parse_expr(St, Depth) ->
    parse_ternary(St, Depth).

parse_ternary(St0, Depth) ->
    {Cond, St1} = parse_lor(St0, Depth),
    St2 = skip_ws_st(St1),
    case peek_byte(St2) of
        {ok, $?} ->
            St3 = advance(St2, 1),
            {Then, St4} = parse_expr(St3, Depth + 1),
            St5 = skip_ws_st(St4),
            case peek_byte(St5) of
                {ok, $:} ->
                    St6 = advance(St5, 1),
                    {Else, St7} = parse_expr(St6, Depth + 1),
                    {{ternary, Cond, Then, Else}, St7};
                _ ->
                    throw({syntax_error, {expected, $:, peek_byte(St5)}, St5#ps.pos})
            end;
        _ ->
            {Cond, St1}
    end.

parse_lor(St0, Depth) ->
    {Left, St1} = parse_land(St0, Depth),
    parse_lor_tail(Left, St1, Depth).

parse_lor_tail(Left, St0, Depth) ->
    St1 = skip_ws_st(St0),
    case peek2(St1) of
        {ok, $|, $|} ->
            St2 = advance(St1, 2),
            {Right, St3} = parse_land(St2, Depth + 1),
            parse_lor_tail({binop, '||', Left, Right}, St3, Depth);
        _ ->
            {Left, St0}
    end.

parse_land(St0, Depth) ->
    {Left, St1} = parse_equality(St0, Depth),
    parse_land_tail(Left, St1, Depth).

parse_land_tail(Left, St0, Depth) ->
    St1 = skip_ws_st(St0),
    case peek2(St1) of
        {ok, $&, $&} ->
            St2 = advance(St1, 2),
            {Right, St3} = parse_equality(St2, Depth + 1),
            parse_land_tail({binop, '&&', Left, Right}, St3, Depth);
        _ ->
            {Left, St0}
    end.

parse_equality(St0, Depth) ->
    {Left, St1} = parse_relational(St0, Depth),
    parse_equality_tail(Left, St1, Depth).

parse_equality_tail(Left, St0, Depth) ->
    St1 = skip_ws_st(St0),
    case peek2(St1) of
        {ok, $=, $=} ->
            St2 = advance(St1, 2),
            {Right, St3} = parse_relational(St2, Depth + 1),
            parse_equality_tail({binop, '==', Left, Right}, St3, Depth);
        {ok, $!, $=} ->
            St2 = advance(St1, 2),
            {Right, St3} = parse_relational(St2, Depth + 1),
            parse_equality_tail({binop, '!=', Left, Right}, St3, Depth);
        _ ->
            {Left, St0}
    end.

parse_relational(St0, Depth) ->
    {Left, St1} = parse_additive(St0, Depth),
    parse_relational_tail(Left, St1, Depth).

parse_relational_tail(Left, St0, Depth) ->
    St1 = skip_ws_st(St0),
    case peek2(St1) of
        {ok, $<, $=} ->
            St2 = advance(St1, 2),
            {Right, St3} = parse_additive(St2, Depth + 1),
            parse_relational_tail({binop, '<=', Left, Right}, St3, Depth);
        {ok, $>, $=} ->
            St2 = advance(St1, 2),
            {Right, St3} = parse_additive(St2, Depth + 1),
            parse_relational_tail({binop, '>=', Left, Right}, St3, Depth);
        {ok, $<, _} ->
            St2 = advance(St1, 1),
            {Right, St3} = parse_additive(St2, Depth + 1),
            parse_relational_tail({binop, '<', Left, Right}, St3, Depth);
        {ok, $>, _} ->
            St2 = advance(St1, 1),
            {Right, St3} = parse_additive(St2, Depth + 1),
            parse_relational_tail({binop, '>', Left, Right}, St3, Depth);
        _ ->
            {Left, St0}
    end.

parse_additive(St0, Depth) ->
    {Left, St1} = parse_multiplicative(St0, Depth),
    parse_additive_tail(Left, St1, Depth).

parse_additive_tail(Left, St0, Depth) ->
    St1 = skip_ws_st(St0),
    case peek_byte(St1) of
        {ok, $+} ->
            St2 = advance(St1, 1),
            {Right, St3} = parse_multiplicative(St2, Depth + 1),
            parse_additive_tail({binop, '+', Left, Right}, St3, Depth);
        {ok, $-} ->
            St2 = advance(St1, 1),
            {Right, St3} = parse_multiplicative(St2, Depth + 1),
            parse_additive_tail({binop, '-', Left, Right}, St3, Depth);
        _ ->
            {Left, St0}
    end.

parse_multiplicative(St0, Depth) ->
    {Left, St1} = parse_unary(St0, Depth),
    parse_multiplicative_tail(Left, St1, Depth).

parse_multiplicative_tail(Left, St0, Depth) ->
    St1 = skip_ws_st(St0),
    case peek_byte(St1) of
        {ok, $*} ->
            St2 = advance(St1, 1),
            {Right, St3} = parse_unary(St2, Depth + 1),
            parse_multiplicative_tail({binop, '*', Left, Right}, St3, Depth);
        {ok, $/} ->
            St2 = advance(St1, 1),
            {Right, St3} = parse_unary(St2, Depth + 1),
            parse_multiplicative_tail({binop, '/', Left, Right}, St3, Depth);
        {ok, $%} ->
            St2 = advance(St1, 1),
            {Right, St3} = parse_unary(St2, Depth + 1),
            parse_multiplicative_tail({binop, '%', Left, Right}, St3, Depth);
        _ ->
            {Left, St0}
    end.

parse_unary(St0, Depth) when Depth > ?PLURAL_EXPR_MAX_DEPTH ->
    throw({expr_too_deep, Depth, St0#ps.pos});
parse_unary(St0, Depth) ->
    St1 = skip_ws_st(St0),
    case peek_byte(St1) of
        {ok, $!} ->
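            %% Disambiguate against `!=` (handled in parse_equality): a
            %% `!=` in operand position has no left operand, so
            %% parse_primary rejects it as an unexpected character.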
            case peek2(St1) of
                {ok, $!, $=} ->
                    parse_primary(St1, Depth);
                _ ->
                    St2 = advance(St1, 1),
                    {Inner, St3} = parse_unary(St2, Depth + 1),
                    {{unop, '!', Inner}, St3}
            end;
        _ ->
            parse_primary(St1, Depth)
    end.

parse_primary(St0, Depth) ->
    St1 = skip_ws_st(St0),
    case peek_byte(St1) of
        {ok, $(} ->
            St2 = advance(St1, 1),
            {Inner, St3} = parse_expr(St2, Depth + 1),
            St4 = skip_ws_st(St3),
            case peek_byte(St4) of
                {ok, $)} ->
                    {Inner, advance(St4, 1)};
                _ ->
                    throw({syntax_error, {unclosed_paren, peek_byte(St4)}, St4#ps.pos})
            end;
        {ok, $n} ->
            %% `n` is a single-character identifier; the GNU grammar
            %% does not permit multi-character identifiers in plural
            %% expressions.
            St2 = advance(St1, 1),
            case peek_byte(St2) of
                {ok, C} when ?IS_IDENT(C) ->
                    throw({syntax_error, {unknown_identifier_after_n, C}, St2#ps.pos});
                _ ->
                    {n, St2}
            end;
        {ok, D} when D >= $0, D =< $9 ->
            {Digits, St2} = consume_integer_st(St1),
            {binary_to_integer(Digits), St2};
        {ok, C} ->
            throw({syntax_error, {unexpected_char, C}, St1#ps.pos});
        eof ->
            throw({syntax_error, unexpected_eof, St1#ps.pos})
    end.

%% =========================
%% Parser state helpers
%% =========================

%% Finding #2 (plural-compile-superlinear-unbounded): dispatch on a
%% `byte_size/1` comparison, NOT a full-binary `=:=` match. `skip_ws/1`
%% only ever strips a leading prefix, so when nothing is consumed the
%% result is byte-for-byte equal and `byte_size(Rest) =:= byte_size(Src)`
%% is exact. Matching `case skip_ws(Src) of Src -> ...` instead forced a
%% byte-by-byte comparison of the whole remaining input on every token
%% (the binaries are structurally equal but not identical), which made
%% the parser O(n^2). `byte_size/1` is O(1), collapsing it to O(n).
skip_ws_st(#ps{src = Src, pos = Pos} = St) ->
    Rest = skip_ws(Src),
    case byte_size(Rest) =:= byte_size(Src) of
        true ->
            St;
        false ->
            Consumed = byte_size(Src) - byte_size(Rest),
            St#ps{src = Rest, pos = Pos + Consumed}
    end.

skip_ws(<<C, Rest/binary>>) when
    C =:= $\s;
    C =:= $\t;
    C =:= $\n;
    C =:= $\r
->
    skip_ws(Rest);
skip_ws(Bin) ->
    Bin.

peek_byte(#ps{src = <<>>}) -> eof;
peek_byte(#ps{src = <<B, _/binary>>}) -> {ok, B}.

peek2(#ps{src = <<>>}) -> eof;
peek2(#ps{src = <<_>>}) -> eof;
peek2(#ps{src = <<A, B, _/binary>>}) -> {ok, A, B}.

advance(#ps{src = Src, pos = Pos} = St, N) ->
    St#ps{
        src = binary:part(Src, N, byte_size(Src) - N),
        pos = Pos + N
    }.

consume_integer_st(#ps{src = Src, pos = Pos} = St) ->
    {Digits, Rest} = consume_integer(Src),
    Len = byte_size(Digits),
    {Digits, St#ps{src = Rest, pos = Pos + Len}}.

consume_integer(Bin) -> consume_integer(Bin, 0).

consume_integer(Bin, N) when N >= byte_size(Bin) ->
    {Bin, <<>>};
consume_integer(Bin, N) ->
    case binary:at(Bin, N) of
        D when D >= $0, D =< $9 -> consume_integer(Bin, N + 1);
        _ -> {binary:part(Bin, 0, N), binary:part(Bin, N, byte_size(Bin) - N)}
    end.

trim(Bin) -> trim_trailing(trim_leading(Bin)).

trim_leading(<<C, Rest/binary>>) when
    C =:= $\s;
    C =:= $\t;
    C =:= $\n;
    C =:= $\r
->
    trim_leading(Rest);
trim_leading(Bin) ->
    Bin.

trim_trailing(Bin) ->
    Size = byte_size(Bin),
    trim_trailing(Bin, Size).

trim_trailing(_Bin, 0) ->
    <<>>;
trim_trailing(Bin, N) ->
    case binary:at(Bin, N - 1) of
        C when C =:= $\s; C =:= $\t; C =:= $\n; C =:= $\r ->
            trim_trailing(Bin, N - 1);
        _ ->
            binary:part(Bin, 0, N)
    end.

%% =========================
%% Interpreter (hot path)
%% =========================

%% C-truthy coercion (from legacy gettexter_plural to_boolean/1).
%% `0` is false, any other integer is true. Boolean inputs are passed
%% through to keep short-circuit interop in `eval_ast/2` clean.
-spec to_boolean(integer() | boolean()) -> boolean().
to_boolean(true) -> true;
to_boolean(false) -> false;
to_boolean(0) -> false;
to_boolean(N) when is_integer(N) -> true.

%% Reverse coercion: booleans returned by `&&`/`||`/`!`/comparison ops
%% must materialize as 0 or 1 on the way out (since plural form indices
%% are integers).
-spec to_integer(integer() | boolean()) -> integer().
to_integer(true) -> 1;
to_integer(false) -> 0;
to_integer(N) when is_integer(N) -> N.

-doc """
Hot-path interpreter: walks the `t:ast/0` for a given `N` and returns an
integer (arithmetic result) OR a boolean (comparison/logical result) — the
caller (`evaluate/2`) coerces via `to_integer/1`.

Invariants for the maintainer: `&&`/`||` short-circuit (mirroring C), so
the right branch is only evaluated when needed — this is what keeps
`evaluate/2` total for `n != 0 && 1/n`. Division/modulo by zero goes
through `eval_div/2`/`eval_rem/2` (coerced to 0), never raw `div`/`rem`.
The recursion depth is bounded by the caps enforced in `compile/1`
(`?PLURAL_EXPR_MAX_DEPTH` for nesting, `?AST_MAX_NODES` for left-nested
operator chains), so the stack per lookup is bounded by construction. The
sibling `eval_ast_checked/2` has the SAME shape, but reports the anomaly
as data.
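
A minimal stand-alone sketch of the short-circuit and total-division
invariants, pasteable into an `erl` shell. `Div`, `Truthy`, `ToInt` and
`Eval` are hypothetical names written for this illustration; `Eval` is a
miniature walker that handles only integer literals, `n`, `!=`, `/` and
`&&`. It is not the module's `eval_ast/2`, although it uses the same tuple
shapes.

```erlang
%% Illustrative miniature of the walker; NOT the module's eval_ast/2.
%% Total division: a zero divisor yields 0 instead of raising badarith.
Div = fun(_L, 0) -> 0; (L, R) -> L div R end.
%% C truthiness: `false` and 0 are false, anything else is true.
Truthy = fun(false) -> false; (0) -> false; (_) -> true end.
%% Booleans materialise as 0/1 when used as integers.
ToInt = fun(true) -> 1; (false) -> 0; (I) -> I end.
Eval = fun
    E(I, _N) when is_integer(I) -> I;
    E(n, N) -> N;
    E({binop, '&&', L, R}, N) ->
        %% Short-circuit: R is walked only when L is truthy.
        Truthy(E(L, N)) andalso Truthy(E(R, N));
    E({binop, '!=', L, R}, N) ->
        ToInt(E(L, N)) =/= ToInt(E(R, N));
    E({binop, '/', L, R}, N) ->
        Div(ToInt(E(L, N)), ToInt(E(R, N)))
end.
%% AST shape for the expression `n != 0 && 1/n`.
Ast = {binop, '&&', {binop, '!=', n, 0}, {binop, '/', 1, n}}.
%% For N = 0 the division is never reached; for N = 2, 1 div 2 is 0.
[ToInt(Eval(Ast, N)) || N <- [0, 1, 2]].
%% Unguarded division by zero is pinned to 0 rather than raising.
Eval({binop, '/', 1, n}, 0).
```

In this sketch the list comprehension evaluates to `[0,1,0]` and the last
call to `0`, so neither the guarded nor the unguarded form can raise.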
""".
%% Walker. Returns either an integer (arithmetic result) or a boolean
%% (comparison / logical result) — the caller coerces as needed.
-spec eval_ast(ast(), integer()) -> integer() | boolean().
eval_ast(N, _N) when is_integer(N) ->
    N;
eval_ast(n, N) ->
    N;
eval_ast({unop, '!', E}, N) ->
    not to_boolean(eval_ast(E, N));
eval_ast({binop, '&&', L, R}, N) ->
    %% Short-circuit per C semantics: if L is false, do not evaluate R.
    case to_boolean(eval_ast(L, N)) of
        false -> false;
        true -> to_boolean(eval_ast(R, N))
    end;
eval_ast({binop, '||', L, R}, N) ->
    case to_boolean(eval_ast(L, N)) of
        true -> true;
        false -> to_boolean(eval_ast(R, N))
    end;
eval_ast({binop, Op, L, R}, N) ->
    LV = to_integer(eval_ast(L, N)),
    RV = to_integer(eval_ast(R, N)),
    apply_binop(Op, LV, RV);
eval_ast({ternary, C, T, E}, N) ->
    case to_boolean(eval_ast(C, N)) of
        true -> eval_ast(T, N);
        false -> eval_ast(E, N)
    end.

-spec apply_binop(op(), integer(), integer()) -> integer() | boolean().
apply_binop('+', L, R) -> L + R;
apply_binop('-', L, R) -> L - R;
apply_binop('*', L, R) -> L * R;
apply_binop('/', L, R) -> eval_div(L, R);
apply_binop('%', L, R) -> eval_rem(L, R);
apply_binop('==', L, R) -> L =:= R;
apply_binop('!=', L, R) -> L =/= R;
apply_binop('<', L, R) -> L < R;
apply_binop('>', L, R) -> L > R;
apply_binop('<=', L, R) -> L =< R;
apply_binop('>=', L, R) -> L >= R.

%% Total division / modulo (finding #1, Layer 1). A zero divisor is C
%% undefined behaviour; rather than let Erlang `div`/`rem` raise
%% `badarith` in the caller process, we pin the result to 0 — the
%% expression result is still clamped into range afterwards. Real `.po`
%% rules never divide by a value that reaches zero (they guard with
%% short-circuit `&&`/`||`), so this only ever fires on malformed input.
-spec eval_div(integer(), integer()) -> integer().
eval_div(_L, 0) -> 0;
eval_div(L, R) -> L div R.

-spec eval_rem(integer(), integer()) -> integer().
eval_rem(_L, 0) -> 0;
eval_rem(L, R) -> L rem R.

%% =========================
%% Checked interpreter (finding #1, Layer 2)
%% =========================
%%
%% Mirror of `eval_ast/2` that surfaces the two unsafe conditions as
%% structured data instead of clamping: division/modulo by zero and
%% (handled by the caller `evaluate_checked/2`) an out-of-range form.
%% Short-circuit semantics for `&&`/`||` are preserved, so a zero
%% divisor guarded behind a false branch is never reported — matching
%% the dynamic evaluator. Total: never raises.

-spec eval_ast_checked(ast(), integer()) ->
    {ok, integer() | boolean()} | {error, plural_eval_error()}.
eval_ast_checked(N, _N) when is_integer(N) ->
    {ok, N};
eval_ast_checked(n, N) ->
    {ok, N};
eval_ast_checked({unop, '!', E}, N) ->
    case eval_ast_checked(E, N) of
        {ok, V} -> {ok, not to_boolean(V)};
        {error, _} = Err -> Err
    end;
eval_ast_checked({binop, '&&', L, R}, N) ->
    case eval_ast_checked(L, N) of
        {error, _} = Err ->
            Err;
        {ok, LV} ->
            case to_boolean(LV) of
                false -> {ok, false};
                true -> to_boolean_checked(eval_ast_checked(R, N))
            end
    end;
eval_ast_checked({binop, '||', L, R}, N) ->
    case eval_ast_checked(L, N) of
        {error, _} = Err ->
            Err;
        {ok, LV} ->
            case to_boolean(LV) of
                true -> {ok, true};
                false -> to_boolean_checked(eval_ast_checked(R, N))
            end
    end;
eval_ast_checked({binop, Op, L, R}, N) ->
    case eval_ast_checked(L, N) of
        {error, _} = ErrL ->
            ErrL;
        {ok, LV0} ->
            case eval_ast_checked(R, N) of
                {error, _} = ErrR ->
                    ErrR;
                {ok, RV0} ->
                    apply_binop_checked(Op, to_integer(LV0), to_integer(RV0))
            end
    end;
eval_ast_checked({ternary, C, T, E}, N) ->
    case eval_ast_checked(C, N) of
        {error, _} = Err ->
            Err;
        {ok, CV} ->
            case to_boolean(CV) of
                true -> eval_ast_checked(T, N);
                false -> eval_ast_checked(E, N)
            end
    end.

%% Coerce the right operand of a short-circuit op to a boolean result,
%% propagating any error from the underlying evaluation.
-spec to_boolean_checked({ok, integer() | boolean()} | {error, plural_eval_error()}) ->
    {ok, boolean()} | {error, plural_eval_error()}.
to_boolean_checked({error, _} = Err) -> Err;
to_boolean_checked({ok, V}) -> {ok, to_boolean(V)}.

-spec apply_binop_checked(op(), integer(), integer()) ->
    {ok, integer() | boolean()} | {error, plural_eval_error()}.
apply_binop_checked('/', _L, 0) -> {error, {division_by_zero, '/'}};
apply_binop_checked('%', _L, 0) -> {error, {division_by_zero, '%'}};
apply_binop_checked(Op, L, R) -> {ok, apply_binop(Op, L, R)}.

%% =========================
%% Static safety validation (finding #1, Layer 3)
%% =========================
%%
%% Reject — at compile/load time — rules that are STATICALLY guaranteed
%% to fault for every N, so the poisoned catalog is refused by
%% `ensure_loaded` instead of loading as `{ok, _}` and crashing each
%% later lookup. We only reject what is *provably* faulty regardless of
%% input; conditions that fault only for a specific N (e.g. `n / (n-5)`)
%% are left to the dynamic clamp in Layer 1.
%%
%% Two static faults are detected:
%%   * a `/` or `%` whose divisor is a constant subexpression (no `n`)
%%     evaluating to 0, which faults for any N that reaches it;
%%   * a fully-constant rule (no `n` anywhere) whose value lands outside
%%     [0, NPlurals), which selects a non-existent form for all N.

-doc """
Static safety barrier of `compile/1` (Layer 3, finding #1): rejects at
LOAD only what is provably faulty for EVERY N, so the poisoned catalog is
refused instead of loading as `{ok, _}` and crashing every lookup.

Detects two defects: (1) a `/` or `%` whose divisor is constant (no `n`)
and evaluates to 0 — via `static_div_zero/1`; (2) a FULLY constant rule
(no `n` anywhere) whose value falls outside `[0, NPlurals)` — probed with
`N=0` by `static_form_in_range/2`. Faults that depend on a specific N
(e.g. `n/(n-5)`) are NOT rejected here — they are left to the dynamic
clamp of `evaluate/2` (Layer 1). Total: never raises; the error becomes
the payload of `{unsafe_plural_rule, _}` in `compile/1`.
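
A minimal stand-alone sketch of the first check (constant zero divisor),
pasteable into an `erl` shell. `IsConst`, `Const` and `DivZero` are
hypothetical names written for this illustration, covering only integer
literals, `n`, `-`, `/` and `%`; they are not the module's
`static_div_zero/1`, although they use the same tuple shapes.

```erlang
%% Illustrative only; NOT the module's code.
%% True when the subtree contains no `n`.
IsConst = fun
    C(I) when is_integer(I) -> true;
    C(n) -> false;
    C({binop, _Op, L, R}) -> C(L) andalso C(R)
end.
%% Value of a constant subtree built from literals and `-` only.
Const = fun
    V(I) when is_integer(I) -> I;
    V({binop, '-', L, R}) -> V(L) - V(R)
end.
%% Reject only a `/` or `%` whose divisor is constant AND equal to 0.
DivZero = fun
    D({binop, Op, L, R}) when Op =:= '/'; Op =:= '%' ->
        case D(L) of
            ok ->
                case IsConst(R) andalso Const(R) =:= 0 of
                    true -> {error, {division_by_zero, Op}};
                    false -> D(R)
                end;
            Err ->
                Err
        end;
    D({binop, _Op, L, R}) ->
        case D(L) of
            ok -> D(R);
            Err -> Err
        end;
    D(_Leaf) ->
        ok
end.
DivZero({binop, '%', n, 0}).                   %% `n % 0`
DivZero({binop, '/', n, {binop, '-', 5, 5}}).  %% `n / (5 - 5)`
DivZero({binop, '/', n, {binop, '-', n, 5}}).  %% `n / (n - 5)`
```

In this sketch the first two calls return `{error,{division_by_zero,'%'}}`
and `{error,{division_by_zero,'/'}}`, while the third returns `ok` because
its divisor depends on `n` and is therefore left to the dynamic layer.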
""".
-spec validate_safe(ast(), pos_integer()) -> ok | {error, plural_eval_error()}.
validate_safe(Ast, NPlurals) ->
    case static_div_zero(Ast) of
        {error, _} = Err ->
            Err;
        ok ->
            case is_constant(Ast) of
                false ->
                    %% Depends on N — Layer 1 covers any dynamic fault.
                    ok;
                true ->
                    %% N is irrelevant for a constant rule; 0 is a fine
                    %% probe. A constant out-of-range result is a static
                    %% fault. We evaluate the AST directly (not via
                    %% evaluate_checked/2) to avoid materialising a partial
                    %% plural_compiled() map just for the probe.
                    static_form_in_range(Ast, NPlurals)
            end
    end.

%% Probe a constant rule's form index and reject it if it lands outside
%% [0, NPlurals). `static_div_zero/1` has already cleared the AST of
%% division-by-zero, so the only fault reachable here is an out-of-range
%% form; there is deliberately no `{error, _}` arm on the probe (see the
%% comment in the function body).
-spec static_form_in_range(ast(), pos_integer()) -> ok | {error, plural_eval_error()}.
static_form_in_range(Ast, NPlurals) ->
    %% `Ast` is constant and already cleared of static div-by-zero upstream, so
    %% `eval_ast_checked/2` returns `{ok, _}` here; an `{error, _}` would be a
    %% contract violation and crashes explicitly (`case_clause`).
    case eval_ast_checked(Ast, 0) of
        {ok, Value} ->
            Form = to_integer(Value),
            case Form >= 0 andalso Form < NPlurals of
                true -> ok;
                false -> {error, {form_out_of_range, Form, NPlurals}}
            end
    end.

%% Walk the AST for a `/` or `%` whose divisor is statically zero. The
%% divisor counts as statically zero only when it is constant (contains
%% no `n`) and evaluates to 0 — a divisor that depends on N is not a
%% static fault.
-spec static_div_zero(ast()) -> ok | {error, plural_eval_error()}.
static_div_zero(N) when is_integer(N) ->
    ok;
static_div_zero(n) ->
    ok;
static_div_zero({unop, '!', E}) ->
    static_div_zero(E);
static_div_zero({binop, Op, L, R}) when Op =:= '/'; Op =:= '%' ->
    case static_div_zero(L) of
        {error, _} = Err ->
            Err;
        ok ->
            case static_div_zero(R) of
                {error, _} = Err -> Err;
                ok -> check_static_divisor(Op, R)
            end
    end;
static_div_zero({binop, _Op, L, R}) ->
    case static_div_zero(L) of
        {error, _} = Err -> Err;
        ok -> static_div_zero(R)
    end;
static_div_zero({ternary, C, T, E}) ->
    case static_div_zero(C) of
        {error, _} = Err ->
            Err;
        ok ->
            case static_div_zero(T) of
                {error, _} = Err -> Err;
                ok -> static_div_zero(E)
            end
    end.

-spec check_static_divisor('/' | '%', ast()) -> ok | {error, plural_eval_error()}.
check_static_divisor(Op, Divisor) ->
    case is_constant(Divisor) of
        false ->
            ok;
        true ->
            %% A nested static div-by-zero inside the divisor is already
            %% reported by the recursive walk, so `eval_ast_checked/2` returns
            %% `{ok, _}` here (an `{error, _}` would crash explicitly).
            case eval_ast_checked(Divisor, 0) of
                {ok, V} ->
                    case to_integer(V) of
                        0 -> {error, {division_by_zero, Op}};
                        _ -> ok
                    end
            end
    end.

%% True when the AST contains no reference to the variable `n`, so its
%% value is independent of the lookup count.
-spec is_constant(ast()) -> boolean().
is_constant(N) when is_integer(N) -> true;
is_constant(n) -> false;
is_constant({unop, '!', E}) -> is_constant(E);
is_constant({binop, _Op, L, R}) -> is_constant(L) andalso is_constant(R);
is_constant({ternary, C, T, E}) -> is_constant(C) andalso is_constant(T) andalso is_constant(E).

%% =========================
%% AST node-count cap (finding #9, plural-bignum-cpu-dos-evaluate-hotpath)
%% =========================
%%
%% Reject — at compile/load time — an AST whose node count exceeds
%% `?AST_MAX_NODES`. The byte and depth caps from finding #2 do not bound
%% the node count, so a wide flat operator chain (`n*n*...*n`, ~2000 nodes
%% inside the 2048-byte cap, at a single recursion level) would otherwise
%% be installed and walked by `evaluate/2` — growing an `n^k` bignum — on
%% every ngettext lookup. Capping the node count keeps the installed AST
%% small so `evaluate/2`'s cost is bounded by construction (largest
%% intermediate bignum is `n^?AST_MAX_NODES`). Runs once on the load path,
%% never on the hot path.

%% Short-circuiting node-count guard: stops descending as soon as the
%% budget is blown, so its cost is O(min(nodes, ?AST_MAX_NODES)) — never
%% proportional to a pathologically large AST.
-spec check_ast_complexity(ast()) ->
    ok | {error, {expr_too_complex, pos_integer(), pos_integer()}}.
check_ast_complexity(Ast) ->
    case count_nodes_bounded(Ast, 0) of
        {ok, _Total} ->
            ok;
        over_limit ->
            %% Only the (rare, off-hot-path) error branch pays for the
            %% exact total — used purely for diagnostics.
            {error, {expr_too_complex, ast_node_count(Ast), ?AST_MAX_NODES}}
    end.

%% Budgeted counter. Returns `{ok, Total}` if the whole AST fits within
%% `?AST_MAX_NODES`, otherwise `over_limit` at the first node that blows
%% the budget.
-spec count_nodes_bounded(ast(), non_neg_integer()) ->
    {ok, non_neg_integer()} | over_limit.
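%% `Acc` nodes are already counted, so entering one more node once the
%% budget is used up would push the total past `?AST_MAX_NODES`.
count_nodes_bounded(_Ast, Acc) when Acc >= ?AST_MAX_NODES ->
    over_limit;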
count_nodes_bounded(N, Acc) when is_integer(N) ->
    {ok, Acc + 1};
count_nodes_bounded(n, Acc) ->
    {ok, Acc + 1};
count_nodes_bounded({unop, '!', E}, Acc) ->
    count_nodes_bounded(E, Acc + 1);
count_nodes_bounded({binop, _Op, L, R}, Acc) ->
    case count_nodes_bounded(L, Acc + 1) of
        over_limit -> over_limit;
        {ok, Acc1} -> count_nodes_bounded(R, Acc1)
    end;
count_nodes_bounded({ternary, C, T, E}, Acc) ->
    case count_nodes_bounded(C, Acc + 1) of
        over_limit ->
            over_limit;
        {ok, Acc1} ->
            case count_nodes_bounded(T, Acc1) of
                over_limit -> over_limit;
                {ok, Acc2} -> count_nodes_bounded(E, Acc2)
            end
    end.

%% Exact, total node count — used only on the error branch for diagnostic
%% reporting (`{expr_too_complex, Nodes, Max}`).
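%%
%% Worked example (the `'=='` operator atom is assumed for illustration):
%% `n==1 ? 0 : 1` as `{ternary, {binop, '==', n, 1}, 0, 1}` counts
%% 1 (ternary) + 3 (binop, `n`, `1`) + 1 (`0`) + 1 (`1`) = 6 nodes.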
-spec ast_node_count(ast()) -> pos_integer().
ast_node_count(N) when is_integer(N) ->
    1;
ast_node_count(n) ->
    1;
ast_node_count({unop, '!', E}) ->
    1 + ast_node_count(E);
ast_node_count({binop, _Op, L, R}) ->
    1 + ast_node_count(L) + ast_node_count(R);
ast_node_count({ternary, C, T, E}) ->
    1 + ast_node_count(C) + ast_node_count(T) + ast_node_count(E).

%% =========================
%% CLDR canonical rules
%% =========================
%%
%% Hard-coded subset of the CLDR `plurals.json` data
%% (cldr-json/cldr-core/supplemental/plurals.json in
%% https://github.com/unicode-org/cldr-json,
%% retrieved 2026-05 — see also https://cldr.unicode.org/index/cldr-spec/plural-rules
%% for the rule language). Each row is `{Locale, NPlurals, ExprBin}`
%% where `ExprBin` is the C-style plural expression that, when paired
%% with `nplurals=NPlurals`, produces the canonical CLDR rule.
%%
%% Region-tagged locales are listed explicitly. Most repeat the base
%% language's rule; `pt_PT` is the one row here that diverges from its
%% base (`pt_PT` is `n != 1`, while the base `pt` and `pt_BR` both use
%% `n > 1`). Locales not listed fall back to the base language tag via
%% `cldr_rule/1`.
%%
%% Strategy (Option A — hard-coded):
%%   Pros: zero deps, byte-equal control over what ships, easy to audit.
%%   Cons: requires manual sync on each CLDR release.
%%
%% Generating the table from upstream CLDR JSON (Option C) was considered
%% and not adopted: the inline literal keeps this module's data surface
%% small, dependency-free, and reviewable.

-doc """
Embedded CLDR table — the only TRUSTED data source (a static literal) of
this module.

The rows between the `BEGIN/END GENERATED CLDR TABLE` markers are
GENERATED by `bin/gen-plural-table.escript` from the committed seed
`apps/erli18n/priv/gettext/plural_forms.eterm`. On a CLDR release sync,
edit the seed and re-run the generator (`escript
bin/gen-plural-table.escript`) instead of editing the rows by hand.

Each row is `{Locale, NPlurals, ExprBin}`, where `ExprBin` paired with
`nplurals=NPlurals` reproduces that locale's CLDR canonical rule. Region
rows are listed explicitly: most repeat the base language's rule, and
`pt_PT` (`n != 1`) is the one that diverges from its base (`pt` = `n > 1`).
Locales without a row fall back to the base via `cldr_rule/1`.

For the maintainer: each expression MUST be valid for `compile/1`. The
generator validates this when the compiled beam is reachable, and
`build_cldr_compiled_table/0` calls `compile/1` on each row when building the
cache and accepts only `{ok, _}`. A row that fails to compile is a defect in
this trusted static literal and crashes the build loudly with `case_clause`,
rather than silently degrading to "no CLDR entry"; review edits carefully.
""".
cldr_data() ->
    %% BEGIN GENERATED CLDR TABLE
    %% Generated by bin/gen-plural-table.escript from
    %% apps/erli18n/priv/gettext/plural_forms.eterm. Do not edit by hand;
    %% edit the seed table and re-run the generator.
    [
        {<<"ar">>, 6, <<
            "n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5"
        >>},
        {<<"bg">>, 2, <<"n != 1">>},
        {<<"cs">>, 3, <<"(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2">>},
        {<<"da">>, 2, <<"n != 1">>},
        {<<"de">>, 2, <<"n != 1">>},
        {<<"de_AT">>, 2, <<"n != 1">>},
        {<<"de_CH">>, 2, <<"n != 1">>},
        {<<"el">>, 2, <<"n != 1">>},
        {<<"en">>, 2, <<"n != 1">>},
        {<<"en_GB">>, 2, <<"n != 1">>},
        {<<"en_US">>, 2, <<"n != 1">>},
        {<<"es">>, 2, <<"n != 1">>},
        {<<"es_ES">>, 2, <<"n != 1">>},
        {<<"es_MX">>, 2, <<"n != 1">>},
        {<<"et">>, 2, <<"n != 1">>},
        {<<"fa">>, 2, <<"n != 1">>},
        {<<"fi">>, 2, <<"n != 1">>},
        {<<"fr">>, 2, <<"n > 1">>},
        {<<"fr_CA">>, 2, <<"n > 1">>},
        {<<"fr_FR">>, 2, <<"n > 1">>},
        {<<"he">>, 2, <<"n != 1">>},
        {<<"hi">>, 2, <<"n != 1">>},
        {<<"hr">>, 3, <<
            "n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2"
        >>},
        {<<"hu">>, 2, <<"n != 1">>},
        {<<"it">>, 2, <<"n != 1">>},
        {<<"ja">>, 1, <<"0">>},
        {<<"ko">>, 1, <<"0">>},
        {<<"nb">>, 2, <<"n != 1">>},
        {<<"nl">>, 2, <<"n != 1">>},
        {<<"nn">>, 2, <<"n != 1">>},
        {<<"no">>, 2, <<"n != 1">>},
        {<<"pl">>, 3, <<"n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2">>},
        {<<"pt">>, 2, <<"n > 1">>},
        {<<"pt_BR">>, 2, <<"n > 1">>},
        {<<"pt_PT">>, 2, <<"n != 1">>},
        {<<"ro">>, 3, <<"n==1 ? 0 : (n==0 || (n%100>0 && n%100<20)) ? 1 : 2">>},
        {<<"ru">>, 3, <<
            "n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2"
        >>},
        {<<"sk">>, 3, <<"(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2">>},
        {<<"sl">>, 4, <<"n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3">>},
        {<<"sr">>, 3, <<
            "n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2"
        >>},
        {<<"sv">>, 2, <<"n != 1">>},
        {<<"th">>, 1, <<"0">>},
        {<<"tr">>, 2, <<"n != 1">>},
        {<<"uk">>, 3, <<
            "n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2"
        >>},
        {<<"vi">>, 1, <<"0">>},
        {<<"zh">>, 1, <<"0">>},
        {<<"zh_CN">>, 1, <<"0">>},
        {<<"zh_HK">>, 1, <<"0">>},
        {<<"zh_TW">>, 1, <<"0">>}
    ].
%% END GENERATED CLDR TABLE

lookup_locale(Locale) ->
    lookup_locale(Locale, cldr_data()).

lookup_locale(_Locale, []) ->
    undefined;
lookup_locale(Locale, [{Locale, N, Expr} | _]) ->
    {ok, N, Expr};
lookup_locale(Locale, [_ | Rest]) ->
    lookup_locale(Locale, Rest).

%% Strip everything from the first separator onwards, leaving the base
%% language (`pt_BR` -> `pt`, `zh-Hant` -> `zh`). Accepts both `_`
%% (POSIX/gettext style) and `-` (BCP 47 style) as separators.
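%%
%% For example, read off the `binary:match/2` call below:
%%   base_locale(<<"pt_BR">>)   returns <<"pt">>
%%   base_locale(<<"zh-Hant">>) returns <<"zh">>
%%   base_locale(<<"en">>)      returns <<"en">> (no separator, unchanged)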
base_locale(Locale) ->
    case binary:match(Locale, [~"_", ~"-"]) of
        nomatch -> Locale;
        {Pos, _Len} -> binary:part(Locale, 0, Pos)
    end.

%% =========================
%% CLDR equivalence check (pre-compiled)
%% =========================
%%
%% Divergence is checked structurally on the parsed ASTs (nplurals + expr
%% AST), so it is whitespace and paren-noise insensitive — `(n != 1)`
%% matches `n != 1`. Finding #17: the loader already compiled the header
%% AST, so `validate_against_cldr_ast/2` reuses it; the CLDR side is taken
%% from a one-time MEMOISED table of compiled bundles (`cldr_compiled/1`)
%% instead of re-parsing the canonical rule on every load. This removes
%% the second header compile (and the per-load CLDR synthesise+compile +
%% linear scans) that the old `ast_equivalent/split_rule` path incurred,
%% and decouples divergence from the plural-compile O(n^2) bug (a
%% pathological header is compiled once, not twice).

%% persistent_term key for the memoised CLDR AST table. The table is a
%% constant (a fixed set of static-literal rows) so a single global cache is
%% sound: it is keyed by the module name (not by a content hash) and never
%% invalidated.
-define(CLDR_COMPILED_KEY, {?MODULE, cldr_compiled_table}).

%% Compiled CLDR bundle for a locale, with region fallback identical to
%% `cldr_rule/1` (`fr_BE` -> `fr`). Returns the same `plural_compiled()`
%% shape as `compile/1` so the AST can be compared directly; `raw` carries
%% the CLDR canonical EXPRESSION binary (matching the old
%% `validate_against_cldr/2` warning payload, which used `cldr_rule/1`'s
%% expr). `undefined` when neither the locale nor its base is in the table.
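%%
%% Expected lookups against the table in `cldr_data/0` (`Ast` stands for
%% whatever `compile/1` produces for the row's expression):
%%   cldr_compiled(<<"pt_PT">>) is the `pt_PT` row's own bundle,
%%                              #{nplurals => 2, expr => Ast, raw => <<"n != 1">>}
%%   cldr_compiled(<<"fr_BE">>) has no row and falls back to `fr`,
%%                              #{nplurals => 2, expr => Ast, raw => <<"n > 1">>}
%%   cldr_compiled(<<"xx">>)    is undefined (no row, and no separator to strip)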
-spec cldr_compiled(binary()) -> plural_compiled() | undefined.
cldr_compiled(Locale) when is_binary(Locale) ->
    Table = cldr_compiled_table(),
    case Table of
        #{Locale := Bundle} ->
            Bundle;
        #{} ->
            case base_locale(Locale) of
                Locale ->
                    undefined;
                Base ->
                    maps:get(Base, Table, undefined)
            end
    end.

-doc """
Memoised `locale => compiled bundle` map — the ONLY side effect of the
module, built exactly once per node and cached in `persistent_term`.

For the maintainer: the cache is a module-scoped singleton, written under
the fixed key `?CLDR_COMPILED_KEY` (the tuple
`{?MODULE, cldr_compiled_table}`, NOT a content hash) and NEVER
invalidated — `cldr_data/0` is a constant literal, so a global cache is
safe. The first call builds it via `build_cldr_compiled_table/0` and
writes it; later calls hit the cache (the `is_map/1` guard on the hit
branch only narrows the `term()` returned by `persistent_term:get/2`,
since the only writer is the miss branch). It serves only the cold
divergence path (`cldr_compiled/1`); it is never touched by `evaluate/2`.
""".
%% Return the memoised locale -> compiled-bundle map, building it once
%% per node and caching it in `persistent_term`. `cldr_data/0` is a
%% static, trusted constant whose every expression is a canonical CLDR
%% rule, so each `compile/1` here is expected to succeed; a malformed
%% row would be a build-time defect in this module and crashes the table
%% build with `case_clause` (see `build_cldr_compiled_table/0`) instead
%% of being silently dropped.
-spec cldr_compiled_table() -> #{binary() => plural_compiled()}.
cldr_compiled_table() ->
    case persistent_term:get(?CLDR_COMPILED_KEY, undefined) of
        undefined ->
            Table = build_cldr_compiled_table(),
            persistent_term:put(?CLDR_COMPILED_KEY, Table),
            Table;
        Table when is_map(Table) ->
            %% `persistent_term:get/2` is typed `term()`; the `is_map/1` guard
            %% narrows it to a map (the only writer is the clause above, so this
            %% is the cache hit), which eqwalizer accepts here — no cast needed.
            Table
    end.

-spec build_cldr_compiled_table() -> #{binary() => plural_compiled()}.
build_cldr_compiled_table() ->
    lists:foldl(
        fun({Locale, N, Expr}, Acc) ->
            Header = <<
                "nplurals=",
                (integer_to_binary(N))/binary,
                "; plural=",
                Expr/binary,
                ";"
            >>,
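            %% For example, the row `{<<"ja">>, 1, <<"0">>}` synthesises
            %% the header `<<"nplurals=1; plural=0;">>`.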
            %% `cldr_data/0` is a trusted static literal whose every row
            %% compiles; a row that failed would be a defect and crashes
            %% explicitly at table build (`case_clause`) rather than silently
            %% degrading to "no CLDR entry".
            case compile(Header) of
                {ok, #{nplurals := NC, expr := Ast}} ->
                    %% Store the raw CLDR EXPRESSION (not the synthesised
                    %% header) as `raw`, to match the legacy warning
                    %% payload that surfaced `cldr_rule/1`'s expr.
                    Acc#{Locale => #{nplurals => NC, expr => Ast, raw => Expr}}
            end
        end,
        #{},
        cldr_data()
    ).