# localize_mf2_treesitter
Elixir bindings to the [MF2 tree-sitter grammar](https://github.com/elixir-localize/mf2_treesitter). Server-side MF2 parsing in the BEAM via a C NIF — incremental, error-recovering, position-aware.
## Complements Localize.Message.Parser
The `localize` Hex package provides a NimbleParsec parser for MF2. That parser is strict, fast, and fails on the first error, which is the right behaviour for runtime formatting via `Localize.Message.format/3`. This package is resilient and position-aware, which suits:
* **LSP servers** (hover, goto-definition, diagnostics, semantic tokens).
* **Build-time validation** of MF2 messages in `~M` sigils or stored translations.
* **Server-side rendering** fallback for a client-side MF2 editor — render a highlight pass at mount time so the first paint isn't a flash of unstyled text while `web-tree-sitter` downloads in the browser.
* **Static analysis** (linters, translation audits, migration tools).
The two coexist in the same app with no conflict. Use whichever fits the job.
## If you're after MF2 parsing in a browser
If you need to edit MF2 messages in a browser, see the [tree-sitter-mf2](https://www.npmjs.com/package/tree-sitter-mf2) npm package, which is built from the same MF2 tree-sitter grammar by the same author.
## Installation
```elixir
def deps do
  [
    {:localize_mf2_treesitter, "~> 0.1"}
  ]
end
```
No runtime dependencies beyond the ERTS NIF interface. A C11-capable compiler is required at build time.
## Usage
```elixir
iex> {:ok, tree} = Localize.Mf2.TreeSitter.parse("hello {$name}")
iex> root = Localize.Mf2.TreeSitter.root(tree)
iex> Localize.Mf2.TreeSitter.Node.type(root)
"source_file"
iex> Localize.Mf2.TreeSitter.Node.has_error?(root)
false
iex> Localize.Mf2.TreeSitter.Node.text(root)
"hello {$name}"
```
Walk the tree with the accessors in `Localize.Mf2.TreeSitter.Node`; see the moduledoc for the full list. Byte ranges are offsets in bytes, not characters, so multi-byte UTF-8 text shifts them relative to character positions. Point coordinates are `{row, column}` with row zero-indexed.
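To relate a byte offset to a point, the conversion can be sketched in plain Elixir. This is a minimal illustration, not part of the package: `PointSketch` is a hypothetical module, and it assumes `"\n"` line endings and a zero-indexed column counted in bytes (tree-sitter's own convention; confirm against the `Node` moduledoc).

```elixir
defmodule PointSketch do
  @moduledoc "Hypothetical helper for illustration; not part of the package."

  # Convert a byte offset into a {row, column} point.
  # Assumption: "\n" line endings, zero-indexed row, byte-based column.
  def byte_to_point(source, byte_offset)
      when is_binary(source) and is_integer(byte_offset) and
             byte_offset >= 0 and byte_offset <= byte_size(source) do
    # Everything before the offset decides the row and column.
    prefix = binary_part(source, 0, byte_offset)
    lines = :binary.split(prefix, "\n", [:global])

    # Row = number of newlines seen; column = bytes since the last newline.
    {length(lines) - 1, byte_size(List.last(lines))}
  end
end
```

With a two-byte character in front, `PointSketch.byte_to_point("\u00E9 {$x}", 3)` lands on the `{`, even though that brace is the third character (character index 2).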
### Error recovery
Invalid MF2 input still produces a tree. Tree-sitter's GLR engine places `ERROR` or `MISSING` nodes at the failure points rather than aborting.
```elixir
iex> {:ok, tree} = Localize.Mf2.TreeSitter.parse("hello {$")
iex> Localize.Mf2.TreeSitter.Node.has_error?(Localize.Mf2.TreeSitter.root(tree))
true
iex> [{:error, _} | _] = Localize.Mf2.TreeSitter.diagnostics(tree)
```
This is the property that makes the grammar LSP-friendly — an editor never sees a dead parse.
### Queries
The highlight query shipped in `mf2_treesitter` is available via `Query.load/1`:
```elixir
iex> {:ok, tree} = Localize.Mf2.TreeSitter.parse("hello {$name}")
iex> {:ok, query} = Localize.Mf2.TreeSitter.Query.load(:highlights)
iex> [{"variable", node} | _] =
...> Localize.Mf2.TreeSitter.Query.captures(query, Localize.Mf2.TreeSitter.root(tree))
...> |> Enum.filter(fn {name, _} -> name == "variable" end)
iex> Localize.Mf2.TreeSitter.Node.text(node)
"name"
```
Use `Query.matches/2` when pattern provenance matters (e.g. distinguishing which rule matched), and `Query.captures/2` for the flat highlight-pass case.
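For the server-side rendering case, a flat capture list can be turned into markup in one pass. The following is a minimal sketch, assuming each capture has already been reduced to a `{capture_name, start_byte, end_byte}` tuple via the byte-range accessors in `Node` (that step is omitted because the accessor names are not shown here) and that spans do not overlap. `HighlightSketch` and the `mf2-` class prefix are hypothetical.

```elixir
defmodule HighlightSketch do
  @moduledoc "Hypothetical renderer for illustration; not part of the package."

  # spans: [{capture_name, start_byte, end_byte}], assumed non-overlapping,
  # with end_byte exclusive.
  def to_html(source, spans) when is_binary(source) and is_list(spans) do
    {chunks, cursor} =
      spans
      |> Enum.sort_by(fn {_name, start_byte, _end_byte} -> start_byte end)
      |> Enum.reduce({[], 0}, fn {name, start_byte, end_byte}, {acc, cursor} ->
        # Unstyled text between the previous span and this one.
        plain = binary_part(source, cursor, start_byte - cursor)
        marked = binary_part(source, start_byte, end_byte - start_byte)
        # Dotted capture names become dash-separated CSS classes.
        class = "mf2-" <> String.replace(name, ".", "-")
        span = [~s(<span class="), class, ~s(">), escape(marked), "</span>"]
        {[acc, escape(plain), span], end_byte}
      end)

    # Whatever follows the last span is emitted unstyled.
    tail = binary_part(source, cursor, byte_size(source) - cursor)
    IO.iodata_to_binary([chunks, escape(tail)])
  end

  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
  end
end
```

Real highlight queries can produce nested or overlapping captures; a fuller version would need to resolve those before rendering.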
### Incremental reparse
After a text edit, produce a new tree without redoing the full parse:
```elixir
iex> old_src = "hello {$name}"
iex> new_src = "hello {$name}!"
iex> {:ok, old} = Localize.Mf2.TreeSitter.parse(old_src)
iex> edit = %Localize.Mf2.TreeSitter.Edit{
...>   start_byte: byte_size(old_src),
...>   old_end_byte: byte_size(old_src),
...>   new_end_byte: byte_size(new_src),
...>   start_point: {0, byte_size(old_src)},
...>   old_end_point: {0, byte_size(old_src)},
...>   new_end_point: {0, byte_size(new_src)}
...> }
iex> {:ok, new} = Localize.Mf2.TreeSitter.parse_incremental(old, [edit], new_src)
iex> Localize.Mf2.TreeSitter.changed_ranges(old, new)
[{13, 14, {0, 13}, {0, 14}}]
```
The old tree is **not** mutated; the NIF clones it, applies the edits to the clone, and feeds the clone to the parser. `changed_ranges/2` returns the byte/point ranges that differ.
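The edit in the example is written by hand. When only the old and new source strings are available, the coordinates of a single contiguous change can be derived by trimming the common prefix and suffix. The sketch below is one way to do that, not something the package provides: `EditSketch` is a hypothetical module, it returns a plain map with the same keys as the `Edit` struct above, and it assumes `"\n"` line endings and byte-based columns. It works at byte granularity, so a change between two characters that share leading bytes can yield an offset inside a multi-byte character; a production version would snap to character boundaries.

```elixir
defmodule EditSketch do
  @moduledoc "Hypothetical helper for illustration; not part of the package."

  # Derive edit coordinates for one contiguous change between two sources.
  def diff(old_src, new_src) when is_binary(old_src) and is_binary(new_src) do
    prefix = :binary.longest_common_prefix([old_src, new_src])

    # The suffix must not overlap the prefix (e.g. "aa" -> "aaa").
    max_suffix = min(byte_size(old_src), byte_size(new_src)) - prefix
    suffix = min(:binary.longest_common_suffix([old_src, new_src]), max_suffix)

    old_end = byte_size(old_src) - suffix
    new_end = byte_size(new_src) - suffix

    %{
      start_byte: prefix,
      old_end_byte: old_end,
      new_end_byte: new_end,
      start_point: point(old_src, prefix),
      old_end_point: point(old_src, old_end),
      new_end_point: point(new_src, new_end)
    }
  end

  # {row, column}: zero-indexed row, column in bytes since the last "\n".
  defp point(source, byte_offset) do
    lines = :binary.split(binary_part(source, 0, byte_offset), "\n", [:global])
    {length(lines) - 1, byte_size(List.last(lines))}
  end
end
```

For the two strings in the example, `EditSketch.diff("hello {$name}", "hello {$name}!")` yields the same six values as the hand-written struct; `struct/2` could then build the `Edit` from the map.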
## Keeping the grammar current
The grammar files under `c_src/grammar/` and `priv/queries/` are vendored from the [`tree-sitter-mf2`](https://www.npmjs.com/package/tree-sitter-mf2) npm package (published from [`mf2_treesitter`](https://github.com/elixir-localize/mf2_treesitter)). A mix task pins a specific version and fetches files from the published tarball via the unpkg CDN — no sibling repo checkout required, fully reproducible.
```bash
# Fetch from npm at the pinned version and update vendored files.
mix localize_mf2_treesitter.sync
# CI check — exit non-zero if vendored files drift from the pinned
# version. Does not modify files.
mix localize_mf2_treesitter.sync --check
```
The pinned version lives at the top of the task module as `@tree_sitter_mf2_version`. To move to a newer grammar release, bump that string and re-run the task. **Keep the pin in step with `mf2_wasm_editor`'s own sync task** — grammar tree shape is the API boundary between this NIF (server-side parse) and the WASM editor (browser-side parse); a version skew can produce different trees for the same input, breaking the canonicalisation round-trip.
### Offline / local-iteration override
If you're iterating on the grammar locally and want the sync to read from a sibling checkout rather than hit the network, set `MF2_TREESITTER_DIR`:
```bash
MF2_TREESITTER_DIR=/path/to/mf2_treesitter mix localize_mf2_treesitter.sync
```
### No `--build-wasm` flag
This package doesn't produce a WASM bundle. The WASM consumer is [`mf2_wasm_editor`](https://github.com/elixir-localize/mf2_wasm_editor), which has its own sync task. The grammar package (`tree-sitter-mf2`) is the common upstream both packages sync from.
## Keeping the tree-sitter runtime current
Separate from the grammar, the package also embeds the **tree-sitter C runtime** under `c_src/runtime/`. It compiles alongside `parser.c` into the NIF `.so`. The runtime's supported ABI version must be ≥ the version `parser.c` was generated against — otherwise `ts_parser_set_language()` refuses the language at load time and every parse returns `{:error, :parse_failed}`.
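The compatibility rule can be checked before compiling. The sketch below is a hypothetical pre-build check, not a task the package ships. It assumes the generated `parser.c` contains a `#define LANGUAGE_VERSION <n>` line and that the runtime's `api.h` defines `TREE_SITTER_LANGUAGE_VERSION` and `TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION`, as tree-sitter releases commonly do. It takes file contents as strings, so where the files are read from is left to the caller.

```elixir
defmodule AbiSketch do
  @moduledoc "Hypothetical pre-build check for illustration; not part of the package."

  # Extract the integer value of `#define NAME <int>` from C source text.
  def define(c_source, name) when is_binary(c_source) and is_binary(name) do
    pattern = ~r/^\s*#\s*define\s+#{Regex.escape(name)}\s+(\d+)/m

    case Regex.run(pattern, c_source) do
      [_, digits] -> {:ok, String.to_integer(digits)}
      nil -> :error
    end
  end

  # True when the grammar's ABI version falls inside the range the
  # runtime accepts; false if any of the three defines is missing.
  def compatible?(parser_c, api_h) do
    with {:ok, grammar} <- define(parser_c, "LANGUAGE_VERSION"),
         {:ok, max} <- define(api_h, "TREE_SITTER_LANGUAGE_VERSION"),
         {:ok, min} <- define(api_h, "TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION") do
      grammar >= min and grammar <= max
    else
      :error -> false
    end
  end
end
```

In practice the inputs would come from `File.read!/1` on `c_src/grammar/parser.c` and on wherever `api.h` sits under `c_src/runtime/`.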
A dedicated mix task refreshes the runtime from upstream [`tree-sitter/tree-sitter`](https://github.com/tree-sitter/tree-sitter):
```bash
# Fetch and overlay the pinned runtime version.
mix localize_mf2_treesitter.update_runtime
# CI check — exits non-zero if any vendored runtime file drifts
# from the pinned version. Doesn't modify files.
mix localize_mf2_treesitter.update_runtime --check
```
The pinned version lives at the top of the task module as `@runtime_version`. **Bump it whenever you bump the grammar pin** in `localize_mf2_treesitter.sync`, so the runtime's ABI support stays ahead of (or equal to) whatever `parser.c` needs. As a rule of thumb, match the `@runtime_version` here to the tree-sitter CLI version that generated `c_src/grammar/parser.c` (check its first-line comment).
The task preserves `c_src/runtime/src/lib.c` (our hand-written amalgamation wrapper — it adds `#define _POSIX_C_SOURCE 200112L` and `#include`s every runtime `.c` file). If upstream adds a new `.c` file under `lib/src/`, the task warns that `lib.c` needs an extra `#include` line.
## Roadmap
* Precompiled NIF artefacts via `:elixir_make_precompiler`, so downstream consumers don't need a C toolchain.
## Licence
Apache-2.0 for this package. The vendored tree-sitter runtime under `c_src/runtime/` is MIT — see `c_src/runtime/LICENSE`.