
# localize_mf2_treesitter

Elixir bindings to the [MF2 tree-sitter grammar](https://github.com/elixir-localize/mf2_treesitter). Server-side MF2 parsing on the BEAM via a C NIF: incremental, error-recovering, position-aware.

## Complements `Localize.Message.Parser`

The `localize` hex package provides a NimbleParsec parser for MF2. That parser is strict, fast, and fails on the first error — correct for runtime formatting via `Localize.Message.format/3`. This package is resilient and position-aware — correct for:

* **LSP servers** (hover, goto-definition, diagnostics, semantic tokens).
* **Build-time validation** of MF2 messages in `~M` sigils or stored translations.
* **Server-side rendering** fallback for a client-side MF2 editor — render a highlight pass at mount time so the first paint isn't a flash of unstyled text while `web-tree-sitter` downloads in the browser.
* **Static analysis** (linters, translation audits, migration tools).

The two coexist in the same app with no conflict. Use whichever fits the job.

## If you're after MF2 parsing in a browser

If you need to edit MF2 messages in a browser, see the [tree-sitter-mf2](https://www.npmjs.com/package/tree-sitter-mf2) npm package, which is built from the same MF2 tree-sitter grammar by the same author.

## Installation

```elixir
def deps do
  [
    {:localize_mf2_treesitter, "~> 0.1"}
  ]
end
```

No runtime dependencies beyond the ERTS NIF interface. A C11-capable compiler is required at build time.

## Usage

```elixir
iex> {:ok, tree} = Localize.Mf2.TreeSitter.parse("hello {$name}")
iex> root = Localize.Mf2.TreeSitter.root(tree)
iex> Localize.Mf2.TreeSitter.Node.type(root)
"source_file"
iex> Localize.Mf2.TreeSitter.Node.has_error?(root)
false
iex> Localize.Mf2.TreeSitter.Node.text(root)
"hello {$name}"
```

Walk the tree with the accessors in `Localize.Mf2.TreeSitter.Node`; see the moduledoc for the full list. Ranges are byte offsets, not grapheme or codepoint indices; point coordinates are `{row, column}` with row zero-indexed.
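
Because ranges are byte offsets, slice source text with `binary_part/3` rather than `String.slice/3` whenever the message may contain non-ASCII characters. The following is a minimal sketch in plain Elixir; `ByteRangeSketch` is a hypothetical helper (not part of this package), the byte range is made up for illustration, and the half-open `[start_byte, end_byte)` convention is an assumption based on tree-sitter's usual behaviour.

```elixir
defmodule ByteRangeSketch do
  @doc "Slices `source` by a half-open byte range `[start_byte, end_byte)`."
  def slice(source, start_byte, end_byte)
      when is_binary(source) and start_byte >= 0 and end_byte >= start_byte and
             end_byte <= byte_size(source) do
    binary_part(source, start_byte, end_byte - start_byte)
  end
end

# "é" (U+00E9) occupies two bytes in UTF-8, so byte offsets and
# grapheme indices diverge after it.
source = "h\u00E9llo {$name}"

IO.inspect(byte_size(source))      # 14
IO.inspect(String.length(source))  # 13

# Hypothetical byte range covering `name`.
IO.inspect(ByteRangeSketch.slice(source, 9, 13))  # "name"

# String.slice/3 counts graphemes, so the same numbers land one
# character too far to the right.
IO.inspect(String.slice(source, 9, 4))            # "ame}"
```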

### Error recovery

Invalid MF2 input still produces a tree. Tree-sitter's GLR engine places `ERROR` or `MISSING` nodes at the failure points rather than aborting.

```elixir
iex> {:ok, tree} = Localize.Mf2.TreeSitter.parse("hello {$")
iex> Localize.Mf2.TreeSitter.Node.has_error?(Localize.Mf2.TreeSitter.root(tree))
true
iex> [{:error, _} | _] = Localize.Mf2.TreeSitter.diagnostics(tree)
```

This is the property that makes the grammar LSP-friendly — an editor never sees a dead parse.

### Queries

The highlight query shipped in `mf2_treesitter` is available via `Query.load/1`:

```elixir
iex> {:ok, tree} = Localize.Mf2.TreeSitter.parse("hello {$name}")
iex> {:ok, query} = Localize.Mf2.TreeSitter.Query.load(:highlights)
iex> [{"variable", node} | _] =
...>   Localize.Mf2.TreeSitter.Query.captures(query, Localize.Mf2.TreeSitter.root(tree))
...>   |> Enum.filter(fn {name, _} -> name == "variable" end)
iex> Localize.Mf2.TreeSitter.Node.text(node)
"name"
```

Use `Query.matches/2` when pattern provenance matters (e.g. distinguishing which rule matched), and `Query.captures/2` for the flat highlight-pass case.
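
For the server-side highlight pass, the flat capture list eventually has to become markup. The sketch below shows one way to do that in plain Elixir. It is not part of this package: `HighlightSketch`, the `mf2-` class prefix, and the `{capture_name, start_byte, end_byte}` input shape are all assumptions for illustration, and the byte offsets would have to be read from each captured node with the accessors in the `Node` moduledoc. It also assumes the spans do not overlap.

```elixir
defmodule HighlightSketch do
  @moduledoc false

  # Hypothetical input shape: [{capture_name, start_byte, end_byte}],
  # half-open byte ranges that do not overlap.
  def render(source, spans) when is_binary(source) and is_list(spans) do
    {iodata, pos} =
      spans
      |> Enum.sort_by(fn {_name, start_byte, _end_byte} -> start_byte end)
      |> Enum.reduce({[], 0}, fn {name, start_byte, end_byte}, {acc, pos} ->
        # Text between the previous span and this one is emitted unstyled.
        plain = binary_part(source, pos, start_byte - pos)
        token = binary_part(source, start_byte, end_byte - start_byte)
        span = ["<span class=\"mf2-", name, "\">", escape(token), "</span>"]
        {[acc, escape(plain), span], end_byte}
      end)

    # Whatever follows the last span is also emitted unstyled.
    rest = binary_part(source, pos, byte_size(source) - pos)
    IO.iodata_to_binary([iodata, escape(rest)])
  end

  # Minimal HTML escaping so message text cannot inject markup.
  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end

IO.puts(HighlightSketch.render("hello {$name}", [{"variable", 8, 12}]))
# hello {$<span class="mf2-variable">name</span>}
```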

### Incremental reparse

After a text edit, produce a new tree without redoing the full parse:

```elixir
iex> old_src = "hello {$name}"
iex> new_src = "hello {$name}!"
iex> {:ok, old} = Localize.Mf2.TreeSitter.parse(old_src)
iex> edit = %Localize.Mf2.TreeSitter.Edit{
...>   start_byte: byte_size(old_src),
...>   old_end_byte: byte_size(old_src),
...>   new_end_byte: byte_size(new_src),
...>   start_point: {0, byte_size(old_src)},
...>   old_end_point: {0, byte_size(old_src)},
...>   new_end_point: {0, byte_size(new_src)}
...> }
iex> {:ok, new} = Localize.Mf2.TreeSitter.parse_incremental(old, [edit], new_src)
iex> Localize.Mf2.TreeSitter.changed_ranges(old, new)
[{13, 14, {0, 13}, {0, 14}}]
```

The old tree is **not** mutated; the NIF clones it, applies the edits to the clone, and feeds the clone to the parser. `changed_ranges/2` returns the byte/point ranges that differ.
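
When the editor only hands you the old and new text, the edit fields can be derived from their common prefix and suffix. The sketch below is plain Elixir and not part of this package: `EditSketch` is a hypothetical helper that returns a map whose keys mirror the `Edit` fields used above, and it assumes that point columns are byte offsets within the row (as in the example, which passes `byte_size/1` as the column).

```elixir
defmodule EditSketch do
  @moduledoc false

  # Describes the single contiguous region that differs between the
  # two binaries, using the same field names as the Edit struct.
  def compute(old_src, new_src) when is_binary(old_src) and is_binary(new_src) do
    prefix = :binary.longest_common_prefix([old_src, new_src])

    # Cap the suffix so it never overlaps the prefix (e.g. "aa" -> "aaa").
    max_suffix = min(byte_size(old_src), byte_size(new_src)) - prefix
    suffix = min(:binary.longest_common_suffix([old_src, new_src]), max_suffix)

    old_end = byte_size(old_src) - suffix
    new_end = byte_size(new_src) - suffix

    %{
      start_byte: prefix,
      old_end_byte: old_end,
      new_end_byte: new_end,
      start_point: point(old_src, prefix),
      old_end_point: point(old_src, old_end),
      new_end_point: point(new_src, new_end)
    }
  end

  # Converts a byte offset to a zero-indexed {row, byte_column} pair.
  defp point(source, offset) do
    before = binary_part(source, 0, offset)

    case :binary.matches(before, "\n") do
      [] ->
        {0, offset}

      newlines ->
        {last_newline, _len} = List.last(newlines)
        {length(newlines), offset - last_newline - 1}
    end
  end
end

IO.inspect(EditSketch.compute("hello {$name}", "hello {$name}!"))
```

For the two strings above this yields the same six values as the hand-written edit. If the struct has exactly these fields, `struct!(Localize.Mf2.TreeSitter.Edit, map)` should turn the map into an edit. Note that a byte-wise prefix can end in the middle of a multi-byte UTF-8 character; snap the boundaries to codepoint boundaries if that matters for your use.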

## Keeping the grammar current

The grammar files under `c_src/grammar/` and `priv/queries/` are vendored from the [`tree-sitter-mf2`](https://www.npmjs.com/package/tree-sitter-mf2) npm package (published from [`mf2_treesitter`](https://github.com/elixir-localize/mf2_treesitter)). A mix task pins a specific version and fetches files from the published tarball via the unpkg CDN — no sibling repo checkout required, fully reproducible.

```bash
# Fetch from npm at the pinned version and update vendored files.
mix localize_mf2_treesitter.sync

# CI check — exit non-zero if vendored files drift from the pinned
# version. Does not modify files.
mix localize_mf2_treesitter.sync --check
```

The pinned version lives at the top of the task module as `@tree_sitter_mf2_version`. To move to a newer grammar release, bump that string and re-run the task. **Keep the pin in step with `mf2_wasm_editor`'s own sync task** — grammar tree shape is the API boundary between this NIF (server-side parse) and the WASM editor (browser-side parse); a version skew can produce different trees for the same input, breaking the canonicalisation round-trip.

### Offline / local-iteration override

If you're iterating on the grammar locally and want the sync to read from a sibling checkout rather than hit the network, set `MF2_TREESITTER_DIR`:

```bash
MF2_TREESITTER_DIR=/path/to/mf2_treesitter mix localize_mf2_treesitter.sync
```

### No `--build-wasm` flag

This package doesn't produce a WASM bundle. The WASM consumer is [`mf2_wasm_editor`](https://github.com/elixir-localize/mf2_wasm_editor), which has its own sync task. The grammar package (`tree-sitter-mf2`) is the common upstream both packages sync from.

## Keeping the tree-sitter runtime current

Separate from the grammar, the package also embeds the **tree-sitter C runtime** under `c_src/runtime/`. It compiles alongside `parser.c` into the NIF `.so`. The runtime's supported ABI version must be ≥ the version `parser.c` was generated against — otherwise `ts_parser_set_language()` refuses the language at load time and every parse returns `{:error, :parse_failed}`.
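
The ABI relationship can be checked before compiling by comparing the version macros in the two C sources. The sketch below is plain Elixir and not part of this package: `AbiSketch` is a hypothetical helper that works on file contents passed in as strings. It relies on the macro names that tree-sitter uses upstream (`LANGUAGE_VERSION` in a generated `parser.c`, and `TREE_SITTER_LANGUAGE_VERSION` plus `TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION` in `api.h`); where `api.h` sits inside `c_src/runtime/` is left for you to confirm.

```elixir
defmodule AbiSketch do
  @moduledoc false

  # Returns the integer value of `#define NAME <digits>`, or nil if absent.
  def define(c_source, name) when is_binary(c_source) and is_binary(name) do
    regex = ~r/^\s*#define\s+#{Regex.escape(name)}\s+(\d+)/m

    case Regex.run(regex, c_source) do
      [_line, digits] -> String.to_integer(digits)
      nil -> nil
    end
  end

  # True when the grammar's ABI version falls inside the range the
  # runtime accepts. Returns false if any macro cannot be found.
  def compatible?(parser_c, api_h) do
    grammar = define(parser_c, "LANGUAGE_VERSION")
    newest = define(api_h, "TREE_SITTER_LANGUAGE_VERSION")
    oldest = define(api_h, "TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION")

    Enum.all?([grammar, newest, oldest], &is_integer/1) and
      grammar >= oldest and grammar <= newest
  end
end

# Inline stand-ins for the real files; version numbers are illustrative.
parser_c = "#include \"tree_sitter/parser.h\"\n\n#define LANGUAGE_VERSION 14\n"

api_h = """
#define TREE_SITTER_LANGUAGE_VERSION 15
#define TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION 13
"""

IO.inspect(AbiSketch.compatible?(parser_c, api_h))  # true
```

In a real check you would pass `File.read!("c_src/grammar/parser.c")` and the contents of the vendored `api.h` instead of the inline strings.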

A dedicated mix task refreshes the runtime from upstream [`tree-sitter/tree-sitter`](https://github.com/tree-sitter/tree-sitter):

```bash
# Fetch and overlay the pinned runtime version.
mix localize_mf2_treesitter.update_runtime

# CI check — exits non-zero if any vendored runtime file drifts
# from the pinned version. Doesn't modify files.
mix localize_mf2_treesitter.update_runtime --check
```

The pinned version lives at the top of the task module as `@runtime_version`. **Bump it whenever you bump the grammar pin** in `localize_mf2_treesitter.sync`, so the runtime's ABI support stays ahead of (or equal to) whatever `parser.c` needs. As a rule of thumb, match the `@runtime_version` here to the tree-sitter CLI version that generated `c_src/grammar/parser.c` (check its first-line comment).

The task preserves `c_src/runtime/src/lib.c` (our hand-written amalgamation wrapper — it adds `#define _POSIX_C_SOURCE 200112L` and `#include`s every runtime `.c` file). If upstream adds a new `.c` file under `lib/src/`, the task warns that `lib.c` needs an extra `#include` line.

## Roadmap

* Precompiled NIF artefacts via `:elixir_make_precompiler`, so downstream consumers don't need a C toolchain.

## Licence

Apache-2.0 for this package. The vendored tree-sitter runtime under `c_src/runtime/` is MIT — see `c_src/runtime/LICENSE`.