Skip to main content

notes/model-ir-contract.md

# The Firebreak model IR — a contract for backends

`mix firebreak --format model` emits a **supervision-model intermediate
representation**: per supervisor, exactly the facts a *lifecycle* model needs and
nothing it doesn't. It's a stable, versioned, checkable interface that other tools
build on. `Firebreak.Tla` (TLA+ spec generation) and `Firebreak.Conformance`
(topology-drift detection) are simply the first two consumers — both are *pure
functions of this IR*, and so is anything you write against it.

This document is the contract: the envelope, the schema, the serialization rules,
the design law that decides what's in and out, and the versioning promise. If you
produce or consume the IR, this is the spec.

```sh
mix firebreak --format model            # emit the IR as JSON
```

In Elixir you can also call `Firebreak.Model.build/1` for the in-memory term form
(atom keys) and `Firebreak.Model.json/1` for the JSON string.

## Document envelope

```json
{ "schema_version": 1, "supervisors": [ <bundle>, ... ] }
```

- `schema_version` (integer) — always present, always first. Check it before
  trusting the document (see *Versioning*).
- `supervisors` — one **bundle** per supervisor in the analysis, in no guaranteed
  order. Match bundles by `supervisor`, not by position.

## Bundle schema

Each bundle is the complete lifecycle projection of one supervisor:

| Field | Type | Meaning |
|---|---|---|
| `supervisor` | module string | The supervisor this bundle describes. |
| `strategy` | string \| null | `"one_for_one"` / `"one_for_all"` / `"rest_for_one"` / `"simple_one_for_one"`, or `null` if unresolved. |
| `max_restarts` | int \| null | Restart intensity (the OTP default of 3 is filled in when the source didn't set it). |
| `max_seconds` | int \| null | Intensity window (default 5). |
| `parent` | module string \| null | The supervisor this one escalates to, or `null` for a root. |
| `children` | array | Children **in declaration order** (order matters for `:rest_for_one`). |
| `inbound_crossings` | array | Resolved coupling edges that cross *into* this subtree from outside it. |

**`children[]`** — each child:

| Field | Type | Meaning |
|---|---|---|
| `module` | module string | The child's module. |
| `restart` | string | *Effective* restart type: `"permanent"` / `"transient"` / `"temporary"`. |
| `type` | string | `"worker"` or `"supervisor"`. |
| `is_supervisor` | bool | Whether this child is itself a supervisor (a subtree, not a leaf). |

**`inbound_crossings[]`** — each crossing:

| Field | Type | Meaning |
|---|---|---|
| `from` | module string | The caller, *outside* this subtree. |
| `to` | module string | The target, *inside* this subtree. |
| `kind` | string | Coupling kind: `"call"`, `"cast"`, `"whereis"`, `"send"`, `"registry"`, `"ets"`, `"pubsub"`, `"pg"`. |
| `sync` | bool | Whether the caller blocks (`:call`/`:whereis`). **Only `sync` crossings justify an availability property** — a backend should model the caller hitting `:noproc` only here. |
| `in_init` | bool | Whether the crossing happens during the caller's `init/1` (a boot-order signal). |

## Serialization conventions

- **Modules and atoms** serialize as strings via Elixir's `Atom.to_string/1`, so a
  module is `"Elixir.Sample.Cache"`, an atom is `"one_for_one"`. Consumers in other
  languages should treat the `Elixir.` prefix as part of the name (or strip it for
  display).
- **`nil`** is JSON `null`.
- **Key order** is deterministic within the envelope (`schema_version` first); bundle
  and child keys may appear in any order — parse by key, never by position.
- Zero dependencies: the encoder is a tiny built-in (`Firebreak.JSON`). No decoder
  ships — an Elixir consumer should use `Firebreak.Model.build/1` directly.

## The design law: topology in, behaviour out

The IR carries **lifecycle / topology** facts — strategy, intensity, ordered
children, restart types, escalation parent, and the resolved cross-subtree edges.
It deliberately excludes **anything behavioural**: *when* or *why* a call fires,
message contents, protocol state, timing. For the lifecycle layer the correct
abstraction is "a crossing call may happen at any time," so topology is sufficient
to reason about restart blast radius and cross-tree availability. Protocol-level
facts belong to a runtime tool, not here. This boundary is what keeps backends pure
functions of the IR rather than re-derivations of the whole analysis.

## Versioning

`schema_version` is a single integer. It **bumps on a breaking change** to the
bundle shape — a removed, renamed, or retyped field. **Additive** fields (new
optional keys) do *not* bump it, so a consumer that ignores unknown keys keeps
working. A consumer should reject a document whose `schema_version` is greater than
the one it was written against.

Current version: **1**.

## A real document

From `mix firebreak --format model` on the test fixture (`test/fixtures/sample`):

```json
{
  "schema_version": 1,
  "supervisors": [
    {
      "parent": null,
      "supervisor": "Elixir.Sample.App",
      "strategy": "one_for_one",
      "max_restarts": 3,
      "max_seconds": 5,
      "children": [
        { "module": "Elixir.Sample.CacheSup", "restart": "permanent", "type": "worker", "is_supervisor": true },
        { "module": "Elixir.Sample.Worker", "restart": "permanent", "type": "worker", "is_supervisor": false }
      ],
      "inbound_crossings": []
    },
    {
      "parent": "Elixir.Sample.App",
      "supervisor": "Elixir.Sample.CacheSup",
      "strategy": "one_for_one",
      "max_restarts": 3,
      "max_seconds": 5,
      "children": [
        { "module": "Elixir.Sample.Cache", "restart": "permanent", "type": "worker", "is_supervisor": false }
      ],
      "inbound_crossings": [
        { "from": "Elixir.Sample.Worker", "to": "Elixir.Sample.Cache", "kind": "call", "sync": true, "in_init": false }
      ]
    }
  ]
}
```

Read it: `Sample.Worker` (under the root) synchronously calls `Sample.Cache` (under
`Sample.CacheSup`) — a `sync` crossing into `CacheSup`'s subtree. That single fact
is what lets a backend assert "a restart of `CacheSup` surfaces `:noproc` in
`Sample.Worker`."

## Writing a backend (the worked example)

A backend is a pure function from a bundle to whatever you want — a spec, a
diagram, a scenario, a check. The reference consumer is `Firebreak.Tla`
(`mix firebreak.spec`), which turns each bundle into a TLA+ lifecycle spec. Its
whole dispatch is a function of the bundle's `strategy` and crossings:

```elixir
def generate(%{children: [_ | _]} = bundle) do
  cond do
    # a :temporary child that's a synchronous target -> models a permanent :noproc
    temp_sync_target?(bundle)          -> temp_target_tla(bundle)
    bundle.strategy == :one_for_all    -> one_for_all_tla(bundle)
    bundle.strategy in [:one_for_one, :rest_for_one] -> budget_tla(bundle)
    is_nil(bundle.strategy)            -> {:unsupported, "no resolved strategy"}
    true                               -> {:unsupported, "strategy not templated"}
  end
end

def generate(_), do: {:unsupported, "no children to model"}
```

Two patterns to copy:

1. **Read only what the bundle gives you.** `max_restarts`/`max_seconds`   the restart budget; `strategy` → the blast shape; `children` order → who
   co-restarts; `parent` → the escalation target.
2. **Gate availability properties on `sync`.** `Firebreak.Tla` only emits the
   cross-tree `:noproc` property (`extStuck`) when `inbound_crossings` contains a
   crossing with `sync: true` — because only a synchronous caller blocks. Do the
   same: an async crossing is not an availability hazard.

The same shape works for any backend — a `lockstep` scenario, a sequence diagram, a
service-graph overlay. If you can express it as "for each supervisor, given these
facts," it's a pure function of this IR.

## Validating a document

`Firebreak.Model.valid?/1` (and `validate/1`, which returns reasons) checks a
projection — the `build/1` list form — against this contract, so a producer or
consumer can self-verify:

```elixir
bundles = Firebreak.Model.build(analysis)
Firebreak.Model.valid?(bundles)          # => true
Firebreak.Model.validate(malformed)      # => {:error, ["bundle 0: missing key(s) [...]", ...]}
```