Skip to main content

guides/architecture.md

# Architecture

## Process topology

One agent process per conversation. It is a `:gen_statem`, not a `GenServer`,
because a turn is genuinely a state machine and several features (human-in-the-loop
suspension above all) are state transitions rather than flags. States:

- `idle` — no turn in flight.
- `preparing` — running pre-message hooks, assembling and reducing context.
- `streaming` — an LLM turn is streaming via a task.
- `executing_tools` — one or more tool tasks in flight.
- `awaiting_input` — suspended on external resolution (human or client tool).
- `terminating` — shutting down (idle eviction or explicit stop).

Supporting processes: a `Registry` for addressing agents by `conversation_id`, a
`DynamicSupervisor` for spawning them, and `Phoenix.PubSub` for broadcasting live
events to subscribers. The agent knows nothing about LiveView — it publishes to a
per-conversation topic and any number of subscribers consume.

The agent holds only the **working set**, never the full history — see
`07-memory-and-sizing.md`. Footprint is flat in conversation length and linear in
the active agent count.

## Two event planes

There is not one event stream; there are two, with different masters. Conflating
them is the trap behind "the event contract."

- **Canonical events** are written to the log **in-process, synchronously**, as part
  of the agent's transition: `user_msg`, `assistant_msg` (final), `tool_call`,
  `tool_result`, `suspension`, `resolution`. They are the source of truth and must
  never depend on PubSub delivery — a dropped broadcast cannot lose history.
- **Live events** are broadcast over PubSub for observers and include ephemeral
  things never logged: token deltas, thinking deltas, tool progress, state
  transitions. If a subscriber misses them, nothing is lost; durable state is
  reconstructable from the log.

This is why a renderer is *snapshot + live tail* (see `06`): read the log to rebuild
durable assigns on mount, then subscribe to live events for the delta.

## The non-blocking principle

The agent process never blocks on I/O. The LLM stream runs in a monitored task that
sends chunks back as messages; tool calls run in their own monitored tasks and
report results back as messages. The agent's mailbox preserves per-turn ordering,
and because the agent is never parked inside an HTTP call it can always handle
`cancel`, `inspect`, and (later) queued input mid-turn. The only thing that happens
inside the agent's own execution is bookkeeping: append to the log, update the
pending-calls map, broadcast, decide the next transition.

## ensure_started — the unifying door

Callers never hold an agent pid. Everything addresses by `conversation_id` through a
single `ensure_started(conversation_id)` that returns the live process (via its
Registry `via` tuple) or starts it under the `DynamicSupervisor`, rehydrating from
persistence on the way up. Both a new user message and a pending-call resolution
enter through this same door, which is why resumability falls out for free.

## Persisted fsm_state (deliberately small)

```
fsm_state = %{
  state:    :idle | :awaiting_input,   # only ever persisted in these two
  pending:  %{tool_call_id => %{executor, kind, prompt}},
  last_seq: N                          # log position the working set was built from
}
```

`kind` is `:approval | :elicitation | :client_exec` — the discriminator the
renderer switches on (see `06`); executor alone can't distinguish a gated
`:server` call from an elicitation.

There is no persisted `streaming` or `executing_tools` state. This one shape
resolves "what to persist," "which states are safe to evict," and "what a
suspended-on-human agent carries" at once.

Because `suspension` and `resolution` are canonical event types, `fsm_state` is
strictly a **cache over the log** — rebuildable from it, never authoritative. On
any disagreement the log wins. This removes the atomicity problem between the log
append and the snapshot write (the ETS adapter has no transactions).

## Resolved: suspend-safe states and mid-turn kills

Only `idle` and `awaiting_input` are snapshotted and evictable — they are clean. A
kill in `streaming` or mid-`:server`-tool is **not** frozen and resumed; mid-turn
durability is "recover from the log," not "freeze the stream." The two dangling
log shapes recover **differently**:

- **Log ends in a `user_msg`** (killed while streaming, nothing dispatched) —
  re-run the LLM turn. Safe; no side effects happened yet.
- **Log ends in a `tool_call` with no `tool_result`** (killed mid-tool) — do
  **not** re-roll the LLM: it would mint new calls with new ids and re-fire side
  effects (`send_email` twice). **Re-dispatch that exact call, same
  `tool_call_id`.** The library passes the id through; only the user's tool can
  honor it — document loudly that side-effecting `:server` tools should use
  `tool_call_id` as their idempotency key.

## Cancellation

"Stop generating" works from any non-idle state, not only `streaming`:

- From `streaming` — kill the streaming task **and** tear down the HTTP
  connection. ReqLLM's stream response carries cancellation as a captured
  closure, invoked `stream_response.cancel.()` (there is no module-level
  `cancel/1`); verify in tests the socket actually closes. Records a
  cancelled/partial assistant turn.
- From `executing_tools` — `Task.shutdown` the tool tasks and log synthesized
  `"[cancelled]"` tool results so every `tool_call` keeps its paired result —
  providers reject a rendered context with an orphaned call (see `05`).
- From `awaiting_input` — resolve all pending calls as "user cancelled"
  tool-errors. This is also the escape hatch from the single-in-flight composer
  lock: a user parked on an elicitation they don't want to answer cancels the
  turn instead of waiting out the timeout.

## Resolved: idle eviction (two tiers)

- **Short idle, parked in `awaiting_input`** → **hibernate**. Compacts the heap via
  GC while keeping the process alive and addressable, so the human's eventual
  resolution arrives without a DB rehydrate.
- **Long idle** → persist `fsm_state` and **terminate**, dropping from memory.
  Revival through `ensure_started` is cheap, so terminating frees more than
  hibernating holds.

## Concurrent-message policy

v0 is single-in-flight: a user message arriving mid-turn is rejected (the default UI
disables the composer). Post-v0 we expect two caller-chosen modes — direct-send
(interrupt/run-alongside) and queue (serialize). The state machine should handle
"user input arrived in a non-idle state" explicitly per state to leave room for this.

## Crash semantics

A crashing tool task is isolated by its task boundary and surfaces as an error result
fed back to the model, never an agent crash. If the agent crashes, the supervisor
restarts it and it rehydrates from the log via `ensure_started`. The log is the
recovery boundary.

## ReqLLM operational notes (verified v1.16.0)

- Streaming pools are **HTTP/1-only by default** (a known Finch ALPN bug with
  mixed-protocol large bodies) — revisit pool config before chasing streaming
  concurrency at scale.
- Don't reach into ReqLLM struct internals: the substrate already migrated once
  within 1.x (TypedStruct → Zoi). Treat public constructors as the contract.

## Remaining open

- Multi-node addressing (Registry is local; a distributed registry such as Horde or
  syn would be needed for clustering). Out of scope for v0 — the invariant to protect
  is that `ensure_started` stays the *only* addressing point, so this becomes a
  one-file change later.