guides/persistence-and-resumability.md

# Persistence and resumability

## Pluggable behaviour

Persistence is a behaviour with two shipped adapters: an ephemeral/ETS default for
people who don't want a database, and an Ecto/Postgres adapter for durable,
resumable conversations. The framing is Oban-style: an append-only, ordered,
queryable log is the source of truth.

ReqLLM's `Context`, `Message`, and `ContentPart` structs all implement
`Jason.Encoder`, so **writes** to `jsonb` are free. Reads are not: ReqLLM
(verified v1.16.0) has no public JSON→struct decode path — `Response.decode_*`
decodes provider wire payloads, not persisted JSON. Agentix owns a small
deserializer for those three structs (worth offering upstream); budget for the
content-part variants and tool-call-args edge cases there.
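
As a rough illustration of what that deserializer involves, here is a minimal sketch. `Demo.Message` and `Demo.ContentPart` are hypothetical stand-ins, not ReqLLM's real struct definitions, and the role and part-type tables are assumptions. The input is a map with string keys, which is what decoded `jsonb` looks like.

```elixir
defmodule Demo.ContentPart do
  # Hypothetical stand-in for a content-part struct.
  defstruct [:type, :text, :data]
end

defmodule Demo.Message do
  # Hypothetical stand-in for a message struct.
  defstruct [:role, content: []]
end

defmodule Demo.Decode do
  # Explicit lookup tables instead of String.to_atom/1, so persisted JSON can
  # never mint new atoms. The entries are assumptions for this sketch.
  @roles %{"user" => :user, "assistant" => :assistant, "system" => :system, "tool" => :tool}
  @part_types %{"text" => :text, "image" => :image}

  def message(%{"role" => role, "content" => parts}) when is_list(parts) do
    with {:ok, role} <- fetch(@roles, role, :unknown_role),
         {:ok, parts} <- map_ok(parts, &content_part/1) do
      {:ok, %Demo.Message{role: role, content: parts}}
    end
  end

  def message(other), do: {:error, {:bad_message, other}}

  def content_part(%{"type" => type} = raw) do
    with {:ok, type} <- fetch(@part_types, type, :unknown_part_type) do
      {:ok, %Demo.ContentPart{type: type, text: raw["text"], data: raw["data"]}}
    end
  end

  def content_part(other), do: {:error, {:bad_content_part, other}}

  defp fetch(table, key, tag) do
    case Map.fetch(table, key) do
      {:ok, value} -> {:ok, value}
      :error -> {:error, {tag, key}}
    end
  end

  # Decode every element, stopping at the first error.
  defp map_ok(list, fun) do
    list
    |> Enum.reduce_while({:ok, []}, fn item, {:ok, acc} ->
      case fun.(item) do
        {:ok, value} -> {:cont, {:ok, [value | acc]}}
        {:error, _} = error -> {:halt, error}
      end
    end)
    |> case do
      {:ok, acc} -> {:ok, Enum.reverse(acc)}
      error -> error
    end
  end
end
```

Unknown variants come back as tagged errors rather than being guessed at, which is where the content-part and tool-call-args edge cases would surface.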

The ETS table in the default adapter is owned by a supervisor-side owner process
(or is a named public table), never by the agent. If the agent owned it, the table
would die with the very process that kill-and-resume is supposed to outlive.
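
One way to arrange that ownership, as a sketch under assumptions: `Demo.TableOwner`, `Demo.EtsLog`, the table name, and the `{conversation_id, seq}` key layout are all hypothetical. The owner does nothing except hold the table; agents write to it by name.

```elixir
defmodule Demo.TableOwner do
  # Hypothetical owner process. It exists only to hold the ETS table, so the
  # table's lifetime is tied to this process and not to any agent.
  use GenServer

  @table :demo_conversation_events

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def table, do: @table

  @impl true
  def init(_opts) do
    # :named_table lets agents find the table without holding a reference,
    # :public lets them write to it, and :ordered_set keeps the
    # {conversation_id, seq} keys sorted.
    :ets.new(@table, [:named_table, :public, :ordered_set])
    {:ok, %{}}
  end
end

defmodule Demo.EtsLog do
  # Append-only: insert_new/2 returns false instead of overwriting a seq.
  def append(conversation_id, seq, event) do
    :ets.insert_new(Demo.TableOwner.table(), {{conversation_id, seq}, event})
  end

  # All events for one conversation as {seq, event} tuples, in seq order.
  def events(conversation_id) do
    match_spec = [{{{conversation_id, :"$1"}, :"$2"}, [], [{{:"$1", :"$2"}}]}]
    :ets.select(Demo.TableOwner.table(), match_spec)
  end
end
```

In a real application the owner would sit under the adapter's supervisor; killing an agent that has written through `Demo.EtsLog.append/3` leaves the rows readable.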

## Schema (sketch)

```
conversations
  id              uuid pk
  settings        jsonb        -- provider/model/system prompt snapshot
  fsm_state       jsonb        -- the persisted machine state (see below)
  status          enum         -- active | suspended | idle | ended
  inserted_at / updated_at

events                          -- append-only, ordered by per-conversation seq
  id              uuid pk
  conversation_id uuid fk
  seq             bigint       -- monotonic per conversation
  type            enum         -- user_msg | assistant_msg | tool_call
                                --  | tool_result | suspension | resolution
  content         jsonb
  inserted_at

summaries                       -- derived compaction artifacts; double as snapshots
  id              uuid pk
  conversation_id uuid fk
  from_seq        bigint        -- span of events this summary covers
  to_seq          bigint
  content         jsonb
  version         text
  inserted_at

tool_calls                      -- so HITL suspensions survive a kill
  id              text pk       -- the tool_call_id (correlation key)
  conversation_id uuid fk
  executor        enum
  status          enum          -- pending | resolved | errored | expired
  args            jsonb
  result          jsonb
  inserted_at / resolved_at

model_calls                     -- OPTIONAL audit, off by default (see below)
  id              uuid pk
  conversation_id uuid fk
  turn_ref        bigint
  rendered_context jsonb        -- exactly what was sent to the model
  model           text
  usage           jsonb
  latency_ms      integer
  summary_version text          -- which compaction summary applied
  evictions       jsonb         -- what was stubbed/evicted this turn
  inserted_at
```

The `events` table is the canonical log. `tool_calls` is partly derived but kept
separate because its row-level mutable status (pending → resolved) is exactly what a
revived agent needs to reconstruct in-flight suspensions.

`summaries` is derived, never canonical (compaction must not touch the log — see
`05`), but it is load-bearing: revival reads "latest summary + events after its
`to_seq`," and the summarization reducer's state ("latest summary covers up to seq
N") is derived from this table rather than carried loose.

## Kill-and-resume

Both a new user message and a pending-call resolution enter through
`ensure_started(conversation_id)` (see `01`), so revival is automatic: kill the
agent on idle, and the next event of either kind starts it back up and rehydrates
from the log.

What you persist is **not just the message log**. A revived agent must also know
that it was suspended on a human and which calls are still pending, so the
`fsm_state` snapshot carries the current state and the `pending` map, and in-flight
tool calls are reconstructed from their `tool_calls` rows. Without this, a revived
agent comes back not knowing it owes the model a tool result.

### Resolved: the `fsm_state` payload shape

The persisted snapshot is deliberately small (the canonical definition lives in
`01`):

```
fsm_state = %{
  state:    :idle | :awaiting_input,   # only ever persisted in these two
  pending:  %{tool_call_id => %{executor, kind, prompt}},
  last_seq: N                          # log position the working set was built from
}
```

`last_seq` is what lets revival read "latest summary + events since that summary's
span" instead of replaying from zero (see snapshot cadence below). This one shape is
the join between the resolver (`03`), the kill/resume path, and the schema. And
because `suspension`/`resolution` are canonical event types, `fsm_state` is strictly
a **cache over the log** — rebuildable, never authoritative; on disagreement the log
wins (canonical statement in `01`).
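
The "cache over the log" property can be sketched as a fold over the events. The event shapes here (maps with `:seq`, `:type`, and a `:content` map keyed by `:tool_call_id`) are assumptions for illustration, as are the module and function names.

```elixir
defmodule Demo.FsmState do
  @initial %{state: :idle, pending: %{}, last_seq: 0}

  # Rebuild the snapshot purely from the log: suspensions add to `pending`,
  # resolutions remove from it, and `last_seq` tracks the last event seen.
  def rebuild(events) do
    events
    |> Enum.sort_by(& &1.seq)
    |> Enum.reduce(@initial, fn event, acc ->
      acc |> apply_event(event) |> Map.put(:last_seq, event.seq)
    end)
    |> put_state()
  end

  # On disagreement the log wins: a cached snapshot is only kept if it
  # matches what the log says.
  def reconcile(cached, events) do
    rebuilt = rebuild(events)
    if cached == rebuilt, do: {:ok, cached}, else: {:rebuilt, rebuilt}
  end

  defp apply_event(acc, %{type: :suspension, content: %{tool_call_id: id} = content}) do
    entry = Map.take(content, [:executor, :kind, :prompt])
    %{acc | pending: Map.put(acc.pending, id, entry)}
  end

  defp apply_event(acc, %{type: :resolution, content: %{tool_call_id: id}}) do
    %{acc | pending: Map.delete(acc.pending, id)}
  end

  defp apply_event(acc, _other), do: acc

  # Only the two persisted states are derived here.
  defp put_state(%{pending: pending} = acc) when map_size(pending) == 0 do
    %{acc | state: :idle}
  end

  defp put_state(acc), do: %{acc | state: :awaiting_input}
end
```

This sketch only distinguishes the two persisted states; dangling mid-turn logs are a separate recovery concern.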

Safe-to-suspend states: `idle` and `awaiting_input` are clean to snapshot and
evict. `streaming` and mid-`:server`-tool execution are not, which is why there is
no persisted `streaming` or `executing_tools` state. A kill in those states is
**not** frozen and resumed; recovery is from the log, and the two dangling shapes
differ:

- A log ending in a `user_msg` re-runs the LLM turn (safe, since no side effects
  have happened yet).
- A log ending in a `tool_call` with no `tool_result` **re-dispatches that exact
  call with the same `tool_call_id`**. It never re-rolls the LLM, which would mint
  new ids and duplicate side effects (canonical statement in `01`).

Idempotency on `tool_call_id` also covers the kill → revive → late-answer race.
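
The two dangling shapes and the idempotency rule can be sketched as follows. This is illustrative only: `Demo.Recovery`, the event maps, and the in-memory results map (standing in for `tool_calls` rows) are assumptions.

```elixir
defmodule Demo.Recovery do
  # Decide what a revived agent should do based on the tail of the log.
  def plan(events) do
    case List.last(events) do
      # No side effects have happened yet, so re-running the turn is safe.
      %{type: :user_msg} ->
        :rerun_llm_turn

      # Re-dispatch the exact recorded call, keeping its tool_call_id.
      %{type: :tool_call, content: %{tool_call_id: id} = call} ->
        {:redispatch, id, call}

      # Empty log, or a tail that is already complete.
      _ ->
        :nothing_to_recover
    end
  end

  # Idempotent on tool_call_id: the first result wins, and a late or repeated
  # answer for the same id changes nothing.
  def record_result(results, tool_call_id, result) do
    case Map.fetch(results, tool_call_id) do
      {:ok, _existing} -> {:duplicate, results}
      :error -> {:resolved, Map.put(results, tool_call_id, result)}
    end
  end
end
```

Because `plan/1` returns the recorded call rather than asking the model again, the re-dispatched call and any late answer both land on the same `tool_call_id`, and `record_result/3` collapses them into one result.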

## Timeout machinery

Suspension timeouts belong to the **persistence behaviour** (`schedule_expiry` /
`cancel_expiry`), not to core machinery — a per-agent timer dies with the agent,
and Oban cannot be a core dependency when persistence is pluggable (Oban requires
Ecto/Postgres; the default adapter is ETS). The Ecto adapter backs expiry with Oban
jobs ("expire pending tool call X if still unresolved"), which survive kill/revive.
The ETS adapter uses `Process.send_after` best-effort — acceptable, since ETS
doesn't survive a restart anyway. Oban stays an optional dep of the adapter, not of
Agentix.
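
A sketch of what the best-effort side might look like. The callback names come from the text above, but their arities and arguments, the `Demo.*` module names, and the `{:expired, id}` notification message are assumptions.

```elixir
defmodule Demo.Expiry do
  # Hypothetical slice of the persistence behaviour; signatures are assumed.
  @callback schedule_expiry(tool_call_id :: String.t(), timeout_ms :: non_neg_integer()) :: :ok
  @callback cancel_expiry(tool_call_id :: String.t()) :: :ok
end

defmodule Demo.BestEffortExpiry do
  # Timers live in this long-running process, not in the agent, so killing an
  # agent does not cancel them. They still do not survive a node restart.
  use GenServer
  @behaviour Demo.Expiry

  def start_link(notify_pid) do
    GenServer.start_link(__MODULE__, notify_pid, name: __MODULE__)
  end

  @impl true
  def schedule_expiry(id, timeout_ms), do: GenServer.call(__MODULE__, {:schedule, id, timeout_ms})

  @impl true
  def cancel_expiry(id), do: GenServer.call(__MODULE__, {:cancel, id})

  @impl true
  def init(notify_pid), do: {:ok, %{notify: notify_pid, timers: %{}}}

  @impl true
  def handle_call({:schedule, id, timeout_ms}, _from, state) do
    state = cancel(state, id)
    # The token lets handle_info/2 ignore messages from superseded timers.
    token = make_ref()
    timer = Process.send_after(self(), {:expire, id, token}, timeout_ms)
    {:reply, :ok, %{state | timers: Map.put(state.timers, id, {timer, token})}}
  end

  def handle_call({:cancel, id}, _from, state), do: {:reply, :ok, cancel(state, id)}

  @impl true
  def handle_info({:expire, id, token}, state) do
    case state.timers do
      %{^id => {_timer, ^token}} ->
        send(state.notify, {:expired, id})
        {:noreply, %{state | timers: Map.delete(state.timers, id)}}

      _ ->
        # Cancelled or rescheduled after this message was queued.
        {:noreply, state}
    end
  end

  defp cancel(state, id) do
    case Map.pop(state.timers, id) do
      {nil, _timers} ->
        state

      {{timer, _token}, timers} ->
        Process.cancel_timer(timer)
        %{state | timers: timers}
    end
  end
end
```

A durable adapter would implement the same two callbacks by inserting and cancelling jobs instead of holding timers.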

## Resolved: scope on revival

`%Agentix.Scope{}` is runtime ambient state (current user, db handle) and is not
persisted. It is supplied **per entry call**: a LiveView resolution passes its own
scope; a webhook or job passes what it has. Timeout-driven resolutions (the expiry
job) run with a documented **system scope** — a tool that needs a real user scope
and receives the system scope fails as a tool-error rather than guessing. Apps that
need more can stash a serializable scope seed in `conversations.settings`, but that
is app-level composition, not library machinery.
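
To make the system-scope rule concrete, here is a small sketch. `Demo.Scope` and its fields are hypothetical and do not reflect the real `%Agentix.Scope{}` definition; the tool module and its return shapes are also assumptions.

```elixir
defmodule Demo.Scope do
  # Hypothetical scope struct: either the system scope or a real user.
  defstruct kind: :system, user_id: nil

  def system, do: %__MODULE__{kind: :system}
  def user(user_id), do: %__MODULE__{kind: :user, user_id: user_id}
end

defmodule Demo.Tools.CloseAccount do
  # Needs a real user; never guesses one from the system scope.
  def run(%Demo.Scope{kind: :user, user_id: user_id}, _args) when not is_nil(user_id) do
    {:ok, %{closed_for: user_id}}
  end

  # Any other scope, including the system scope used by expiry jobs,
  # is reported as a tool error.
  def run(%Demo.Scope{}, _args), do: {:error, :user_scope_required}
end
```

The scope is an argument to the call, so nothing about it needs to be read back from storage on revival.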

## Context vs message storage

The principle that resolves the "store them together?" question: **the message log
is canonical; the rendered context is derived.**

- Messages (what the user said, what the model said, tool calls and results) are
  always logged. They define the logical conversation and are what replay
  reconstructs.
- The resolved context a hook injects for a given turn (retrieval hits, memory) is
  a per-turn artifact — a function of the message plus external state at that
  instant. It does **not** go in the message history. Storing it inline conflates
  "what the user said" with "how we happened to augment that turn," bloats the
  canonical record, and lies on replay, because re-running the conversation should
  generally re-derive fresh context, not resurrect a stale snapshot.

So: messages always logged; resolved context logged **separately and optionally**,
keyed by turn, in `model_calls`. That optional table is also where summarization
version and evictions land (see `05`), so that when someone reports "it forgot the
address I gave it," you can tell whether it was compacted out or the model ignored
it. What "optional" trades away is reproducibility: without recording exactly what
was rendered, you cannot perfectly reconstruct why the model said what it said. The
record is valuable for evals and debugging, but it costs storage and has privacy
implications, so it is off by default and switched on when evaluating.

## Resolved: snapshot cadence

**Event-sourced truth + snapshots as an optimization — and the compaction summaries
*are* the snapshots.** Pure replay-from-zero gets slow on long conversations;
snapshot-only loses auditability. But the prefix summaries compaction already
produces (see `05`) are prefix snapshots of the conversation, keyed by the span of
events they cover. So revival reads "latest summary + events since its span" (using
`fsm_state.last_seq`) rather than replaying everything. One mechanism serves both
compaction and snapshotting — no separate snapshot table or cadence to tune.
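
The revival read can be sketched as a pure function over already-loaded rows. `Demo.Revival` and the row shapes are assumptions, and a real adapter would push the filtering into the query.

```elixir
defmodule Demo.Revival do
  # Build the working set from the latest summary plus the events after its span.
  def working_set(summaries, events) do
    latest = latest_summary(summaries)
    covered_up_to = if latest, do: latest.to_seq, else: 0

    tail =
      events
      |> Enum.filter(&(&1.seq > covered_up_to))
      |> Enum.sort_by(& &1.seq)

    last_seq =
      case List.last(tail) do
        nil -> covered_up_to
        event -> event.seq
      end

    %{summary: latest, events: tail, last_seq: last_seq}
  end

  # With no summaries yet, revival falls back to replaying the whole log.
  defp latest_summary([]), do: nil
  defp latest_summary(summaries), do: Enum.max_by(summaries, & &1.to_seq)
end
```

With no summaries the function degrades to full replay, so the same code path serves short and long conversations.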

## Resolved: `model_calls` GC

TTL-based, configurable, **off by default.** It is the fastest-growing,
least-permanent table and exists only for debugging/evals, so a simple time-based
drop (or a per-conversation row cap) is enough. Since the audit table itself is off
by default (see "Context vs message storage" above), there is usually nothing to GC;
when it is switched on for an eval run, the TTL keeps it from growing without bound.
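
A time-based drop is small enough to sketch in full. The `:ttl_seconds` option name and the row shape are assumptions; a real adapter would issue a delete query and not filter rows in memory.

```elixir
defmodule Demo.ModelCallsGc do
  # Split rows into {kept, dropped}. With no :ttl_seconds configured, GC is
  # off and everything is kept.
  def sweep(rows, now, opts \\ []) do
    case Keyword.get(opts, :ttl_seconds) do
      nil -> {rows, []}
      ttl -> Enum.split_with(rows, &(not expired?(&1, now, ttl)))
    end
  end

  defp expired?(row, now, ttl_seconds) do
    DateTime.diff(now, row.inserted_at, :second) > ttl_seconds
  end
end
```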

## Open questions

- Multi-node persistence story (follows the addressing question in `01`; out of
  scope for v0).