# Changelog
## v0.6.1 — 2026-06-27 — "Audit follow-up"
A second logic audit after the v0.6.0 hardening release surfaced a follow-up
round of findings, in two recurring themes: error paths the v0.6.0 fixes added
whose callers were never updated to consume them, and durable backstop state
that was written but never read by any poller — plus a few concurrency windows
in the cancel/continue/claim paths. All are fixed.
### Fixes
- `Continuum.signal/4` (and `Continuum.Test.inject_signal/4`) on Postgres now
returns `{:error, :not_found | :run_terminal}` instead of crashing the
caller with a `MatchError` when the target run is missing or already
terminal. The durable path now matches the in-memory adapter and the
documented `:ok | {:error, term()}` contract.
- Signal delivery locks and follows the `continue_as_new` chain tip inside its
transaction, so a signal can no longer be stranded in a terminal run's
mailbox when a continuation commits concurrently — it lands before the
successor's signal migration or follows through to the live tip.
- The cancel cascade locks descendant runs generation-by-generation
(`FOR UPDATE` at each level) instead of snapshotting the whole subtree under
only the root's lock, and `continue_as_new!` now takes its parent's row lock
first (matching the cascade's parent-before-child order). A run created by a
concurrent `start_child!`/`continue_as_new!` can no longer escape a cancel of
its ancestor, and the lock ordering removes an AB-BA deadlock window.
- A pending durable cancel request (`cancel_requested_at`, recorded for an
unreachable-but-leased owner) now survives `continue_as_new` — the successor
inherits it — and is honored at claim/resume time, not only on the owner's
next lease heartbeat. `Continuum.cancel/2`'s local path no longer exits the
caller if the engine stops mid-call, and `handle_call(:cancel)` rescues a
`JournalError` (lease rotated under it, run already cancelled durably)
instead of crashing the engine.
- Builtin activity task-lease fencing now includes `attempt` in its CAS
(`lock_and_validate_activity_task!` and both completion/retry update_all
clauses). A zombie worker from a previous attempt can no longer commit or
requeue over a live re-claim of the same task and corrupt attempt accounting.
- The activity task-lease heartbeat retries transient DB errors with backoff
instead of stopping permanently on the first one; only a genuine CAS miss is
terminal. A DB blip mid-activity no longer expires the lease and fails an
otherwise-healthy long-running activity.
- Unknown-version runs are backed off via `next_wakeup_at` when their lease is
released, so an incapable node no longer hot-loops claiming and releasing
them at the dispatcher poll rate (which could starve real runnable work, and
was unbounded on single-node deployments).
- The `adopt_lease` handoff is confirmed before the dispatcher returns,
falling back to a full resume when the live engine is already gone — closing
the window where a freshly rotated lease was owned by nobody and the run
stalled for a full TTL.
- Per-workflow `snapshot_threshold:` now actually triggers automatic snapshots
through `Snapshotter.maybe_snapshot` (the instance/app-level gate no longer
short-circuits the per-workflow resolution before it is read). A workflow
that opts in per the documented resolution order now snapshots even when the
app-level threshold is `:infinity`.
- An undecodable (future-format) snapshot falls back to full event replay
instead of crash-looping the engine. `latest_snapshot` filters on
`format_version`, so a mixed-version rolling deploy where a newer node wrote
a newer snapshot format degrades gracefully on older nodes.
- The in-memory adapter rejects signals to terminal runs with
`{:error, :run_terminal}`, matching Postgres (previously it buffered the
payload and reported `:ok`).
- The determinism scanner runs the helper-call and dynamic-receiver checks on
`use Continuum.Pure` modules too, so a Pure helper can no longer launder an
untrusted or dynamic call into the trusted set. `Function.capture/3` and
`:erlang.apply` join the denylist, and anonymous-function invocations whose
callee is a runtime value (`input.handler.()`, `Function.capture(...).()`)
now warn — the closure-invocation forms that previously slipped past the
`m.f(...)` dynamic-receiver check (inline `fn`, captures, and bare local
closures stay quiet).
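
The attempt-fenced completion in the task-lease bullet above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not Continuum's implementation: `FencedTask`, its fields, and its functions are hypothetical names, a struct stands in for the task row that the real code updates in SQL, and the scenario assumes a lease token can repeat across attempts (for example, when the same node re-claims the task).

```elixir
defmodule FencedTask do
  # Hypothetical in-memory stand-in for an activity task row.
  defstruct lease_token: nil, attempt: 0, state: :available, result: nil

  def new, do: %__MODULE__{}

  # Claiming starts a new attempt under the given lease token.
  def claim(%__MODULE__{state: :available} = task, token) do
    %{task | lease_token: token, attempt: task.attempt + 1, state: :leased}
  end

  # Lease expiry returns the task to the queue; the attempt counter is kept.
  def requeue(%__MODULE__{state: :leased} = task) do
    %{task | lease_token: nil, state: :available}
  end

  # Compare-and-set: the write applies only when BOTH the lease token and the
  # attempt number match the current row.
  def complete(
        %__MODULE__{state: :leased, lease_token: token, attempt: attempt} = task,
        token,
        attempt,
        result
      ) do
    {:ok, %{task | state: :completed, result: result}}
  end

  # Any mismatch (stale attempt, wrong token, task not leased) is fenced out.
  def complete(%__MODULE__{}, _token, _attempt, _result), do: {:error, :fenced}
end
```

With this model, a worker left over from attempt 1 that calls `FencedTask.complete(task, "node-a", 1, :stale)` after the task was re-claimed as attempt 2 gets `{:error, :fenced}`, while the live attempt 2 commits normally.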
## v0.6.0 — 2026-06-11 — "Audit hardening"
A full-library logic audit (activity liveness, replay-path agreement, the
determinism scanner's negative space, identity across chains/nodes/versions,
and signal/cancel/await consistency). See
`guides/migrations/MIGRATING_v0_5_1_to_v0_6.md` for the upgrade guide.
### Fixes
- The determinism scanner resolves aliases before the denylist lookup:
`alias DateTime, as: D` + `D.utc_now()` is now a hard error (previously a
generic warning), and a user's own module aliased *as* `DateTime` is no
longer a false positive. `:erlang.system_time`/`monotonic_time`/
`unique_integer`/`make_ref`/`self`/`send` and `:os.system_time`/`timestamp`
join the denylist alongside their already-banned `System.*`/`Kernel` wrappers,
and dynamic-receiver calls (`m.f(...)`) in workflow code now emit a
warning since they cannot be statically checked.
- Workflow code that swallows Continuum's suspend throw (a `try ... catch`
arm around an effect — the throw happens *after* the pending effect is
journaled) no longer corrupts the run's history. The engine records every
control throw in the run context; if execution continues past it, the
next effect — or the engine, on a normal return — fails the run with the
new `Continuum.SuspendLeakError`. The scanner additionally warns at
compile time on `catch` arms in workflow clauses, with a rescue-only
remediation hint (re-throwing the control tuples stays supported).
- `Continuum.AstCheck` now inspects unqualified calls: `apply/2,3`,
`spawn/spawn_link/spawn_monitor`, `send/2`, `self/0`, `make_ref/0`, and
`node/0,1` are rejected in workflow code (previously only the
`Kernel.`-qualified spellings nobody writes were caught), `receive`
blocks are rejected outright, and unqualified calls are resolved against
the imports in scope — `import DateTime` followed by a bare `utc_now()`
is now a compile error like the qualified call. In-body `import`
directives are tracked too, honoring literal `only:`/`except:` lists.
- `child_started` and `run_continued_as_new` event replay now validate the
journaled workflow module and input hash against what the workflow code
commands, raising `Continuum.ReplayDriftError` on mismatch (previously a
changed child workflow or input replayed silently with the old child's
run id on the event path while the snapshot path raised). Snapshots
compacted from now on carry the hashes so both replay paths agree;
pre-existing snapshots skip the new check.
- Replaying a history that ends in a pending `signal_awaited` no longer
hard-codes a Postgres mailbox lookup: on non-Postgres journals the replay
suspends, so golden histories ending at a pending await can be replayed
offline with `Continuum.Test.replay/4`.
- In-memory inline activities (`Continuum.Test.start_synchronous/3`) now
rescue exceptions with the same normalization as the durable worker and
hand the workflow an `{:error, error}` value instead of crashing the run.
The canonical saga path ("payment fails → `compensate_all`") now takes
the same control path in tests as in production.
- In-memory signal delivery now buffers per run (mirroring the
`continuum_signals` mailbox) instead of appending `signal_received` at the
journal tail. Signals arriving early or out of order wait for their
matching `await signal` — previously they produced a permanent
`ReplayDriftError` that the identical sequence on Postgres did not — and
the consumed event carries the await's command identity. Signaling a
nonexistent in-memory run still returns `{:error, :not_found}`.
- The journal adapter is now resolved through the runtime instance — one
source of truth for `Continuum.start/signal/cancel/await`. Previously the
engine defaulted new runs to the in-memory journal even with
`config :continuum, journal: ...Postgres` set (the README quickstart
therefore started a non-durable run while `await` polled Postgres), and
named instances ignored the config entirely. Named instances given a
`:repo` now default to the Postgres journal (override with the new
`journal:` option of `Continuum.children/1`); the SignalRouter's
LISTEN gating follows the instance's journal too.
- Activity task leases are extended to cover the activity's configured
timeout (plus a margin) at execution start. Previously the claim TTL
(default 30s) was the effective execution ceiling: any activity running
longer could never commit its result and the run wedged.
- The activity dispatcher's poll now requeues `leased` tasks whose lease has
expired (emitting `[:continuum, :activity_dispatcher, :requeued]`).
Previously this rescue only ran at boot, so a worker crash stranded the
task — and its run — until the node restarted.
- Activity workers no longer crash when a journal write is fenced out (run
lease rotated, task lease expired or taken over, run already terminal).
The task is released for re-execution — or discarded when the run is
terminal — and `[:continuum, :activity, :fenced]` is emitted. The fencing
itself is unchanged: the stale write is still rolled back.
- Crash requeues (boot recovery and the dispatcher sweep) now consume an
attempt, exactly like an execution that returned an error. A task whose
attempt exceeds `max_attempts` is failed with `:attempts_exhausted`
without re-executing. **Behavior change:** an activity with the default
`max_attempts: 1` whose worker/node dies mid-execution now fails instead
of silently re-running its side effects; raise `max_attempts` (and supply
an `idempotency_key/1`) for crash-resilient activities.
- Activity task leases are now heartbeated while the activity executes
(TTL 30s renewed every 10s; `:task_lease_ttl_seconds` /
`:task_lease_renew_ms`) instead of one-shot-extended to timeout + margin.
A crashed worker's task expires within ~one TTL and the sweep rescues it
promptly, even for long-timeout activities. `mix continuum.audit` now
reports `expired_leased_activity_tasks` as the matching operator signal.
- Signals, cancel, and await now follow `continue_as_new` chains to the
live tip. Signaling the chain-root id delivers into the current
incarnation's mailbox (previously a silent loss into the dead root's),
cancelling the root cancels the tip, and `Continuum.await/3` follows the
chain to the final terminal result — the internal `{:continued, run_id}`
sentinel is never exposed. When a run continues, undelivered mailbox
signals and live unawaited children move to the successor (so the cancel
cascade still reaches them; the successor cannot *await* inherited
children — await every child you need a result from before continuing).
Successors also inherit `namespace` and `attributes` (previously reset to
defaults on first continuation), as do child runs from their parent.
- `continue_as_new` no longer pins the successor to the predecessor's
`version_hash`: the successor starts with empty history, so it is stamped
with the workflow's currently loaded version and chains pick up deploys.
An unknown version is now a per-node fact: the engine releases the lease
and leaves the run `suspended` for a capable node to claim, instead of
marking it `stuck_unknown_version` globally (which one stale node could
do to a new-version run during a rolling deploy, unrecoverably).
Registering a version (`VersionRegistry.upsert_instance/2`, run at boot)
flips legacy stuck rows back to `suspended`.
- Cancel reaches runs hosted on other nodes: it forwards through the same
`:pg` group as wake (as a call, so the caller gets the real result). The
durable fallback distinguishes `{:error, :not_found}`,
`{:error, :owned_elsewhere}`, and `{:error, {:run_not_active, state}}`
(previously all `:not_found`). For an owner that is leased but
unreachable, the request is recorded in the new
`continuum_runs.cancel_requested_at` column and honored by the owning
engine on its next lease heartbeat.
- Fresh durable runs are inserted already leased, in one transaction.
Previously the insert committed before the lease acquire, so another
node's dispatcher could claim the run in the window and the original
`Continuum.start` returned an error for a run that was actually
executing. The dispatcher also skips runs whose engine is alive in the
local registry, and when a claim races an engine registration it hands
the rotated token to the live engine (`Engine.adopt_lease/4`) instead of
fencing it out for a full lease TTL.
- `start_child` enforces `:max_child_depth` at creation time (failing the
run loudly) so descendants can no longer outrun the cancel cascade's
depth bound; if the cascade still meets deeper legacy descendants it
logs and emits `[:continuum, :run, :cancel_cascade_truncated]` instead
of silently leaving them running.
- **Behavior change:** cancellation now produces a real terminal
`cancelled` run state (previously `failed` with error `:cancelled`).
`cancel_run!` is the single broadcaster of one canonical
`{:run_finished, run_id, :cancelled, :cancelled}` message — including
for cascade-cancelled descendants, whose awaiters previously blocked for
their full timeout — and `await` returns
`{:error, %{state: :cancelled, ...}}` from both the broadcast and poll
paths (previously the two disagreed). A child that legitimately *failed*
with the user error term `:cancelled` is now classified as a failure by
its parent's await, not a cancellation. Legacy `failed` + `:cancelled`
rows still display, await, and query as cancelled. The in-memory journal
stores the same canonical state.
- **Behavior change:** `Continuum.signal/3,4` returns
`{:error, :not_found}` / `{:error, :run_terminal}` for runs that do not
exist or are terminal (previously `:ok`, silently accepting a signal
nothing could ever consume), matching the in-memory adapter.
- Journal write rejections are now a structured
`Continuum.Runtime.JournalError` (operation + rollback reason) instead of
`RuntimeError` message text; the engine, activity workers, and timer
wheel classify failures by pattern matching. **Behavior change:** code
rescuing `RuntimeError` from journal operations must rescue
`Continuum.Runtime.JournalError` instead.
- A transient database failure while journaling a run's completion,
suspension, or mid-replay effect no longer marks the run `failed` with
the DB exception as its "error": the engine crashes and crash-and-resume
replays and finishes the run. Terminal transitions additionally CAS on
the run still being active, so a late `fail!` can never flip a
`completed` run.
- `fire_timer!` validates run state as well as lease, so a timer claimed
just before a cancel committed can no longer append `timer_fired` into a
terminal run's history; the wheel drops the rejection cleanly. Cancel
also clears the run's lease.
- `Continuum.set_attributes/3` merges in SQL (`attributes || $2::jsonb`),
so concurrent disjoint merges can no longer silently drop each other's
keys.
- `Continuum.side_effect/1` producer identity no longer includes
per-compilation anonymous-function artifacts, so recompiling a helper
module (adding an unrelated function) no longer kills every in-flight run
replaying through it. The helper-module caveat (call-site line identity
without version-hash protection) is documented on `side_effect/1`.
**Behavior change:** histories journaled through the *bare-producer*
`Effect.run/2` form (not the `side_effect` macro) before this change
replay-break once across the upgrade.
- Lease, retry-backoff, and signal-timeout arithmetic uses the database
clock end to end (expiry comparisons in SQL; retry `available_at`
computed in the UPDATE), so app/DB clock skew can no longer shrink or
stretch effective TTLs and timeouts.
- The SignalRouter retries a failed LISTEN start (previously a node could
stay silently deaf to signals and parent wakeups forever), scans for
undelivered signals on (re)connect and every 30s as a poll backstop for
parked engines, and remote wakes are cast to every `:pg` member instead
of only the first.
- The determinism scanner normalizes pipes before scanning (`x |>
send(:msg)` is now rejected as `send/2`), warns on chained dynamic
receivers (`input.mod.fun(x)`) and captures of dynamic modules
(`&m.f/1`), and warns on `catch` arms in `Continuum.Pure` helpers. The
uncompensated-activity check runs once per module at compile end, so
activities in other clauses or private helpers are caught, and call
sites with non-literal opts are no longer falsely flagged.
- `Continuum.Test.inject_signal/4` delivers through the SignalRouter, so
injected in-memory signals are consumed by the live await and journal
`signal_received` with its command identity — injected signals now
exercise the same drift detection as production deliveries. Paranoid
re-replay handles `continue_as_new` runs (verifying the journaled
continuation) instead of skipping them silently, and its docs state
precisely what the re-replay checks.
- The Snapshotter resolves its journal through the runtime instance, and
snapshot triggers carry the journal that wrote the events — a durable run
on an instance whose default journal is in-memory snapshots into
Postgres.
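
For callers adjusting to the `Continuum.signal/3,4` behavior change above, the new return values can be handled with an exhaustive `case`. The sketch below is a hypothetical caller-side helper, not part of Continuum: `SignalOutcome` and its result atoms are invented for illustration, and the real call is passed in as a zero-arity function so the sketch assumes nothing about the argument order of `Continuum.signal`.

```elixir
defmodule SignalOutcome do
  # Hypothetical helper. `send_signal` wraps the real call, e.g. a closure
  # around Continuum.signal/3,4 built by the caller.
  def classify(send_signal) when is_function(send_signal, 0) do
    case send_signal.() do
      # The signal was accepted for a live run.
      :ok -> :delivered
      # No run with that id exists.
      {:error, :not_found} -> {:dropped, :no_such_run}
      # The run already reached a terminal state; nothing can consume it.
      {:error, :run_terminal} -> {:dropped, :run_already_finished}
      # Any other error term allowed by the `:ok | {:error, term()}` contract.
      {:error, reason} -> {:failed, reason}
    end
  end
end
```

Code that previously treated every `:ok` as success keeps working; the two `:dropped` branches are the cases that used to be accepted silently.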
### Migrations
- Added `continuum_runs.cancel_requested_at timestamptz` (delta migration;
`mix continuum.gen.migration` includes it for new installs).
## v0.5.1 — 2026-06-04 — "Oban activity executor"
### New surfaces
- Added `Continuum.Oban`, an optional activity executor that routes Continuum
activity tasks through a host-operated Oban queue. Continuum still owns the
durable task table, retry policy, idempotency, timeout handling, and
completion CAS; Oban is used only as the execution pool.
- `Continuum.children/1` and the default application instance now accept
`activity_executor: {:oban, queue: :continuum_activities}` when the host app
depends on and supervises Oban.
### Observability
- Activity and compensation telemetry metadata now includes `executor:
:builtin | :oban`; Oban-backed activity attempts also include `oban_job_id`.
### Documentation
- Added `guides/oban-executor.md`.
- Added `guides/migrations/MIGRATING_v0_5_to_v0_5_1.md`.
## v0.5.0 — 2026-06-02 — "Production at scale"
### New surfaces
- Cluster-aware wake routing. Continuum starts `:pg` scope `:continuum`, engines
join by `{instance, run_id}`, and wakes forward to remote owners when the run
is not local. The lease and fencing token remain the write authority.
- Added `mix test.cluster` and a `:peer`-based cluster harness covering dispatch
races, lease stealing, and activity worker node death against one Postgres.
- Added namespaces on `continuum_runs`. `Continuum.start/3` accepts
`namespace:`, query/list paths default to `"default"`, and single-run
operations stay globally keyed by `run_id`.
- Added search attributes on `continuum_runs`, plus `Continuum.query/1,2`,
`Continuum.get_run/2`, and `Continuum.set_attributes/3`.
- Added `mix continuum.audit --repo MyApp.Repo [--format json] [--strict]` for
loaded workflow versions, stale patch marker verdicts, and stuck
unknown-version runs.
- The determinism scanner now rejects `:pg.*`, `:rpc.*`, and `:erpc.*` in
workflow code.
### Migrations
- Added `continuum_runs.namespace text NOT NULL DEFAULT 'default'`.
- Added `continuum_runs.attributes jsonb NOT NULL DEFAULT '{}'`.
- Added GIN and namespace/state indexes for attribute and tenant-scoped
queries.
### Documentation
- Added guides for clustering, namespaces, search/query, and auditing.
- Added `MIGRATING_v0_4_to_v0_5.md`.
- Updated the example orders app with the v0.5 migration and smoke coverage for
two namespaces plus `Continuum.query/1`.
### v0.5 decisions
- `Continuum.Oban` activity routing is deferred to v0.5.1. The v0.5
milestone ships the built-in activity runner unchanged so the cluster,
namespace, query, and audit surfaces can be tagged without introducing a
second execution adapter.
- `Continuum.AshAi` is deferred until a lighthouse adopter is engaged.
- The Observer replay-stepping debugger is formally cut from v0.5 rather than
carried as another release's implicit nice-to-land.
### Benchmarks
- Pre-v0.5 baseline on 2026-06-02 from `MIX_ENV=test mix run
bench/replay_hot_path_bench.exs`: raw replay 111 ms / 8.88 us per event over
12,500 events.
- Final v0.5 verification on 2026-06-02: raw replay 86 ms / 6.88 us per event
over 12,500 events; snapshot replay 78 ms over the compacted prefix.
## v0.4.0 — 2026-05-31 — "Hardening & ergonomics"
### Changed
- Replay contexts now keep an indexed in-process history (`:array`) for cursor
reads and live-tail appends. This removes the replay hot path's repeated
`Enum.at/2` scans and `history ++ [event]` list rebuilds while leaving the
journal append path unchanged.
- Snapshot payloads now use a versioned `{:continuum_snapshot, 1, snapshot}`
envelope. Legacy unversioned v0.2/v0.3 snapshot blobs still decode as format
v1, and unsupported future formats raise a clear `ArgumentError` instead of
failing as a raw term decode.
- `use Continuum.Workflow` accepts `snapshot_threshold: positive_integer |
:infinity`. The snapshotter resolves per-workflow threshold first, then
runtime/app config, then `:infinity`.
- Added `mix continuum.gc_versions --repo MyApp.Repo`, a dry-run-by-default
cleanup task for `continuum_workflow_versions`. It deletes only with
`--execute`, preserves loaded workflow hashes, and treats running,
suspended, and stuck-unknown-version runs as pins.
- Added `mix continuum.archive_continued_chains --repo MyApp.Repo --older-than
Nd`, a dry-run-by-default deletion task for expired non-tail
`continue_as_new` cycles and their dependent rows.
- `compensate_all(mode: :parallel)` schedules all pending compensation tasks
before suspending, then resumes once every scheduled compensation has a
terminal journal event. `compensate: :none` explicitly opts an activity out of
the new missing-compensation compile warning.
- `use Continuum.Workflow` now generates a hidden `V_<hash>` entrypoint module
for the compiled workflow body. Public workflow modules stay as the start
target, while durable Postgres runs execute and resume through the generated
hash-specific entrypoint.
- Added v0.4 migration and operations documentation, plus an example
`SubscriptionFlow` that combines `continue_as_new` with a per-workflow
snapshot threshold.
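
The indexed-history change in the first bullet above relies on Erlang's `:array`, which gives cheap positional reads and appends where `Enum.at/2` and `history ++ [event]` each walk the whole list. A minimal sketch of the idea follows; `IndexedHistory` is a hypothetical wrapper written for illustration and is not the replay context's actual structure.

```elixir
defmodule IndexedHistory do
  # Hypothetical wrapper around Erlang's :array, for illustration only.

  # Build an extendible array from an existing event list.
  def from_list(events) when is_list(events), do: :array.from_list(events)

  def size(history), do: :array.size(history)

  # Cursor read: returns {:ok, event} inside the history, :error past the end.
  def fetch(history, cursor) when is_integer(cursor) and cursor >= 0 do
    if cursor < :array.size(history) do
      {:ok, :array.get(cursor, history)}
    else
      :error
    end
  end

  # Live-tail append: write at index `size`, which grows the array by one.
  def append(history, event) do
    :array.set(:array.size(history), event, history)
  end
end
```

For example, `IndexedHistory.from_list([:a, :b]) |> IndexedHistory.append(:c) |> IndexedHistory.fetch(2)` returns `{:ok, :c}`.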
### Migrations
- Added `continuum_snapshots.format_version smallint NOT NULL DEFAULT 1`, plus
`continuum_runs_correlation_completed_idx` for the v0.4 continued-chain
archival task.
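
The `format_version` column pairs with the versioned snapshot envelope described under "Changed". The decode rules stated there (current envelope, legacy blobs treated as format v1, future formats raising `ArgumentError`) can be sketched as below. `SnapshotEnvelope` is a hypothetical module for illustration; only the `{:continuum_snapshot, 1, snapshot}` shape comes from this changelog.

```elixir
defmodule SnapshotEnvelope do
  # Hypothetical illustration of the envelope rules; not Continuum's decoder.
  @current_format 1

  # Wrap a snapshot term in the versioned envelope.
  def wrap(snapshot), do: {:continuum_snapshot, @current_format, snapshot}

  # Current format: return the version alongside the payload.
  def unwrap({:continuum_snapshot, @current_format, snapshot}) do
    {@current_format, snapshot}
  end

  # An envelope with any other version is unsupported here.
  def unwrap({:continuum_snapshot, version, _snapshot}) when is_integer(version) do
    raise ArgumentError, "unsupported snapshot format version: #{version}"
  end

  # Anything without an envelope is treated as a legacy v1 payload.
  def unwrap(legacy), do: {1, legacy}
end
```

Raising a clear `ArgumentError` lets the caller tell an unsupported format apart from a corrupt blob.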
### Benchmarks
- `mix run bench/snapshot_bench.exs 10000` on 2026-05-31 reported raw replay
21 ms, snapshot replay 16 ms, and a 1.3x speedup for 10,000 side-effect
events after indexed history landed. The old 7.2x snapshot advantage was
largely measuring inefficient raw replay; v0.4 formally accepts the lower
speedup because raw replay is now much faster and snapshot payload format
stability is the user-facing graduation criterion.
- Added `bench/replay_hot_path_bench.exs`. At 10,000 logical mixed operations
(12,500 events across side effects, activities, patch markers, and saga
compensations), current raw replay is 89 ms / 7.17 us per event; snapshot
replay is 89 ms over the compacted prefix.
## v0.3.0 — 2026-05-29 — "Real workflows"
### New surfaces
- **`continue_as_new/1`.** A tail-call continuation for long-running /
cron-style workflows: completes the current run as
`result: {:continued, next_run_id}` and starts a fresh run on the same
workflow with new input, keeping per-run history bounded. The whole chain
shares a `correlation_id` (the chain root's id) and each run records its
`continued_from_run_id` predecessor; a continued *child* keeps its
`parent_run_id`, and a parent's `await_child` follows the chain forward to the
terminal run's real result (never an intermediate `{:continued, _}`). It throws a
distinct `:continuum_continued_as_new` sentinel so the engine stops cleanly
instead of re-entering the workflow. New event `run_continued_as_new`,
telemetry `[:continuum, :run, :continued_as_new]`. Requires the Postgres
journal.
- **Parent/child workflows.** Compose workflows out of child runs:
  - `await child Mod.run(input)`: start a child synchronously, suspend, and
    return its result.
  - `start_child Mod, input, opts`: start a child asynchronously, returning a
    `%Continuum.ChildRef{}` (`opts` accepts `id:` for a parent-scoped key).
  - `await_child(ref)`: suspend until that child terminates.

  Child run ids are derived deterministically from the parent run id, the start
command id, and any `id:` option, so a parent never starts two children on
replay. Children carry their own lease and run independently; when a child
reaches a terminal state it sets the parent's `next_wakeup_at` and emits
`pg_notify('continuum_run_wake', parent)` in the same transaction. The
existing `SignalRouter` now also `LISTEN`s `continuum_run_wake` and wakes a
local parent engine — **no new runtime process**. Cancelling a parent cascades
(bounded by `config :continuum, max_child_depth: 10`) to all in-flight
descendants, clearing their leases so no post-cancel child events can be
appended. New events `child_started` / `child_completed` / `child_failed` /
`child_cancelled`, telemetry `[:continuum, :child, :started | :completed |
:failed]`, and four nullable `continuum_runs` columns (`parent_run_id`,
`parent_command_id`, `correlation_id`, `continued_from_run_id`). Child
workflows require the Postgres journal.
- **Compensation / saga DSL.** `activity/2` accepts a `compensate:` `{m, f, a}`
option; a successful (`{:ok, value}`) compensated activity returns
`{:ok, %Continuum.ActivityRef{}}` carrying the compensation handle (activities
*without* `compensate:` are unchanged and still return a bare term). Two new
workflow macros roll work back:
  - `compensate/1`: run one activity's compensation (by its `ActivityRef`) and
    drop it from the pending set so `compensate_all/0` can't double-run it.
  - `compensate_all/0`: run every pending compensation in LIFO order (most
    recent first); ideal in a `rescue` clause.

  Compensations flow through the same activity worker, retry policy, timeout,
idempotency side-table, and lease-fencing path as ordinary activities, and a
compensation that fails terminally journals `compensation_failed` without
killing the run. `Continuum.unwrap/1` recovers an activity's raw return from a
ref. New events `compensation_scheduled` / `compensation_completed` /
`compensation_failed` and telemetry `[:continuum, :compensation, :scheduled |
:completed | :failed]`. Compensations are captured by snapshots.
- `Continuum.patched?/1` is now a real, journaled patch marker (was a `false`
stub). It is a macro (capturing `__CALLER__` for a stable command identity);
the first call at a source line journals a `patched` event with `value: true`
and returns `true`, and the value replays on resume. Runs replaying history
recorded *before* the patch line return `false` without consuming an event,
keeping in-flight runs on the old branch. `patched?/1` is the only effect that
may return without advancing the replay cursor, and the non-advance is keyed
on `command_id` lookahead so independent patch calls don't interfere. Patch
decisions are captured by snapshots. New telemetry `[:continuum, :patched,
:hit]`. Modules calling it must `require Continuum` (`use Continuum.Workflow`
does this); outside a workflow it returns `false`.
- `Continuum.Test.Paranoid` — the `--paranoid` re-replay safety net. Enable it
for a whole run with `CONTINUUM_PARANOID=1 mix test` (or
`config :continuum, :paranoid_replay, true`); the default is off so ordinary
`mix test` stays fast. When enabled, a telemetry handler re-replays every
completed in-memory run from its journaled history and flags any drift or
differing result. `verify_run!/4` is the strict, raising contract for
asserting a specific run re-replays identically; `assert_histories_match!/2`
compares two histories on `(event_type, decoded_payload, command_id)`,
excluding DB-stamped fields.
- `Continuum.VersionRegistry` now resolves durable `(workflow, version_hash)`
pairs to loaded workflow entrypoints. The hot-path registry is backed by
`:persistent_term`; a short-lived boot task upserts loaded workflow versions
into Postgres for each Continuum instance.
- `use Continuum.Workflow, workflow: LogicalWorkflow` registers a concrete
module as a hash-specific entrypoint for a logical workflow. This is the
v0.3 compromise entrypoint strategy: keep old version modules loaded and
point new versions at the same logical workflow.
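
The parent/child entry above says child run ids are derived deterministically from the parent run id, the start command id, and any `id:` option. One way to picture why replay cannot start a second child is a pure hash over those three inputs. The sketch below is an assumption for illustration only: `ChildRunId`, the SHA-256 hash, the NUL separator, and the 32-character output are all hypothetical and not the library's actual derivation.

```elixir
defmodule ChildRunId do
  # Hypothetical derivation: the same inputs always yield the same id, so a
  # replayed `start_child` resolves to the child that already exists.
  def derive(parent_run_id, command_id, id \\ nil) do
    # NUL-separated parts keep ("ab", "c") distinct from ("a", "bc").
    # `to_string(nil)` is "", so a missing `id:` contributes an empty part.
    parts = [to_string(parent_run_id), <<0>>, to_string(command_id), <<0>>, to_string(id)]

    :sha256
    |> :crypto.hash(parts)
    |> Base.encode16(case: :lower)
    |> binary_part(0, 32)
  end
end
```

Calling `ChildRunId.derive("run-1", 3, "invoice")` twice returns the same string, while changing the command id or the `id:` value yields a different one.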
### Migrations
- Added `continuum_workflow_versions`, keyed by `(workflow, version_hash)`,
with the loaded `entrypoint` module and `registered_at` timestamp.
- `20260801000000_continuum_v0_3` adds four nullable `continuum_runs` columns
(`parent_run_id`, `parent_command_id`, `correlation_id`,
`continued_from_run_id`) plus partial indexes on the non-null ids. Old
`SELECT *` code keeps working; existing rows are backfilled with
`correlation_id = id`.
### Behavior changes operators should know about
- Resuming Postgres-backed runs now dispatches through the run row's journaled
`workflow` and `version_hash`; it no longer trusts the latest logical module.
- Runs whose journaled workflow version cannot be resolved are marked
`:stuck_unknown_version` instead of being replayed through possibly changed
code.
- Starting a durable run now fails loudly if the workflow module does not
expose `__continuum_workflow__/0`.
### Observability
- The Observer run-detail timeline now colours `compensation_*`, `child_*`,
`run_continued_as_new`, and `patched` events, and the run header links to the
`parent_run_id` and the "continued from / continues to" runs of a
`continue_as_new` chain.
- `Continuum.OpenTelemetry` adds a `continuum.compensation_attempt` span and
records child-workflow and `continue_as_new` events as breadcrumbs on the
originating run-attempt span (a child's own work is captured by its own run
spans, correlated by run id).
### Benchmarks
- `MIX_ENV=test mix run bench/snapshot_bench.exs` on 2026-05-29 with 10,000
side-effect events reported raw replay 100 ms, snapshot replay 13 ms, and a
7.2x replay speedup. The v0.3 re-bench does not close the >=10x snapshot
target; snapshot payload format and the remaining perf gap stay deferred.
### Telemetry additions
- `[:continuum, :run, :continued_as_new]`
- `[:continuum, :run, :unknown_version]`
- `[:continuum, :child, :started | :completed | :failed]`
- `[:continuum, :compensation, :scheduled | :started | :completed | :failed]`
- `[:continuum, :patched, :hit]`
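
A minimal sketch of subscribing to the events listed above, assuming the host app depends on the `:telemetry` package. `ContinuumEventLogger` and the handler id are hypothetical names. The measurement and metadata keys are not specified in this changelog, so the handler only inspects whatever it receives.

```elixir
defmodule ContinuumEventLogger do
  require Logger

  # Event names copied from the v0.3 telemetry additions list.
  @events [
    [:continuum, :run, :continued_as_new],
    [:continuum, :run, :unknown_version],
    [:continuum, :child, :started],
    [:continuum, :child, :completed],
    [:continuum, :child, :failed],
    [:continuum, :compensation, :scheduled],
    [:continuum, :compensation, :started],
    [:continuum, :compensation, :completed],
    [:continuum, :compensation, :failed],
    [:continuum, :patched, :hit]
  ]

  def events, do: @events

  # Attach one handler to every event. Requires the :telemetry application.
  def attach do
    :telemetry.attach_many(
      "continuum-event-logger",
      @events,
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Telemetry handlers receive the event name, measurements, metadata, and
  # the config term passed to attach_many/4.
  def handle_event(event, measurements, metadata, _config) do
    Logger.info("#{inspect(event)} #{inspect(measurements)} #{inspect(metadata)}")
  end
end
```

Calling `ContinuumEventLogger.attach()` once at application start is enough; a remote-function capture is used because telemetry warns about anonymous-function handlers.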
### Documentation
- Added guides for sagas, child workflows, long-running workflows, patching,
and workflow versioning.
- Added `MIGRATING_v0_2_to_v0_3.md`.
- Updated `continuum_example_orders` with a refund compensation and a
parent/child batch workflow.
## v0.2.0 — 2026-05-15 — "I can see what's happening"
v0.2 makes the v0.1 engine operable: a free Phoenix LiveView Observer, an
optional OpenTelemetry bridge, opt-in history snapshots, named multi-instance
runtimes, and six pieces of v0.1 debt paid down (event partitioning, ETS timer
cache, idempotency enforcement, helper-module determinism warnings, the
`signal_awaited` fast-path, per-process repo threading).
See [`MIGRATING_v0_1_to_v0_2.md`](./MIGRATING_v0_1_to_v0_2.md) for the upgrade
path.
### New surfaces
- `Continuum.Observer` — optional Phoenix LiveView observer: runs index, run
detail with decoded event timeline, operator actions for cancelling and
sending signals. Mounted via `Continuum.Observer.Router.continuum_observer/2`
with an optional `:layout` forwarded to `live_session/3`. Continuum core
compiles without Phoenix LiveView installed; host applications add the
Phoenix dependencies when they mount the Observer. Self-contained demo
ships at `dev/observer_demo.exs`. See `guides/observer.md`.
- `Continuum.OpenTelemetry.setup/1` — opt-in bridge that turns
`[:continuum, :run, ...]` and `[:continuum, :activity, ...]` telemetry into
short `continuum.run_attempt` and `continuum.activity_attempt` spans. Resume
spans link back to the original trace via the new
`continuum_runs.trace_context` column. Continuum still compiles without any
OpenTelemetry packages. See `guides/observability.md`.
- `Continuum.children/1` — host-supervisor helper for named instances. Each
instance owns its own registry, run supervisor, dispatchers, timer wheel,
signal router, lease heartbeater, snapshotter, and recovery process bound
to a single Ecto repo. Public calls accept `instance: name`. The default
`Continuum` instance is unchanged. See `guides/multi-instance.md`.
- Experimental, opt-in history snapshots: `continuum_snapshots`,
`Continuum.Snapshot`, `Continuum.Runtime.Snapshotter`, compacted-prefix
replay validation, snapshot telemetry, snapshot benchmark harness
(`bench/snapshot_bench.exs`). Replay-loop cost on a 10k-event side-effect
workflow drops ~8× when snapshots are enabled. The v0.2 plan's ≥10×
acceptance target is *not* met; the gap is accepted under E1's
minimum-acceptance clause because snapshots ship experimental and opt-in
(default `snapshot_threshold: :infinity`). Closing the remaining 25% is
tracked for v0.3 once runtime use is dogfooded. See `guides/snapshots.md`.
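
One way to read the "remaining 25%" above is as the extra speedup still needed to move from the measured ~8× to the ≥10× target. The arithmetic, as a minimal sketch (the module and function names are illustrative and not part of Continuum):

```elixir
defmodule SpeedupGapSketch do
  # Fraction of additional speedup needed to move from `measured` to `target`.
  # Both arguments are speedup factors (8.0 means "8x faster than raw replay").
  def remaining(measured, target) when measured > 0 do
    target / measured - 1.0
  end
end

# ~8x measured against the >=10x target from the v0.2 plan.
IO.inspect(SpeedupGapSketch.remaining(8.0, 10.0))
# prints 0.25
```

Read the other way (8× as a share of 10×), the shortfall would be 20%, so the 25% figure only holds under the "further speedup required" interpretation assumed here.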
### v0.1 debt paid down
- Monthly partitioning for `continuum_events` (`PARTITION BY RANGE
(inserted_at)`), with operator Mix tasks: `mix continuum.partitions.create`,
`mix continuum.partitions.list`, `mix continuum.partitions.drop_old`
(`--execute` opt-in). No runtime partition manager in v0.2.
- Activity idempotency is enforced through `continuum_activity_results` keyed
on `(activity_module, idempotency_key)`. Committed results are reused
across runs without re-running the activity body. New telemetry
`[:continuum, :activity, :idempotency_hit]`. See `guides/idempotency.md`.
- ETS-cached `Continuum.Runtime.TimerWheel`: near-term timer cache hydrated
from Postgres, 30s refresh safety net, and `continuum_timer_armed`
`pg_notify` reschedules. Replaces the v0.1 1s polling loop. TimerWheel owns
its own Postgrex notification listener per instance. Benchmark harness
`bench/timer_wheel_bench.exs` reports a 20.0x DB-query reduction for 1000
idle long-due timers over a 60s window (60 pre-cache poller queries vs. 3
cached-wheel timer SELECTs).
- Compile-time warnings for workflow calls into helper modules that are not
stdlib-trusted, not marked `use Continuum.Pure`, and not allowlisted via
`config :continuum, trusted_modules: [...]`. Severity is configurable with
`config :continuum, untrusted_call_severity: :warn | :error` (default
`:warn`). See the *Helper Modules* section of `guides/determinism-rules.md`.
- Postgres signal-await fast-path: when a signal is already in the durable
mailbox, `await signal(...)` journals `signal_received` directly and skips
the `signal_awaited` event plus the timeout-timer write. Old histories
that did journal `signal_awaited` replay unchanged.
- Per-process repo / multi-instance threading. `Continuum.children/1`
registers a named instance; `instance:` selects it on `start/3`,
`signal/4`, `cancel/2`, `await/3`. Lease owner format is now
`node()/instance/monotonic_int`. `Continuum.InstanceNotRegisteredError`
surfaces unknown names. Postgres `start_run` accepts `trace_context:` so
resumed runs can link OTel spans back to the original trace.
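
The idempotency item above keys committed results on `(activity_module, idempotency_key)`. A minimal in-memory sketch of that lookup-before-run idea follows. It uses a plain map where Continuum uses the `continuum_activity_results` table, and `IdempotencySketch` with its return tuples is hypothetical, not the library's API:

```elixir
defmodule IdempotencySketch do
  # `results` stands in for the durable results table, keyed on
  # {activity_module, idempotency_key}. `fun` is the activity body.
  def run(results, activity_module, idempotency_key, fun) when is_function(fun, 0) do
    id = {activity_module, idempotency_key}

    case Map.fetch(results, id) do
      {:ok, result} ->
        # A committed result exists: reuse it and skip the activity body.
        {:hit, result, results}

      :error ->
        result = fun.()
        {:miss, result, Map.put(results, id, result)}
    end
  end
end
```

A second call with the same module and key returns `{:hit, result, results}` without invoking `fun`, which is roughly the situation the `[:continuum, :activity, :idempotency_hit]` event reports.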
### Determinism hardening
- Snapshot compaction fails closed when a source event lacks a `command_id`:
`Snapshot.compact/4` returns `{:error, {:missing_command_id, seq}}` instead
of producing a nil-matching step that any effect would replay through.
- In-memory journal now assigns sequence numbers when callers omit `:seq`,
matching the Postgres `next_seq/1` semantics. Fixes a latent gap where
`inject_signal/4` and `fire_timer/2` could write `seq: nil` events that
snapshot compaction would later misorder or drop.
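
To illustrate the fail-closed behaviour described above, here is a small sketch that stops at the first event without a `command_id` and returns the same error shape. `CompactSketch.compact/1` is a simplified stand-in; the real `Snapshot.compact/4` takes other arguments that are not shown in this changelog:

```elixir
defmodule CompactSketch do
  # Events are plain maps with :seq and (normally) :command_id.
  def compact(events) do
    result =
      Enum.reduce_while(events, {:ok, []}, fn %{seq: seq} = event, {:ok, steps} ->
        case Map.get(event, :command_id) do
          # Fail closed: never emit a step that would match on nil.
          nil -> {:halt, {:error, {:missing_command_id, seq}}}
          command_id -> {:cont, {:ok, [{seq, command_id} | steps]}}
        end
      end)

    # Steps were accumulated in reverse; errors pass through untouched.
    with {:ok, steps} <- result, do: {:ok, Enum.reverse(steps)}
  end
end
```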
### Behavior changes operators should know about
- `continuum_events` primary key is now `(run_id, seq, inserted_at)` because
Postgres partitioned tables require the partition key in the PK. Continuum
still guarantees per-run `(run_id, seq)` uniqueness through the run-row
write lock; SQL-level uniqueness was relaxed only to satisfy the partition
shape. Migration notes are in `MIGRATING_v0_1_to_v0_2.md`.
- `signal_awaited` is no longer journaled when a matching signal is already
pending. Dashboards counting `signal_awaited` rows as a proxy for "signal
arrivals" should count `signal_received` instead.
- Helper-module calls inside `use Continuum.Workflow` modules now produce a
compile-time warning unless the module is `use Continuum.Pure`,
stdlib-trusted, or listed in `config :continuum, trusted_modules: [...]`.
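
As a configuration sketch for the helper-module warning, the two keys named in these notes can be set together. `MyApp.Pricing` is a hypothetical helper module, and the choice of `:error` is only an example:

```elixir
# config/config.exs
import Config

config :continuum,
  # MyApp.Pricing is a hypothetical helper that has been reviewed by hand.
  trusted_modules: [MyApp.Pricing],
  # Raise the severity from the default :warn.
  untrusted_call_severity: :error
```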
### Migrations
Four delta migrations on top of v0.1, runnable in order on a fresh database
or the current local v0.1 dev/test schema:
1. `20260601000000_partition_continuum_events`
2. `20260601000001_create_continuum_activity_results`
3. `20260601000002_create_continuum_snapshots`
4. `20260601000003_add_trace_context_to_runs`
Fresh installs (`mix continuum.gen.migration`) get the v0.2 shape directly
and do not need the delta migrations. v0.1 had no public release, so there
is no production-data compatibility promise.
### Telemetry additions
- `[:continuum, :activity, :idempotency_hit]`
- `[:continuum, :snapshot, :taken]`
- `[:continuum, :snapshot, :skipped]`
All Continuum telemetry events now include `instance: name` metadata so
dashboards can split correctly when more than one instance is active.
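
A handler that splits events by the new `instance` metadata could look like the sketch below. The handler follows the standard `:telemetry` four-argument shape; the module name, the `report_to` config key, and the message it sends are assumptions made for illustration:

```elixir
defmodule InstanceTelemetrySketch do
  # Standard :telemetry handler arity: (event, measurements, metadata, config).
  # `report_to` is a hypothetical config key holding the pid to notify.
  def handle_event([:continuum | _rest] = event, _measurements, metadata, %{report_to: pid}) do
    # Fall back to :unknown if the metadata carries no instance.
    instance = Map.get(metadata, :instance, :unknown)
    send(pid, {:continuum_event, instance, event})
    :ok
  end
end

# With the :telemetry package available, the handler could be attached like so:
#
#   :telemetry.attach_many(
#     "instance-split-sketch",
#     [[:continuum, :snapshot, :taken], [:continuum, :snapshot, :skipped]],
#     &InstanceTelemetrySketch.handle_event/4,
#     %{report_to: self()}
#   )
```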
### Documentation
- New: `guides/multi-instance.md`, `guides/snapshots.md`,
`guides/observer.md`, `guides/observability.md`, `guides/idempotency.md`.
- Updated: `guides/determinism-rules.md` now covers the helper-module warning,
`use Continuum.Pure`, and `trusted_modules`.
- New: `MIGRATING_v0_1_to_v0_2.md` at the repo root.
### Note on the module-count moat
The ROADMAP's "~25 core modules" target was a v0.1 working principle. v0.2
deliberately revises it: with Observer, OpenTelemetry, snapshots, multi-instance
plumbing, and the Mix-task surface, raw module count is no longer the right
shape of the moat. The replacement target is keeping the **runtime** surface
small and justified — new runtime processes need a written reason — while
allowing optional UI modules (Observer LiveViews, components), Mix tasks, and
schema files to land where they make sense. At tag-prep time the v0.2 tree has
49 `.ex` files under `lib/`, with 19 under `lib/continuum/runtime/`. The v0.2
tree adds the Snapshotter as a runtime child; everything else under
`lib/continuum/observer/` and `lib/mix/tasks/` is optional surface.
### Known limitations carried forward to v0.3+
- Snapshot runtime use is experimental in v0.2. Default
`snapshot_threshold: :infinity` (off). Public snapshot payload format
(`:erlang.term_to_binary` of the struct) is not promised stable.
- `Continuum.VersionRegistry` and `Continuum.patched?/1` remain stubs;
content-addressed module dispatch and journaled patch decisions land in
v0.3.
- `compensate` macro and parent/child workflows are still v0.3.
- `continue_as_new` is v0.3.
- `mix continuum.audit` is v0.5.
- Cluster distribution and the `:peer`-based multi-node test harness are v0.5.
- No Oban adapter yet — v0.5.
- Observer has no replay-stepping debugger in v0.2 (run detail shows the
durable timeline only). Replay debugger is v0.3+.
## v0.1 — "It survives a crash"
The full v0.1 surface from `ROADMAP.md` is implemented, exercised by 97 tests + 2
StreamData properties, and stable across multiple random seeds. ~26 core
modules + 5 schemas + 3 mix tasks.
### Workflow definition & determinism
- `use Continuum.Workflow` — `@on_definition` runs `Continuum.AstCheck` on every
clause; `@before_compile` computes the AST version hash and registers
`__continuum_workflow__/0`.
- `use Continuum.Activity` — retry/timeout policy DSL with
`idempotency_key/1` plumbed through the task struct.
- `use Continuum.Pure` — opt helper modules into the AST-scanned trusted set.
- `Continuum.AstCheck` — compile-time determinism scanner with curated
denylist (including `Continuum.start/3`, `signal/3`, `cancel/2`, `await/3`,
which are side effects when called from inside a workflow) and
remediation hints.
- Workflow DSL: `activity`, `await signal(...)` with optional `timeout: ms`,
`timer`, `seconds/minutes/hours/days`. Each macro computes a structured
`command_id = {kind, module, function, line, hash, ordinal}` at expansion
time.
- Deterministic primitives `Continuum.now/0`, `today/0`, `uuid4/0`,
`random/0` are macros (not functions) so they capture `__CALLER__` and
produce stable cursor identity. `Continuum.side_effect/1` is the runtime
escape hatch.
- `Continuum.ReplayDriftError` raised on type mismatch *or* command-identity
mismatch — drift is detected even when shapes happen to match.
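
The reason the deterministic primitives are macros can be shown with a small sketch: a macro can read its call site from `__CALLER__`, while a plain function cannot. `CallSiteSketch` is illustrative only and builds just part of the `command_id` tuple described above (the hash and ordinal are omitted):

```elixir
defmodule CallSiteSketch do
  # Expands to {kind, module, function, line} of the place the macro is used.
  defmacro call_site(kind) do
    %Macro.Env{module: module, function: function, line: line} = __CALLER__

    quote do
      {unquote(kind), unquote(module), unquote(function), unquote(line)}
    end
  end
end

defmodule CallSiteDemo do
  require CallSiteSketch

  def run do
    first = CallSiteSketch.call_site(:now)
    second = CallSiteSketch.call_site(:now)
    {first, second}
  end
end
```

The two tuples returned by `CallSiteDemo.run/0` differ only in their line number, so each call site gets its own identity even though both invoke the same primitive.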
### Runtime
- `Continuum.Runtime.Engine` — GenServer-per-run with `restart: :temporary`.
Crashed engines are not restarted by OTP; resume is the dispatcher's job.
- `Continuum.Runtime.Effect.run/2` — canonical replay-or-suspend bridge,
shared by both journal adapters.
- `Continuum.Runtime.Context` — process-dict cursor + `command_counts` for
ordinal disambiguation.
- `Continuum.Runtime.Dispatcher` — `FOR UPDATE SKIP LOCKED` poller for
runnable runs; captures fresh fencing token at claim time.
- `Continuum.Runtime.Recovery` — boot-time orphan rescue, filters on
`lease_expires_at < now()` so live remote leases are never stolen.
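
The `restart: :temporary` behaviour can be reproduced with plain OTP. The sketch below is not Continuum's engine; `EngineSketch` and `RestartSketch` are hypothetical names, and the supervisor is a bare `DynamicSupervisor`:

```elixir
defmodule EngineSketch do
  # :temporary means the supervisor never restarts this child.
  use GenServer, restart: :temporary

  def start_link(run_id), do: GenServer.start_link(__MODULE__, run_id)

  @impl true
  def init(run_id), do: {:ok, run_id}
end

defmodule RestartSketch do
  def demo do
    {:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
    {:ok, pid} = DynamicSupervisor.start_child(sup, {EngineSketch, "run-1"})

    # Kill the child and wait until it is confirmed dead.
    ref = Process.monitor(pid)
    Process.exit(pid, :kill)

    receive do
      {:DOWN, ^ref, :process, ^pid, :killed} -> :ok
    end

    # Ask the supervisor how many children it still tracks.
    DynamicSupervisor.count_children(sup)
  end
end
```

Because the supervisor does not start a replacement, something else has to notice the dead run and resume it, which is the role these notes assign to the dispatcher.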
### Journal & leasing
- `Continuum.Runtime.Journal` behaviour with `InMemory` and `Postgres`
adapters. Both share the engine's replay loop.
- Postgres adapter stores opaque payloads as `bytea`
(`:erlang.term_to_binary/1`) and gates every write through
`lock_and_validate_run!` (run lease) and `lock_and_validate_activity_task!`
(task lease). `lock_and_validate_active_run!` rejects writes against
cancelled/completed/failed runs as defense-in-depth.
- `Continuum.Runtime.Lease` + `Lease.Heartbeater` — fencing token via
`nextval('continuum_lease_token_seq')`, owner string `node()/monotonic_int`
(greppable). Heartbeater monitors engine pids and unsubscribes on DOWN.
- Postgres schemas `Continuum.Schema.{Run, Event, Signal, Timer,
ActivityTask}` with `bytea` payload columns.
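
The write gating described above combines two checks: the run must not be terminal, and the writer's fencing token must match the current lease. A pure-function sketch of that order of checks follows. The map stands in for a locked run row, and the module name and the `:stale_lease` atom are assumptions rather than Continuum's actual return values:

```elixir
defmodule LeaseFenceSketch do
  @terminal [:cancelled, :completed, :failed]

  # Reject writes against runs that already reached a terminal state.
  def append(%{state: state}, _token, _event) when state in @terminal,
    do: {:error, :run_terminal}

  # Reject writers whose fencing token is older than the current lease.
  def append(%{lease_token: current}, token, _event) when token != current,
    do: {:error, :stale_lease}

  # Token matches and the run is live: the event is appended.
  def append(%{events: events} = run, _token, event),
    do: {:ok, %{run | events: events ++ [event]}}
end
```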
### Activities
- `Continuum.Runtime.ActivityWorker.{Supervisor, Dispatcher, Worker}` —
claim joins on `continuum_runs` and snapshots `r.lease_token` so the
worker carries its own authority through to atomic completion.
- `Journal.Postgres.complete_activity_task!/3` does event-append +
task-update under run-lease + task-lease CAS in one transaction.
- `Journal.Postgres.retry_activity_task!/4` for the retry path with
exponential backoff.
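
These notes do not state the backoff parameters used by `retry_activity_task!/4`. As a generic sketch of exponential backoff, assuming a 1-second base and a 60-second cap (both values and the module name are hypothetical):

```elixir
defmodule BackoffSketch do
  # Delay before retry number `attempt` (1-based): base * 2^(attempt - 1),
  # capped at `max_ms` so late attempts do not wait unboundedly.
  def delay_ms(attempt, base_ms \\ 1_000, max_ms \\ 60_000) when attempt >= 1 do
    min(base_ms * Integer.pow(2, attempt - 1), max_ms)
  end
end

IO.inspect(Enum.map(1..8, &BackoffSketch.delay_ms/1))
```

The delay doubles on each attempt until it reaches the cap and then stays there.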
### Signals & timers
- `Continuum.Runtime.SignalRouter` — Postgres LISTEN consumer; single-strategy
delivery (Postgres vs in-memory chosen at startup based on
`Journal.default()`). For in-memory mode, appends `signal_received`
directly and wakes the engine.
- `Continuum.Runtime.TimerWheel` — Postgres-truth poller. Claim joins runs
and captures `lease_token`. Handles the signal-await-with-timeout race
via `timer_winner/2` (`already_resolved` / `already_fired` branches).
### Cancellation
- `Continuum.cancel/2` → `Journal.Postgres.cancel_run!/2` discards pending
activity tasks, marks pending timers `fired = true`, and marks the run
`cancelled`, all in one lease-CAS-guarded transaction.
### Public API, telemetry, test helpers
- `Continuum.PubSub` wired up: terminal transitions
(`completed`/`failed`/`cancelled`) broadcast `{:run_finished, run_id,
state, payload}`; `Continuum.await/3` subscribes-then-receives with a 5ms
poll fallback.
- `Continuum.Telemetry` — 24+ named events under the `[:continuum, …]`
prefix, fired on every state transition.
- `Continuum.Test` — `start_synchronous/3` (in-memory inline-activity mode),
Postgres helpers, `replay/4` for golden histories, `inject_signal/4`,
`fire_timer/2`, sandbox checkout, `reset_in_memory!/0`.
- `mix continuum.gen.{migration, workflow, activity}`.
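
The subscribe-then-receive pattern with a poll fallback, as described for `Continuum.await/3`, can be sketched as a receive loop. This is not the library's implementation: the subscription step is assumed to have happened already, and `lookup` is a hypothetical function that reads the run's durable state:

```elixir
defmodule AwaitSketch do
  # `lookup` returns {:finished, state, payload} or :running.
  def await(run_id, lookup, timeout_ms, poll_ms \\ 5) do
    deadline = System.monotonic_time(:millisecond) + timeout_ms
    loop(run_id, lookup, deadline, poll_ms)
  end

  defp loop(run_id, lookup, deadline, poll_ms) do
    receive do
      # Fast path: the terminal-transition broadcast arrived.
      {:run_finished, ^run_id, state, payload} -> {:ok, state, payload}
    after
      poll_ms ->
        # Fallback: the broadcast may have been missed, so check durable state.
        case lookup.(run_id) do
          {:finished, state, payload} ->
            {:ok, state, payload}

          :running ->
            if System.monotonic_time(:millisecond) >= deadline do
              {:error, :timeout}
            else
              loop(run_id, lookup, deadline, poll_ms)
            end
        end
    end
  end
end
```

Subscribing before the first durable read closes the window in which a run could finish between the read and the subscription; the poll covers broadcasts that never arrive.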
### Documentation & examples
- ExDoc reference, three guides, one example app (`continuum_example_orders`).
- `docker-compose.yml` for local Postgres; `mix test` aliases auto-create +
auto-migrate the test repo.
### Verification
- Crash-resume integration test (`activity → timer → activity`,
`Process.exit(engine, :kill)` mid-flight, asserts new pid + full event
sequence + final result).
- Lease-fencing race test (three variants: `append!`, `cancel_run!`,
`complete_activity_task!` all reject stale-token writes).
- StreamData property-based replay test on pure-side_effect and mixed
activity+side_effect histories.
- Postgres-backed replay + drift test, bytea encoding round-trip test,
cancellation/recovery/timer/signal/dispatcher/lease unit tests.
### Known limitations carried to v0.2
- `continuum_events` is unpartitioned (retention story in v0.2).
- `TimerWheel` is a poller, not the ETS-cached due-queue (perf upgrade).
- AST scan over unmarked helper modules produces no warning until
`use Continuum.Pure` is added (polish in v0.2).
- `signal_awaited` is journaled even when a signal is already in the
durable mailbox (cosmetic; two events instead of one).
- `Continuum.Activity`'s `idempotency_key/1` is plumbed but not enforced
by a side-table (real exactly-once-ish semantics in v0.2).
- `config :continuum, :repo` is a global app-env value (per-process repo
threading in v0.2).
- `Continuum.VersionRegistry` and `Continuum.patched?/1` are stubs;
content-addressed module dispatch and journaled patch decisions land in
v0.3.
- `compensate` macro is not in v0.1; users do `try/rescue` + cleanup
activity until v0.3.
- `mix continuum.audit` is not implemented; v0.5.
## v0.1-dev-skeleton
- `ROADMAP.md` (full architecture, phased v0.1→v1.0 plan, market context)
- `CLAUDE.md` (orientation for future sessions)
- `Continuum.AstCheck` — compile-time determinism scanner with curated
denylist and remediation hints
- `use Continuum.Workflow` / `Activity` / `Pure` macros (AST scan via
`@on_definition`, AST-hash versioning via `@before_compile`)
- Workflow DSL: `activity`, `await signal(...)`, `timer`, `compensate`,
`seconds/minutes/hours/days`
- Runtime: `Engine` (GenServer-per-run), `Effect.run/2` with throw-based
suspend/replay, `Context`, `Journal` behaviour + `InMemory` adapter
- Deterministic primitives: `now/0`, `uuid4/0`, `random/0`, `today/0`,
`side_effect/1`
- `Continuum.ReplayDriftError` with structured diff
- Postgres schemas + `mix continuum.gen.migration`

22 tests passing across 8 random seeds. Zero compiler warnings.