# Changelog

All notable changes to `gen_durable` are documented here. The format follows
[Keep a Changelog](https://keepachangelog.com/); this project is pre-1.0 and makes
**no backward-compatibility guarantees** — there is one schema version and migrations are
edited in place until the MVP settles.

## 0.1.8

### Added
- **Rate limiting — token bucket, per step (spec §12).** A step opts into a named limit by
  returning `{:next, step, state, rate_limit: :stripe}` (or `{:stripe, partition}` for a bucket
  per tenant). Configured at start: `rate_limits: [stripe: [allowed: 100, period: {1, :minute}]]`
  (`burst` defaults to `allowed`). Enforced in the picker (one statement) by a per-bucket token
  counter locked with `FOR UPDATE`, which is correct across nodes and measured to add no cost on
  the common path (a NULL `rate_limit` short-circuits; `EXPLAIN ANALYZE` reports the rate CTEs as
  `never executed`).
- **Weighted steps.** With `{:next, step, state, rate_limit: :stripe, weight: 50}` a step may consume
  more than one budget unit. A grant takes the urgency-ordered prefix whose cumulative weight fits
  (strict order, so the head of the line keeps its reservation for free). `weight ≤ burst` is the
  caller's responsibility and is **not** validated: a step heavier than `burst` can never fit and
  freezes its bucket, so split the step instead.
- New tables `gen_durable_rate_configs` / `gen_durable_rate_buckets`; new `gen_durable.rate_limit`
  and `gen_durable.weight` columns. `:next` now normalizes to a 4-tuple carrying a per-transition
  opts map (`rate_limit`, `weight`). `insert`, `insert_all`, and `schedule_childs` children all
  carry the columns and ensure their buckets.
- Telemetry: `[:gen_durable, :rate_limit, :throttled]` (a bucket granted fewer than wanted) and
  `[:gen_durable, :rate_limit, :unknown]` (a step named an unconfigured rate-limit).
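
The token-bucket and weighted-grant rules above can be illustrated with a small in-memory sketch. This is not the engine's code (the real enforcement is a single SQL statement in the picker); `RateSketch`, `refill/5`, and `grant/2` are hypothetical names, and integer token arithmetic is an assumption made for the example.

```elixir
defmodule RateSketch do
  @moduledoc "Illustrative only; not gen_durable's implementation."

  # Refill a bucket: tokens accrue at `allowed` per `period_ms` and are capped
  # at `burst`, which falls back to `allowed` when not given.
  # Assumption: whole tokens only (integer division).
  def refill(tokens, elapsed_ms, allowed, period_ms, burst \\ nil) do
    cap = burst || allowed
    min(cap, tokens + div(elapsed_ms * allowed, period_ms))
  end

  # Grant the longest prefix of `weights` (already in urgency order) whose
  # cumulative weight fits in `tokens`. The scan halts at the first step that
  # does not fit, so lighter steps behind it are not allowed to jump ahead.
  def grant(weights, tokens) do
    {granted, left} =
      Enum.reduce_while(weights, {[], tokens}, fn w, {acc, left} ->
        if w <= left do
          {:cont, {[w | acc], left - w}}
        else
          {:halt, {acc, left}}
        end
      end)

    {Enum.reverse(granted), left}
  end
end
```

With 100 tokens and waiting weights `[30, 50, 40, 10]`, `RateSketch.grant/2` grants `[30, 50]` and leaves 20 tokens: the step of weight 40 does not fit, and the step of weight 10 behind it waits even though it would. With weights `[150, 10]` and `burst` 100, nothing is ever granted, which is the frozen-bucket case described above.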

### Deliberately not added (settled decisions)
- **Weighted-step poison guard** (`weight ≤ burst`): a too-fat step freezes its bucket — the caller's
  responsibility (split the step), consistent with the engine's "you own correctness" stance.
- **Sliding/fixed-window rate algorithms**: token bucket only (its `rate`+`burst` knobs cover the
  spectrum; sliding-log breaks single-row locking).
- **Boot-time validation / `on_unknown` policy**: an unknown rate-limit key stalls the row and emits
  `[:gen_durable, :rate_limit, :unknown]` — that is the chosen v1 behaviour (no fail-fast at boot, no
  `:run`/`:stop`/`:defer` knob).

## 0.1.7

### Changed
- Renamed `partition_key` → `concurrency_key` (column, insert option, SQL, the
  `gen_durable_concurrency_active` index, and the `[:gen_durable, :concurrency, :contended]`
  telemetry event). Pure rename; the SQL `PARTITION BY` window keyword is untouched.

## 0.1.6

### Changed
- Dropped the `:unique` policy enum (`:live`/`:global`). `correlation_scope` (a `durable_status[]`)
  is now passed directly, defaulting to the non-terminal statuses. This also removes the surprising
  built-in `:global` behaviour where a finished instance silently swallowed signals.

## 0.1.5

### Changed
- Renamed the same-step outcome `:replay` → `:retry` (it redoes the step with `attempt += 1` after a
  delay; the old name read like event-sourcing replay).
- Merged addressing and uniqueness into one `correlation_key` (Temporal/DBOS workflow-id model): the
  business key you `signal/4` by is the same key the engine deduplicates on. Replaces the separate
  `external_id` (addressing) and `unique_key`/`unique_scope` (dedup). One partial unique index does
  both jobs.
- Dropped the misleading "on top of GenServer" framing: an FSM is a row, not a process — there is no
  GenServer per instance; the runtime backbone (scheduler/reaper/GC) is a small set of GenServers.

## 0.1.4

### Added
- Built-in GC of terminal (`done`/`failed`) rows: `GenDurable.GC`, configurable `:gc_interval` /
  `:gc_retention` (default 1 day) / `:gc_batch`; `[:gen_durable, :gc, :swept]` telemetry. The delete
  is O(batch) (select ids, then `DELETE … WHERE id = ANY`), not O(table).

## 0.1.3

### Changed
- `await` waits on a **set** of signal names; the woken step sees the matched subset as `ctx.awaited`
  (full inbox in `ctx.all`). On progress, signals are consumed by received id, so latecomers survive
  and packs can accumulate via re-await; a terminal outcome clears the whole inbox.
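
A minimal sketch of these inbox semantics, assuming signals are `{id, name, payload}` tuples held in a list. The tuple shape, the `AwaitSketch` module, and its function names are hypothetical and only model the behaviour described above; the engine keeps the inbox in Postgres.

```elixir
defmodule AwaitSketch do
  @moduledoc "Illustrative only; not gen_durable's implementation."

  # Wake the step if any inbox signal matches the awaited name set.
  # The woken step sees the matched subset and the full inbox.
  def wake(inbox, awaited_names) do
    names = MapSet.new(awaited_names)
    awaited = Enum.filter(inbox, fn {_id, name, _payload} -> MapSet.member?(names, name) end)

    if awaited == [] do
      :keep_waiting
    else
      {:wake, %{awaited: awaited, all: inbox}}
    end
  end

  # On progress, drop only the signals whose ids the step received.
  # A signal that arrived after the wake has a different id and survives.
  def consume(inbox, received_ids) do
    ids = MapSet.new(received_ids)
    Enum.reject(inbox, fn {id, _name, _payload} -> MapSet.member?(ids, id) end)
  end
end
```

Consuming by id instead of by name is what lets a latecomer with the same name stay in the inbox for the next `await`.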

## 0.1.2

### Added
- Job form: define `perform/1`/`perform/2` instead of `step/2` for a one-shot durable job with
  built-in retry/backoff. Folded into `GenDurable.FSM`.

## 0.1.1

### Added
- A nested `State` embedded schema is adopted by convention (no `state:` option needed).

## 0.1.0

### Added
- Initial durable FSM engine: Postgres-backed, state committed before each step proceeds
  (at-least-once, whole-step re-execution). Steps and outcomes (`:next`/`:retry`/`:await`/`:done`/
  `:stop`), `schedule_childs` fan-out + fan-in barrier (§11), durable signals/await (§5), queues with
  concurrency, priority, scheduling sugar, lease + reaper crash recovery, `concurrency_key`
  serialization, uniqueness, single-round-trip outcomes, feeder backpressure, graceful drain, broad
  telemetry, dynamic FSM resolution, library-owned migration.