# Lockstep

[![CI](https://github.com/b-erdem/lockstep/actions/workflows/ci.yml/badge.svg)](https://github.com/b-erdem/lockstep/actions/workflows/ci.yml)
[![Hex.pm](https://img.shields.io/hexpm/v/lockstep.svg)](https://hex.pm/packages/lockstep)
[![Hexdocs](https://img.shields.io/badge/hex-docs-lightgreen.svg)](https://hexdocs.pm/lockstep)
[![License](https://img.shields.io/hexpm/l/lockstep.svg)](https://github.com/b-erdem/lockstep/blob/main/LICENSE)

Coyote-style controlled concurrency testing for the BEAM.

You write what looks like a regular ExUnit test. Lockstep runs it many
times, each time picking a different message-passing schedule via a
configurable strategy. When it finds a bug, it prints the schedule that
caused it and saves it to disk so you can reproduce.

## Contents

- [Installation](#installation)
- [Using Lockstep with AI tools](#using-lockstep-with-ai-tools)
- [Need this on your system?](#need-this-on-your-system)
- [Why bother](#why-bother)
- [What's shipping](#whats-shipping)
- [What's not shipping yet](#whats-not-shipping-yet)
- [Quick start](#quick-start)
- [Showcase: bug classes Lockstep finds](#showcase-bug-classes-lockstep-finds)
- [Real libraries verified under Lockstep](#real-libraries-verified-under-lockstep)
- [Configuration](#configuration)
- [Testing methodology](#testing-methodology)
- [API](#api)
- [Reproducing a bug](#reproducing-a-bug)
- [Shrinking a trace](#shrinking-a-trace)
- [Long runs and progress](#long-runs-and-progress)
- [Bug-finding regression suite](#bug-finding-regression-suite)
- [Design](#design)
- [License](#license)

## Installation

Add to your `mix.exs`:

```elixir
def deps do
  [
    {:lockstep, "~> 0.1.0", only: :test}
  ]
end
```

Status: **0.1.0 / initial release.** API is stable for the surface
documented here, but expect breakage as we tighten things and respond
to real-world usage. SemVer applies starting at 1.0.

## Using Lockstep with AI tools

Lockstep ships with first-class AI integration:

- **[`AGENTS.md`](AGENTS.md)** — guide for AI coding agents
  (Claude Code, Cursor, Copilot) to invoke Lockstep via the CLI.
  Documents the input/output contract for `mix test` /
  `mix lockstep.replay` / `mix lockstep.shrink`.
- **[`skill/lockstep/SKILL.md`](skill/lockstep/SKILL.md)** —
  Anthropic Skill format. Loadable by the Anthropic Agent SDK so
  agents understand when and how to use Lockstep.
- **[`mcp_server/`](mcp_server/README.md)** — Model Context Protocol
  server. Wraps `mix` commands as MCP tools (`lockstep_test`,
  `lockstep_replay`, `lockstep_shrink`, `lockstep_dump_trace`).
  Plug into Claude Desktop via `claude_desktop_config.json`.

The killer agentic workflow:

1. Agent reads code, hypothesizes a race
2. Agent generates a Lockstep test file
3. Agent calls `lockstep_test` (MCP) — Lockstep runs N iterations
4. If a bug is found, agent reads the trace via `lockstep_replay`
5. Agent generates a fix
6. Agent re-runs `lockstep_test` with the same seed to verify

## Need this on your system?

If you maintain a high-stakes BEAM codebase — a payment processor,
a distributed coordinator, a high-throughput pipeline, a custom
GenServer that *must* be linearizable — and you want a thorough
concurrency audit rather than a one-off bug report, I take on this
work directly.

That includes: applying Lockstep to your code, writing custom
test scaffolds and models, running multi-node race-hunting
campaigns, hardening hot paths, and delivering a written report
with reproducible counterexamples and fixes.

→ **[baris@erdem.dev](mailto:baris@erdem.dev)**

## Why bother

Concrete numbers from
[`test/value_prop/lost_update_comparison_test.exs`](test/value_prop/lost_update_comparison_test.exs)
running the same lost-update race two ways: ordinary `GenServer.call`
in a tight loop (200 runs) vs `Lockstep.GenServer.call` under PCT
(5 seeds × up to 200 iterations).

|                                         | Vanilla `GenServer.call` × 200 | Lockstep PCT × 5 seeds |
|-----------------------------------------|--------------------------------|-------------------------|
| Bug detected                            | 198/200 runs (99%)             | 5/5 seeds (100%)        |
| Time to first detection                 | ~3.6 ms wall-clock             | iterations 1, 2, 2, 4, 15 |
| Failing schedule recorded               | No                             | Yes (`.lockstep` file)  |
| Failing schedule prints in human form   | No                             | Yes (alias-mapped trace)|
| Same input → same failing schedule      | No                             | Yes (same seed → same iteration → identical trace) |
| Replay on demand                        | No                             | Yes (`Lockstep.Replay.run/2`) |

Vanilla testing *finds* this bug just fine — running the body in a
tight loop catches it 99% of the time. What it can't do is hand you
a specific failing schedule that you can stare at, save, debug, and
replay. Each failing run is an opaque "the counter ended at 1 instead
of 2" with no record of which interleaving caused it.

Lockstep's value isn't *finding* races — multi-core BEAM finds them
too, eventually. It's giving you a reproducible handle to a *specific*
failing schedule:

- **Same seed → same iteration → byte-identical trace.** Three runs of
  the lost-update test with `seed: 0xC0DE_C0FFEE` all find the bug at
  iteration 2 with structurally identical traces.
- **Trace prints as a human-readable schedule** with aliased pids
  (`P0`, `P1`, ...), step numbers, and a `<-- FAILED HERE` marker.
- **Replay re-fires the bug** through `Lockstep.Replay.run/2` so you
  can attach a debugger / add `IO.inspect` / step through.

Vanilla `assert_raise` in a loop tells you the test is flaky.
Lockstep tells you *which interleaving* makes it flaky.

### Which strategy?

Lockstep ships five strategies (`:random`, `:pct`, `:fair_pct`,
`:pos`, `:idpct`) plus `:replay` for trace-driven reruns. They
behave noticeably differently as race depth grows. From
[`test/value_prop/strategy_comparison_test.exs`](test/value_prop/strategy_comparison_test.exs)
on a 3-subscribe + 1-publish ordering race, 10 seeds × up to 100
iterations each:

| strategy   | bug found | min iter | median | mean | max |
|------------|-----------|----------|--------|------|-----|
| `:random`  | 10/10     | 1        | 3.0    | 3.5  | 7   |
| `:pct`     | 10/10     | 1        | 1.0    | 1.4  | 2   |
| `:fair_pct`| 10/10     | 1        | 1.0    | 1.4  | 2   |
| `:pos`     | 10/10     | 1        | 1.5    | 1.8  | 3   |

Lower iteration counts mean the strategy steered toward the bug
faster. PCT and Fair-PCT find this race in 1-2 iterations every time;
random takes ~3.5 on average. POS sits between them. On *shallower*
races (the depth-2 lost-update demo) all four strategies tie at
iteration 1 — the differentiation only shows up once the race has
real depth.

Default is `:pct` because it gives the most consistent low-iteration
finds across both shallow and deep races. Switch to `:fair_pct` if
your code has spin loops where pure PCT can starve a low-priority
process; switch to `:pos` if PCT plateaus on a particular bug shape.


## What's shipping

### Engine
- `Lockstep.Test` — ExUnit case template with a `ctest/3` macro that runs
  the body N times.
- A user-level scheduler (`Lockstep.Controller`) that intercepts every
  Lockstep sync point and decides who runs next.
- Five strategies: `:random`, `:pct`, `:fair_pct`, `:pos`, `:idpct`.
  - PCT is Burckhardt et al., ASPLOS 2010 — bug-depth probabilistic guarantees.
  - Fair-PCT is Coyote's hybrid: PCT then random, for liveness.
  - POS is Yuan et al., OOPSLA 2025 (Fray) — re-randomizes priorities each step.
  - IDPCT is iterative-deepening PCT — cycles the bug-depth parameter through a configured range, one depth per iteration.
- Trace-driven replay: `:replay` strategy + `Lockstep.Replay.run/2`.

### Test surface
- **Selective receive** via `Lockstep.recv_first/1` — predicate-based
  pattern matching over the controller-side mailbox.
- **`Lockstep.GenServer`** — `start_link/2` / `call/2` / `cast/2` /
  `stop/2` wrapper for ordinary GenServer-style modules. Returns
  `{:ok, pid}` so OTP-shaped patterns (`{:ok, srv} = ...`) work
  identically after rewriting.
- **`Lockstep.GenStatem`** — minimal `:gen_statem`-shaped wrapper
  (`handle_event_function` mode + `:reply` actions).
- **`Lockstep.Task`** — `async/1` / `await/1` / `await_many/1`.
- **Virtual time** — `Lockstep.now/0`, `Lockstep.send_after/3`,
  `Lockstep.cancel_timer/1`. Time advances only when no managed process
  is ready and a timer is the only way forward.
- **Process monitors** — `Lockstep.monitor/1`, `Lockstep.demonitor/2`,
  with `:DOWN` messages delivered through the controller's mailbox so
  monitor-based crash detection (NimblePool, Phoenix.PubSub, supervisor
  patterns) works under controlled scheduling. Monitored crashes don't
  trigger the child-crash bug verdict — the monitor is treated as
  "I expected this death."
- **Links + `:trap_exit`** — `Lockstep.spawn_link/1`, `Lockstep.link/1`,
  `Lockstep.unlink/1`, `Lockstep.flag/2`. The controller maintains the
  link graph and propagates abnormal exits with cycle detection:
  non-trapping linkers cascade-die, trapping linkers receive
  `{:EXIT, dying_pid, reason}`, normal exits don't propagate. Lets
  Lockstep run real OTP code that depends on link semantics
  (NimblePool's worker monitoring, Task supervision, GenServer trees).
- **Process.alive?** — `Lockstep.alive?/1` consults the controller's
  view; the call is itself a sync point so the classic
  `if Process.alive?(pid), do: GenServer.call(pid, ...)` TOCTOU race
  surfaces (see `test/examples/toctou_alive_test.exs` — Lockstep finds
  it on iteration 3 under PCT).
- **Registry** — `Lockstep.Registry` with `:unique` / `:duplicate`
  keys, monitor-driven automatic cleanup on registering-process death,
  `dispatch/3` broadcast, and the full `:via` callback contract so
  `Lockstep.GenServer.start_link(M, A, name: {:via, Lockstep.Registry, {reg, :foo}})`
  works. Most-common-subset implementation; partitioning, listeners,
  and `match/4` patterns aren't modeled.
- **Supervisor** — `Lockstep.Supervisor`, `:one_for_one` strategy,
  built on `spawn_link` + `trap_exit`. Restart options `:permanent`,
  `:transient`, `:temporary`. Enforces `max_restarts` / `max_seconds`
  intensity using Lockstep's virtual clock so restart-loop limits are
  reproducible across replays.
- **Compile-time linter** (`Lockstep.Linter`) — warns on bare
  `send`/`receive`/`spawn`, plain `GenServer.call`/`Task.async`/
  `Process.send_after`, and ExUnit `assert_receive`/`refute_receive`
  inside `ctest` bodies.
- **Compile-time rewriter** (`Lockstep.Rewriter`) — automatically
  maps vanilla `GenServer.call`, `Task.async`, `send`, `spawn`,
  `Process.send_after`, and `receive` blocks to their `Lockstep.*`
  equivalents inside a test body. Lets you write vanilla-OTP-shaped
  tests and still get controlled scheduling. Enable per-module via
  `use Lockstep.Test, rewrite: true`, or per-test via
  `Lockstep.Test.vanilla_run/2`.
- **Mix compiler** (`Mix.Tasks.Compile.LockstepRewrite`) — applies
  the same rewriter at the project level: every `lib/**/*.ex` is
  rewritten to a build directory before the standard Elixir compiler
  runs. Lets you point Lockstep at unmodified production modules in
  your own application. Opt in via env-specific
  `compilers:` / `elixirc_paths:` settings (see "Project-level
  rewriting" below).
- **Erlang AST rewriter** (`Lockstep.ErlangRewriter`) — same
  vanilla-OTP-routing trick, but on `.erl` source. Handles
  `gen_server:call`, `erlang:spawn`, `Pid ! Msg`, bare `receive`,
  `process_flag(:trap_exit, ...)`, the bare BIF auto-imports
  (`spawn(F)`), and 4-arg `gen_server:start_link({local, Name}, ...)`
  atom registration. Lets us pull in foundational Erlang libraries
  whose `gen_server` machinery is the actual concurrency surface
  (`:pg`, supervisors, `gen_statem`, `:ssl`'s session manager, etc.).
  Demonstrated on OTP's `pg.erl` (~720 LOC of pure Erlang, the
  foundation of Phoenix.PubSub) — compiles through the rewriter and
  runs under Lockstep's controlled scheduling.
- **`Lockstep.Agent`** — drop-in for OTP `Agent`. Without it,
  `Agent.get/3-4` bypasses Lockstep's controller entirely (it's a
  bare `:gen_server.call`); with it, every agent operation is a
  sync point. Necessary for finding races in `Agent`-backed code.
- **`Lockstep.Task` / `Lockstep.Task.Supervisor`** — extended to
  cover `Task.async_stream/3,4` (honouring `:max_concurrency`),
  `Task.async/3` (MFA form), and `Task.Supervisor.{start_link, async,
  start_child, async_stream}`.
- **NIF-backed shared state** (`Lockstep.ETS`, `Lockstep.Atomics`,
  `Lockstep.PersistentTerm`) — ETS / `:atomics` / `:persistent_term`
  ops are individually atomic but compose racefully (the classic
  `lookup; insert(v+1)` lost-update). Each wrapper inserts a sync
  point before delegating to the underlying NIF, so the strategy
  can interleave between calls. The rewriter routes `:ets.*`,
  `:atomics.*`, `:persistent_term.*` calls through these wrappers.
  Outside a Lockstep test (no controller in scope) the wrappers
  fall through to the underlying NIF — same code runs identically
  in production (see the sketch after this list).
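
A minimal sketch of the lost-update shape these wrappers expose,
assuming the wrappers mirror the `:ets` API (`new/2`, `lookup/2`,
`insert/2`) as the rewriter mapping suggests:

```elixir
ctest "ETS read-modify-write lost update" do
  # Assumed: Lockstep.ETS.new/2 mirrors :ets.new/2 and returns a table.
  t = Lockstep.ETS.new(:counters, [:set, :public])
  Lockstep.ETS.insert(t, {:hits, 0})

  for _ <- 1..2 do
    Lockstep.spawn(fn ->
      # Each wrapper call is a sync point, so the strategy can run the
      # other worker between this lookup and the insert below.
      [{:hits, n}] = Lockstep.ETS.lookup(t, :hits)
      Lockstep.ETS.insert(t, {:hits, n + 1})
    end)
  end

  # ... barrier on both workers, then assert the counter reached 2 ...
end
```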

### Tooling
- Trace capture as `:erlang.term_to_binary` to `.lockstep` files.
- `mix lockstep.replay --trace path` — print the saved schedule + repro recipe.
- `mix lockstep.replay --trace path --run "Mod.fn" --file path.exs` —
  re-run a specific test body under the recorded schedule.
- `mix lockstep.dump_trace --trace path` — raw dump for tooling.
- Deterministic seeds: same top-level seed reproduces the same iterations.

## What's not shipping yet

- **No DPOR / exhaustive search.** Lockstep is a randomized tester, not
  a verifier. Concuerror integration is on the roadmap.
- **No distributed Erlang.** Single-node only.
- **No `async: true` ExUnit modules.** `use Lockstep.Test` forces
  `async: false`.
- **No live iteration progress by default.** A long-running
  `iterations: 10_000` run is silent unless you opt in via `progress:`
  (see [Long runs and progress](#long-runs-and-progress)).
- **`Lockstep.GenStatem`** is `handle_event_function`-mode only;
  `:state_functions` mode and built-in timeout actions aren't wrapped.

## Quick start

```elixir
# mix.exs
{:lockstep, "~> 0.1.0", only: :test}
```

Write a controlled test:

```elixir
defmodule MyApp.RaceTest do
  use Lockstep.Test, iterations: 200, strategy: :pct

  ctest "lost update on shared counter" do
    parent = self()
    state = Lockstep.spawn(fn -> stateful_loop(0) end)

    for _ <- 1..2 do
      Lockstep.spawn(fn ->
        Lockstep.send(state, {:get, self()})
        v = Lockstep.recv()
        Lockstep.send(state, {:put, v + 1})
        Lockstep.send(parent, :done)
      end)
    end

    for _ <- 1..2, do: Lockstep.recv()

    Lockstep.send(state, {:get, self()})
    assert Lockstep.recv() == 2
  end

  defp stateful_loop(n) do
    case Lockstep.recv() do
      {:get, from} -> Lockstep.send(from, n); stateful_loop(n)
      {:put, x}    -> stateful_loop(x)
    end
  end
end
```

Run it:

```bash
mix test test/my_app/race_test.exs
```

Expected output on a buggy schedule:

```
Lockstep found a concurrency bug on iteration 2.

  seed:        828370911214
  iter seed:   1417509209
  strategy:    :pct
  trace path:  traces/counter_race-iter2-seed828370911214.lockstep

Reason:
  assertion failed: lost update detected; counter is 1

Processes:
  P0(root) = #PID<0.186.0>
  P1 = #PID<0.187.0>
  ...

Schedule:
  step    6  send    P3 -> P1  {:get, ...}
  step    8  send    P2 -> P1  {:get, ...}
  step   11  recv    P3  0
  step   18  recv    P2  0    # both workers read the same value!
  ...

Replay with:
    mix lockstep.replay --trace traces/counter_race-iter2-seed828370911214.lockstep
```

## Showcase: bug classes Lockstep finds

The `test/examples/` directory contains worked demos. Each is a
side-by-side of a buggy implementation that Lockstep flags within tens
of iterations and a fixed implementation that it explores at length
without finding a problem.

### Atomicity violation: lost-uniqueness in a job queue

[`test/examples/unique_jobs_test.exs`](test/examples/unique_jobs_test.exs).
A textbook bug from the Elixir job-queue ecosystem: naive `submit_unique`
does a `lookup` then an `insert`, which under concurrency lets two
callers both observe the key as missing and both insert. DB-backed
queues have hit this exact race historically; the fix is always
atomic upsert / `INSERT ... ON CONFLICT`.

```elixir
# Buggy: split read+write — Lockstep finds duplicate submissions on iter ~15.
case Lockstep.GenServer.call(db, {:lookup, key}) do
  nil -> Lockstep.GenServer.call(db, {:insert, key, val}); :submitted
  _   -> :already_exists
end

# Fixed: single atomic call serialized by the GenServer — 100/100 iters pass.
Lockstep.GenServer.call(db, {:put_new, key, val})
```

The trace pinpoints the failure: both workers call `:lookup` (steps 6
and 8), both receive `nil` (steps 11 and 15), both call `:insert`.

### Atomicity violation: read-modify-write on shared state

[`test/examples/genserver_race_test.exs`](test/examples/genserver_race_test.exs).
Two workers each do `Counter.get` then `Counter.put(val + 1)`. With
N=2 workers expecting the counter to reach 2, some interleavings let
both reads happen first, both writes set the counter to 1, and one
update is lost. PCT finds it on iteration 4.

The control test in the same file uses `cast({:add, 1})` instead — the
GenServer serializes the read-modify-write — and Lockstep finds no
race across all explored schedules.
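
Condensed (the `counter` variable and messages are illustrative), the
two shapes side by side:

```elixir
# Buggy: read-modify-write split across two calls; another worker can
# interleave between the :get and the {:put, _}.
v = Lockstep.GenServer.call(counter, :get)
:ok = Lockstep.GenServer.call(counter, {:put, v + 1})

# Fixed: the increment happens inside handle_cast, serialized with
# every other message the server handles.
Lockstep.GenServer.cast(counter, {:add, 1})
```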

### Deadlock: circular GenServer calls

[`test/examples/circular_call_deadlock_test.exs`](test/examples/circular_call_deadlock_test.exs).
Two `Echo` GenServers wired up as peers. Each one's `handle_call(:echo,
...)` synchronously delegates to its peer. Two concurrent callers — one
to A, one to B — and both servers end up mid-`handle_call` waiting for
the other to reply. Classic A-waits-for-B-waits-for-A.
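
A condensed sketch of the shape (module and message names illustrative,
not the test file's exact code):

```elixir
defmodule Echo do
  def init(peer), do: {:ok, peer}

  # The bug: synchronously calling the peer *while still inside
  # handle_call*. If A and B are both in here at once, each blocks on
  # the other's reply and neither can service its own mailbox.
  def handle_call(:echo, _from, peer) do
    {:reply, Lockstep.GenServer.call(peer, :ping), peer}
  end

  def handle_call(:ping, _from, peer), do: {:reply, :pong, peer}
end
```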

Lockstep's `check_progress` step notices that every alive process is
blocked on `recv_match` with no pending timer that could unblock anyone,
and aborts the iteration with `:lockstep_deadlock`. The trace shows the
exact sequence of `gen_call`s that closed the cycle.

This bug class is hard to catch with property-based testing because it
needs a specific interleaving (both callers in flight before either
peer-call returns) — Lockstep's PCT strategy finds it on iteration 1.

### Auto-rewrite: vanilla OTP code, controlled scheduling

[`test/examples/auto_rewrite_demo_test.exs`](test/examples/auto_rewrite_demo_test.exs).
Same lost-update race, written entirely with vanilla `GenServer.call`,
`Task.async`, `Task.await_many` — no `Lockstep.*` calls in the test
body. The `Lockstep.Test.vanilla_run/2` macro walks the body's AST
and rewrites every recognized OTP call to its Lockstep equivalent
before handing it to the runner.

```elixir
test "auto-rewrite finds the race in vanilla OTP code" do
  assert_raise Lockstep.BugFound, fn ->
    Lockstep.Test.vanilla_run iterations: 100, strategy: :pct, seed: 1 do
      {:ok, c} = GenServer.start_link(Counter, 0)

      tasks =
        for _ <- 1..2 do
          Task.async(fn ->
            v = GenServer.call(c, :get)
            :ok = GenServer.call(c, {:set, v + 1})
          end)
        end

      Task.await_many(tasks)

      final = GenServer.call(c, :get)
      if final != 2, do: raise "lost update; counter is #{final}"
    end
  end
end
```

Lockstep finds the race on iteration 2 with the same trace it would
have produced for the manually-written equivalent. Useful when you
want test bodies to read like ordinary Elixir, when you're porting
an existing flaky test to Lockstep, or when you want to demonstrate a
race in a vanilla snippet without forcing readers to learn the
`Lockstep.send`/`Lockstep.recv` API up-front.

The rewriter handles `GenServer.{call, cast, start_link, stop}`,
`Task.{async, await, await_many}`, `send`, `Kernel.send`,
`:erlang.send`, `spawn`, `spawn_link`, `Process.send_after`,
`Process.cancel_timer`, and `receive do clauses end` (translates to
`Lockstep.recv_first` + `case`). Helper functions defined outside the
test body are *not* rewritten by `vanilla_run`/`ctest rewrite: true`
— inline race code into the body, or use the project-level Mix
compiler below to rewrite an entire `lib/`.

### Project-level rewriting (Mix compiler)

To test unmodified modules in your `lib/` (or in a dep) without
changing source, plug Lockstep into your test compile pipeline:

```elixir
# mix.exs
def project, do: [
  ...
  compilers: compilers(Mix.env()),
  elixirc_paths: elixirc_paths(Mix.env()),
  lockstep_rewrite: lockstep_rewrite(Mix.env()),
]

defp compilers(:test), do: [:lockstep_rewrite] ++ Mix.compilers()
defp compilers(_),     do: Mix.compilers()

defp elixirc_paths(:test), do: ["_build/test/lockstep_rewritten/lib", "test/support"]
defp elixirc_paths(_),     do: ["lib"]

defp lockstep_rewrite(:test), do: %{paths: ["lib/**/*.ex"]}
defp lockstep_rewrite(_),     do: nil
```

In `:test`, every matching `.ex` is rewritten under
`_build/test/lockstep_rewritten/lib/` and the standard Elixir
compiler picks it up from there. Production / dev compilation is
untouched — no `Lockstep.*` runtime cost.

The same machinery works for dep code: include `"deps/<dep>/lib/**/*.ex"`
in the `paths:` list. The rewriter handles `GenServer.{call, cast,
start_link, stop, reply}`, `Task.{async, await, await_many}`,
`send`, `spawn`, `spawn_link`, `Process.{send_after, cancel_timer,
monitor, demonitor}`, and `receive` blocks. `Process.flag(:trap_exit, …)`
and `:erlang.spawn/3` (MFA form) are left alone.

End-to-end NimblePool dep-rewrite demo:
[`test/phase2_dep_rewrite_test.exs`](test/phase2_dep_rewrite_test.exs).

### CubDB historical race (commit 60fb38f)

[`test/examples/cubdb_compaction_race_test.exs`](test/examples/cubdb_compaction_race_test.exs).
Models the race upstream CubDB fixed in
[commit 60fb38f](https://github.com/lucaong/cubdb/commit/60fb38f)
("Fix race condition during compaction"). Quoting the maintainer:

> The compaction process actually happens in two phases: actual
> compaction, and catch-up. The previous implementation was starting
> the catch-up phase asynchronously, introducing a race condition:
> right after actual compaction completed, and before catch-up
> started, there was the possibility to erroneously start another
> concurrent compaction.

The shape: a multi-phase operation guarded by a `busy` flag. Phase 1
completion clears `busy` AND queues phase 2 as a separate message.
Between those two events a new operation passes the `busy?` check and
starts a *concurrent* compaction. Phase 2 then sees state overwritten
by the second compaction.

Detection in our model: each compaction has a unique ID, written to
`state.current_id` on `start_compaction`. Phase 2 verifies the ID it
holds matches `state.current_id`; if a second compaction has run in
between and overwritten the field, the assertion fires.
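
A condensed sketch of the buggy flow (message and field names follow
our model's shape, not CubDB's actual code):

```elixir
def handle_call(:start_compaction, _from, %{busy: true} = s),
  do: {:reply, :already_running, s}

def handle_call(:start_compaction, _from, s) do
  id = make_ref()
  Lockstep.GenServer.cast(self(), {:phase1, id})
  {:reply, :started, %{s | busy: true, current_id: id}}
end

def handle_cast({:phase1, id}, s) do
  # BUG: busy is cleared here while phase 2 is only *queued*. A new
  # :start_compaction landing between these two messages passes the
  # busy? check and starts a concurrent compaction.
  Lockstep.GenServer.cast(self(), {:phase2, id})
  {:noreply, %{s | busy: false}}
end

def handle_cast({:phase2, id}, s) do
  # Detection: a second compaction overwrote current_id in the window.
  if id != s.current_id, do: raise("concurrent compaction clobbered phase 2")
  {:noreply, s}
end
```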

Sweep result over 4 strategies × 5 seeds × 200 iterations each:

| strategy   | finds the race | typical iter |
|------------|-----------------|--------------|
| `:pos`     | 5/5 seeds       | 1, 1, 3, 5, 6 |
| `:random`  | 5/5 seeds       | 1, 3, 3, 4, 9 |
| `:fair_pct`| 1/5 seeds       | 53            |
| `:pct`     | 0/5 seeds       | n/a           |

PCT's bug-depth heuristic happens to bias *away* from this particular
race shape — the change-point insertion at d-1 random steps doesn't
align with the phase-1/phase-2 mailbox window. POS's per-step
priority resampling finds it consistently. **A real consequence of
having multiple strategies bundled.**

The fix in the model (and upstream): keep `busy=true` until BOTH
phases complete. The intermediate mailbox state — phase 2 queued but
not run — no longer admits a new `:start_compaction`. Verified
race-free across the same 200 iterations × `:pct`.

### Honeydew historical race (commit 41831c8)

[`test/examples/honeydew_status_race_test.exs`](test/examples/honeydew_status_race_test.exs).
Models the race upstream Honeydew fixed in
[commit 41831c8](https://github.com/koudelka/honeydew/commit/41831c8):

> Fixing race condition with Honeydew.status/1 where monitors could
> die before being queried for their current job.

The original buggy pattern:

```elixir
busy_workers =
  queue
  |> get_all_members(:monitors)
  |> Enum.map(&GenServer.call(&1, :current_job))
```

Race: between `get_all_members/2` returning a list of monitor pids
and `GenServer.call/2` reaching each one, a monitor can exit. The
call then exits, crashing the entire `status/1` invocation.
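
The fixed shape, reconstructed as a sketch (the `try/catch :exit, _`
pattern the commit introduced, not the upstream diff verbatim):

```elixir
busy_workers =
  queue
  |> get_all_members(:monitors)
  |> Enum.map(fn monitor ->
    try do
      GenServer.call(monitor, :current_job)
    catch
      # A monitor that died between enumeration and the call is
      # skipped instead of crashing the whole status/1 invocation.
      :exit, _ -> nil
    end
  end)
  |> Enum.reject(&is_nil/1)
```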

We can't compile and run upstream Honeydew under Lockstep — it's a
full OTP application with `:pg`, supervisors, and Mnesia plumbing
beyond our rewriter's reach — but the race shape is universal.
The minimal reproduction (3 monitors, one killed concurrently with
the iteration) **fails on iteration 2** under PCT. The fixed pattern
(`try/catch :exit, _` around each call) passes 200 iterations.

The point isn't that we found this bug — the maintainer fixed it in
2017. The point is that **the same pattern in any monitor-based
library would surface under Lockstep**, and the failing schedule is
a one-line change in user code (the missing `try/catch`) away from a
clean test.

### Real bug found: Phoenix.Tracker's pre-fc5686f ETS race

[`test/phoenix_tracker_prefix_race_test.exs`](test/phoenix_tracker_prefix_race_test.exs).
Lockstep finds the race fixed upstream in
[fc5686f](https://github.com/phoenixframework/phoenix_pubsub/commit/fc5686f1879bdad0856970cd292844202d91433a)
("Update ETS atomically on tracker update").

Pre-fix `Tracker.update` did `State.leave` then `put_presence` —
two ETS writes back-to-back. A concurrent `Tracker.list` could
observe the table in between, with the entry temporarily missing.
This is exactly the kind of race that's invisible at the
message-passing level — it lives at the ETS NIF level.

With `Lockstep.ETS` providing sync points at every ETS call,
the strategy can interleave `Tracker.list` between the leave and
the put. **POS finds it on iteration 1.** The whole stack (Tracker,
Phoenix.PubSub, and rewritten OTP `:pg`, ~3,700 LOC) runs unmodified,
with sync points injected at NIF boundaries by our rewriter.

### Real bug found in NimblePool's history

Lockstep finds a real race in NimblePool's
`checkout!` timeout handling, fixed upstream in commit
[e18f45f](https://github.com/dashbitco/nimble_pool/commit/e18f45f79c7eac1766e11f1c7cf9d361903f7f7f)
("Cancel requests on client timeout").

[`test/nimble_pool_timeout_race_test.exs`](test/nimble_pool_timeout_race_test.exs)
takes the pre-fix NimblePool source (just one commit before the
fix), compiles it through `Lockstep.MixCompiler`, and runs a
realistic scenario:

1. Capacity-1 pool
2. Holder client checks out and refuses to release
3. Waiter client calls `checkout!(pool, ..., 50ms)` and uses
   `try/catch :exit, _` to handle the deadline (real production
   pattern)
4. Waiter's timeout fires
5. Holder is then told to release

Under wall-clock testing this race rarely surfaces — it depends on
the scheduler interleaving the timeout and the holder's release in
a particular way. **Lockstep's PCT strategy finds it on iteration 1**,
producing the exit:

```
child crashed: exit(:unexpected_remove)
```

The pool process itself dies. In production, anyone calling
`checkout!` with a `try/catch :exit, _` on this version would
intermittently bring down their entire pool.

The upstream fix has two parts: send a cancel message from the
client before exiting, and turn the `:unexpected_remove` crash into
a no-op for late-arriving removals. With the fix applied (current
NimblePool releases) Lockstep's same scenario runs clean.

### Real-world race patterns

Beyond the libraries the rewriter handles directly, the
real-world-patterns suite reproduces *bug shapes seen in production
BEAM systems* in standalone form, exercising Lockstep's Registry /
Supervisor / link / monitor support.

[`test/examples/real_world_patterns/cache_thundering_herd_test.exs`](test/examples/real_world_patterns/cache_thundering_herd_test.exs).
The classic "lookup → miss → populate → put" anti-pattern. N
concurrent clients each see a miss, each populate, end up
overwriting each other. Lockstep finds it on **iteration 1** under
PCT. The locked-populate fix runs 100 iterations clean.
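
The anti-pattern, condensed (cache API and helper names illustrative):

```elixir
# Every concurrent client that sees a miss recomputes and writes,
# trampling the others; the strategy only needs to interleave two
# clients between the :lookup and the :put.
case Lockstep.GenServer.call(cache, {:lookup, key}) do
  nil ->
    value = expensive_compute(key)
    Lockstep.GenServer.call(cache, {:put, key, value})
    value

  value ->
    value
end
```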

[`test/examples/real_world_patterns/pubsub_fanout_ack_test.exs`](test/examples/real_world_patterns/pubsub_fanout_ack_test.exs).
PubSub broadcast that waits for acks from every subscriber. If one
subscriber dies between `Registry.lookup` and acking, the broadcaster
blocks forever. Found on **iteration 1** under PCT. The fix is
monitor-based ack accounting — treat `:DOWN` as a counted ack — and
runs clean across 50 iterations.

These patterns are what `Lockstep.Registry`, `Lockstep.Supervisor`,
`Lockstep.spawn_link`, `Lockstep.flag(:trap_exit, true)`, and
`Lockstep.alive?` were *built for*. Each lives in the bug zoo
regression suite so future engine changes can't quietly break their
findability.

## Real libraries verified under Lockstep

Beyond the bug-class showcase above, Lockstep has been pointed at
production BEAM libraries — both pre-fix versions to confirm it
catches known races, and current versions to gather verification
evidence ("no race in N controlled schedules").

| Library | Mode | Result | Test |
|---|---|---|---|
| **Hammer** | rewrite + run | **Real bug** — `Atomic.FixWindow.inc/4` lost-update vs `hit/5`, found under PCT | [`test/hammer_test.exs`](test/hammer_test.exs) |
| **Phoenix.Tracker** (pre-fc5686f) | rewrite + run | **Real bug** — ETS race fixed in fc5686f; POS finds it iter 1 | [`test/phoenix_tracker_prefix_race_test.exs`](test/phoenix_tracker_prefix_race_test.exs) |
| **NimblePool** (pre-e18f45f) | rewrite + run | **Real bug** — checkout timeout race; PCT finds it iter 1 | [`test/nimble_pool_timeout_race_test.exs`](test/nimble_pool_timeout_race_test.exs) |
| Phoenix.PubSub (current) | rewrite + run | Verified clean — 70 PCT iterations across 3 race-hunt scenarios | [`test/phoenix_pubsub_race_hunt_test.exs`](test/phoenix_pubsub_race_hunt_test.exs) |
| GenStage (current) | rewrite + run | Verified clean — single + fan-out pipeline scenarios | [`test/gen_stage_test.exs`](test/gen_stage_test.exs) |
| OTP `:pg` (Erlang) | Erlang rewriter + run | Verified clean — joins/broadcasts under interleaved scheduling | [`test/erlang_pg_test.exs`](test/erlang_pg_test.exs) |
| Mutex (lud/mutex) | rewrite + run | Verified clean — 10,000 schedules across 5 scenarios | [`test/real_libraries_test.exs`](test/real_libraries_test.exs) |
| ConCache.Lock | rewrite + run | Verified clean — 4,000 schedules across 5 scenarios | [`test/concache_lock_hunt_test.exs`](test/concache_lock_hunt_test.exs) |
| NimblePool port | reimplemented | Verified clean — 1,200 schedules across 6 scenarios | [`test/examples/nimble_pool_demo_test.exs`](test/examples/nimble_pool_demo_test.exs) |
| Cachex | test-only Hex dep | Linearizable — 200 iter mixed ops + 500 iter R-M-W | [`test/cachex_test.exs`](test/cachex_test.exs) |

The three work modes:

- **`rewrite + run`** — `Lockstep.MixCompiler` rewrites the
  upstream source, recompiles it, and the test drives the actual
  upstream API under controlled scheduling. Sync points sit at
  every `GenServer.call`, `:ets.*`, `Process.monitor`, `send`,
  `receive`, etc.
- **`Erlang rewriter + run`** — same pattern but for `.erl`
  source: `Lockstep.ErlangRewriter` walks the Erlang abstract
  format and routes `gen_server:call`, `erlang:spawn`,
  `Pid ! Msg`, bare `receive`, etc. through the controller. Used
  for OTP modules like `:pg`, `:gen_statem`, supervisor
  internals.
- **`test-only Hex dep`** — the library compiles unmodified; the
  test wraps its API in `Lockstep.GenServer.call`-shape sync
  points at the boundary. Useful when the lib is too large or
  macro-heavy to rewrite cleanly.

For the methodology behind picking a mode, writing the model, and
sizing iterations, see [METHODOLOGY.md](METHODOLOGY.md).

## Configuration

`use Lockstep.Test` accepts:

| Option          | Default | Meaning                                       |
|-----------------|---------|-----------------------------------------------|
| `:iterations`   | `100`   | How many randomized schedules to try.         |
| `:strategy`     | `:pct`  | One of `:random`, `:pct`, `:fair_pct`, `:pos`, `:idpct`. |
| `:seed`         | random  | Top-level RNG seed; per-iteration seeds derive from it. |
| `:max_steps`    | `1000`  | Hard limit on scheduling steps per iteration. |
| `:iter_timeout` | `30_000` | Hard wall-clock timeout per iteration (ms).  |
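
All options together (values here are illustrative, not recommendations):

```elixir
defmodule MyApp.RaceTest do
  use Lockstep.Test,
    iterations: 500,        # schedules to try per ctest
    strategy: :fair_pct,    # PCT with a liveness fallback
    seed: 0xC0DE_C0FFEE,    # pin for reproducible runs
    max_steps: 2_000,       # abort runaway iterations
    iter_timeout: 60_000    # per-iteration wall-clock cap (ms)
end
```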

## Testing methodology

For applying Lockstep to a real BEAM library — the model-based
linearizability template, the compile-rewrite vs test-only-dep paths,
strategy selection, fault injection, and case studies (Cachex,
ConCache, lud/mutex, nimble_pool, Hammer) — see
[METHODOLOGY.md](METHODOLOGY.md). Includes a real bug we found in
Hammer's `Hammer.Atomic.FixWindow.inc/4` (lost-update race against
`hit/5`).

## API

The runtime API for use inside `ctest` bodies:

| Call                         | Description                                                                       |
|------------------------------|-----------------------------------------------------------------------------------|
| `Lockstep.spawn/1`           | Spawn a managed child process.                                                    |
| `Lockstep.spawn_link/1`      | Spawn a managed child and bidirectionally link to it (atomic).                    |
| `Lockstep.send/2`            | Send to another managed process (recorded by controller).                         |
| `Lockstep.recv/0`            | Receive the next message in delivery order.                                       |
| `Lockstep.recv_first/1`      | Selective receive: take first message matching a predicate.                       |
| `Lockstep.now/0`             | Read the controller's virtual clock.                                              |
| `Lockstep.send_after/3`      | Schedule a message after `delay_ms` of virtual time. Returns a timer ref.         |
| `Lockstep.cancel_timer/1`    | Cancel a pending timer; returns ms remaining or `false`.                          |
| `Lockstep.sleep/1`           | Virtual-time sleep (sentinel timer + selective receive under the hood).           |
| `Lockstep.monitor/1`         | Monitor a managed process; `:DOWN` is delivered through Lockstep's mailbox.       |
| `Lockstep.demonitor/2`       | Stop monitoring; `:flush` removes any already-delivered `:DOWN`.                  |
| `Lockstep.alive?/1`          | Consult the controller's view; falls back to vanilla for unmanaged pids.          |
| `Lockstep.link/1`            | Bidirectionally link to another managed process.                                  |
| `Lockstep.unlink/1`          | Remove the link.                                                                  |
| `Lockstep.flag/2`            | `:trap_exit` is modeled at the controller level; other flags are placeholders.    |

Higher-level wrappers built on these primitives:

- `Lockstep.GenServer.{start_link, call, cast, stop}` — test modules
  that follow the GenServer callback contract. `start_link/3` accepts
  `name: {:via, Module, term}`.
- `Lockstep.Task.{async, await, await_many}` — `Task.async`-style
  fan-out / fan-in with selective-receive on a unique ref.
- `Lockstep.Registry.{start_link, register, unregister, lookup, dispatch, count, keys}` —
  controller-aware key/pid registry with `:via` integration.
- `Lockstep.Supervisor.{start_link, which_children, count_children, start_child, terminate_child, restart_child}` —
  `:one_for_one` supervisor with permanent/transient/temporary restart
  semantics and intensity limits.

Every primitive is a sync point: each call hands control back to the
controller, which picks the next process to run per the configured
strategy. The send and spawn variants don't block the caller waiting
for delivery; the controller records and queues the action, then
resumes the caller at its next scheduling turn. `recv/0` and
`recv_first/1` block until the controller delivers a message.

`Lockstep.alive?/1` is deliberately a sync point: TOCTOU bugs
(`if Process.alive?(pid), do: GenServer.call(pid, ...)`) surface
naturally because the strategy may interleave another process between
the check and the subsequent call.

`Lockstep.link/1` and friends model the BEAM link graph at the
controller level. When a linked process dies abnormally, the cascade
is propagated through Lockstep's link graph (with cycle detection):
non-trapping linkers die transitively, and the chain terminates either
at a `flag(:trap_exit, true)` process — which receives a synthetic
`{:EXIT, dying_pid, reason}` message — or at the test root, which
finalizes the iteration as a bug.
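
A minimal sketch, inside a `ctest` body, of trapping a linked child's
abnormal exit (the exit reason and predicate shape are illustrative;
`recv_first/1` follows the contract shown below):

```elixir
Lockstep.flag(:trap_exit, true)
child = Lockstep.spawn_link(fn -> exit(:boom) end)

# The controller delivers a synthetic {:EXIT, pid, reason} instead of
# cascading the child's death into this (trapping) process.
msg =
  Lockstep.recv_first(fn
    {:EXIT, _pid, _reason} = m -> m
    _ -> false
  end)

assert {:EXIT, ^child, :boom} = msg
```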

### Selective receive

```elixir
ref = make_ref()
Lockstep.send(server, {:request, self(), ref})

reply =
  Lockstep.recv_first(fn
    {^ref, payload} -> {:ok, payload}
    _              -> false   # any non-truthy means "skip"
  end)
```

Other messages already in the mailbox are not disturbed.

### GenServer wrapper

```elixir
defmodule Counter do
  def init(initial), do: {:ok, initial}
  def handle_call(:get, _from, n),         do: {:reply, n, n}
  def handle_call({:put, x}, _from, _n),   do: {:reply, :ok, x}
  def handle_cast({:add, x}, n),           do: {:noreply, n + x}
end

ctest "lost update on counter" do
  {:ok, srv} = Lockstep.GenServer.start_link(Counter, 0)

  for _ <- 1..2 do
    Lockstep.spawn(fn ->
      v = Lockstep.GenServer.call(srv, :get)
      :ok = Lockstep.GenServer.call(srv, {:put, v + 1})
    end)
  end

  # ... barrier ...
  assert Lockstep.GenServer.call(srv, :get) == 2
end
```

The wrapper itself is a thin loop on top of `Lockstep.recv`/`Lockstep.send`/
`Lockstep.recv_first`. See `lib/lockstep/gen_server.ex` for the
~80-line implementation.

### Virtual time

```elixir
ctest "GenServer with periodic tick" do
  {:ok, pid} = Lockstep.GenServer.start_link(MyTimerServer, [])
  Lockstep.send_after(pid, :start, 0)

  # Drain the test forward until 350ms of virtual time has passed.
  Lockstep.send_after(self(), :wakeup, 351)
  _ = Lockstep.recv()

  assert Lockstep.GenServer.call(pid, :tick_count) >= 3
end
```

Virtual time only advances when the controller would otherwise
deadlock — when every managed process is blocked on receive and the
only way to unblock someone is to fire the next pending timer. So
`Lockstep.send_after(pid, :ding, 86_400_000)` followed by `recv` returns
in milliseconds of wall-clock time, not a day. It's the same
time-warping trick `:meck`/Mox enable for `:erlang.system_time`, but
built into the scheduler.

## Reproducing a bug

Three ways, in order of laziness:

**1. Re-run the suite with the original seed.** The trace records both
the top-level seed and the iteration. Re-run with `seed: <top_seed>` and
`iterations: >= <iter>`. Per-iteration seed derivation is
`phash2({top_seed, iteration})`, so iteration K of the same top seed
always behaves identically.

**2. Replay from the CLI.** Extract the test body into a named function,
then:

```bash
mix lockstep.replay --trace traces/foo.lockstep \
                    --run "MyApp.RaceTest.lost_update_body" \
                    --file test/my_app/race_test.exs
```

The mix task loads the test file (so the module becomes available),
then runs `Lockstep.Replay.run/2` with the schedule from the trace.

**3. Replay programmatically.** Same idea, written as an ordinary
ExUnit test:

```elixir
defmodule MyApp.RaceTest do
  use Lockstep.Test, ...

  # The body is extracted into a function so it can be called from
  # both the ctest and a one-off replay test.
  def lost_update_body do
    # ... the actual test scenario ...
  end

  ctest "lost update" do
    lost_update_body()
  end
end

defmodule MyApp.ReplayTest do
  use ExUnit.Case

  test "replay the lost-update bug" do
    Lockstep.Replay.run(
      &MyApp.RaceTest.lost_update_body/0,
      "traces/MyApp-iter2-seed828370911214.lockstep"
    )
  end
end
```

If the user code is fully deterministic (no raw `:rand`, no real I/O,
no NIFs that yield asymmetrically), the replay produces a structurally
identical trace and re-fires the bug. If something diverges, you get a
`Lockstep.ReplayDivergence` with a step number and a hint.

You can also compare two traces structurally:

```elixir
Lockstep.Replay.compare_traces!(original_trace, replayed_trace)
```

Pids and refs are sanitized before comparison since they're recreated
on each run.

## Shrinking a trace

A failing trace can be hundreds of events. Most of those are *necessary
setup* but obscure the actual race. `Lockstep.Shrink.shrink/3` searches
for a shorter `replay_pid_order` that still triggers the same bug
*signature* under strict replay, then saves a `.shrunk.lockstep` next
to the original.

```bash
mix lockstep.shrink --trace traces/counter_race-iter5-seed42.lockstep \
                    --run "MyApp.CounterRaceTest.lost_update_body" \
                    --file test/my_app/counter_race_test.exs
```

Output:

```
Lockstep shrink succeeded:
  original schedule decisions: 87
  shrunk schedule decisions:   12
  attempts:                    156
  bug signature:               {:exception, RuntimeError}
  shrunk trace:                traces/counter_race-iter5-seed42.shrunk.lockstep
```

The shrunk trace replays exactly like the original — point
`mix lockstep.replay` at it for the same bug at fewer scheduling
decisions. Programmatic version:

```elixir
{:ok, info} =
  Lockstep.Shrink.shrink(
    fn -> MyApp.CounterRaceTest.lost_update_body() end,
    "traces/counter_race-iter5-seed42.lockstep",
    verbose: true
  )

info.original_length  # 87
info.new_length       # 12
info.new_trace_path   # "traces/...shrunk.lockstep"
```

The algorithm is two-pass — truncation (shorter prefixes) then
decimation (drop individual decisions) — repeated to a fixed point.
Bug-signature comparison is coarse (same exception type, same
deadlock category, etc.), so the shrunk trace is a *cleaner repro*,
not a byte-identical reproduction.

## Long runs and progress

`iterations: 10_000` is silent by default until it either finishes
or finds a bug. Pass `progress:` to opt in:

```elixir
Lockstep.Runner.run(
  fn -> MyTest.race_body() end,
  iterations: 10_000,
  strategy: :pct,
  progress: true                 # prints every ~5%
  # progress: 100                # or: every 100 iterations
  # progress: fn iter -> ... end # or: a custom hook (telemetry, ...)
)
```

## Bug-finding regression suite

`test/bug_zoo_test.exs` is a curated set of known concurrency races,
each declared with a `(strategy, seed, max_iterations, expected_at_most)`
configuration. The test fails if any race takes more iterations than
its baseline — catching engine regressions in bug-finding power as
the controller and strategies evolve.
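
Purely illustrative shape of an entry (field names are hypothetical;
the real format lives in `test/bug_zoo_test.exs`):

```elixir
%{
  name: "lost update on shared counter",
  body: &BugZoo.Bodies.lost_update/0,  # the race scenario to run
  strategy: :pct,
  seed: 42,
  max_iterations: 200,
  expected_at_most: 10                 # regression threshold
}
```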

When making engine changes, run `mix test test/bug_zoo_test.exs` to
confirm bug-finding hasn't slipped. Add new entries as you wire up new
classes of bug.

## Design

The shape mirrors Microsoft Coyote, mapped to BEAM:

| Coyote (.NET)                     | Lockstep (BEAM)                |
|-----------------------------------|--------------------------------|
| `coyote rewrite` (IL rewriter)    | `Lockstep.Rewriter` / `Lockstep.ErlangRewriter` + runtime `Lockstep.*` API |
| `TestingEngine`                   | `Lockstep.Runner.run/2`        |
| `IActorRuntime`                   | `Lockstep.Controller` GenServer |
| Strategies (random, PCT, ...)     | `Lockstep.Strategy` behaviour  |
| `coyote replay <trace>`           | `mix lockstep.replay`          |

## License

Apache-2.0.