# Lockstep testing methodology — the playbook
How to apply Lockstep to a real BEAM library to find (or rule out)
concurrency bugs. This is the field guide we wrote while testing
Cachex, ConCache, lud/mutex, nimble_pool, and Hammer.
The bug-finding rate so far: **1 in 5 libraries**. Hammer's
`Hammer.Atomic.FixWindow.inc/4` had a real lost-update race against
`hit/5`. The other four are concurrency-correct under their public
APIs — which is itself useful information.
This doc is the recipe for getting to that signal.
## Contents
1. [When to reach for Lockstep](#when-to-reach-for-lockstep)
2. [Two paths: compile-rewrite vs test-only dep](#two-paths-compile-rewrite-vs-test-only-dep)
3. [The model-based linearizability template](#the-model-based-linearizability-template)
4. [Writing the model](#writing-the-model)
5. [Picking a strategy](#picking-a-strategy)
6. [Iteration counts and tuning](#iteration-counts-and-tuning)
7. [Beyond linearizability: fault injection, time-skew](#beyond-linearizability-fault-injection-time-skew)
8. [What Lockstep can and cannot reach](#what-lockstep-can-and-cannot-reach)
9. [Anti-patterns](#anti-patterns)
10. [Case studies](#case-studies)
## When to reach for Lockstep
Lockstep makes sense when you're testing code where:
- Multiple processes interact through shared mutable state (ETS,
atomics, persistent_term, registered names).
- You need a *reproducible* failing schedule — vanilla concurrent
ExUnit tests find races but throw away the trace.
- You suspect a race exists at non-trivial depth (3+ scheduling
decisions away from a clear "first send wins").
It's overkill when:
- The code is sequential or has only one writer.
- You just want stress testing — `Task.async_stream` in a loop is
  cheaper (a sketch follows this list).
- The race is already easy to catch (tight `:erlang.spawn` loops
  catch lost-updates 99% of the time; see the README's "Why bother").
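For contrast, here is what that cheaper loop looks like — a sketch
against the hypothetical `MyLib` used throughout this doc. It can
catch shallow lost-updates, but it keeps no trace and cannot replay
a failure:

```elixir
test "naive stress (no Lockstep)" do
  {:ok, sut} = MyLib.start_link([])

  # Hammer one key from 8 concurrent tasks.
  1..1_000
  |> Task.async_stream(fn i -> MyLib.put(sut, "k", i) end, max_concurrency: 8)
  |> Stream.run()

  # Only final-state assertions are possible; the interleaving is lost.
  assert MyLib.get(sut, "k") in 1..1_000
end
```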
## Two paths: compile-rewrite vs test-only dep
There are two ways to point Lockstep at a library, with very
different tradeoffs.
### Path A: test-only Hex dep (workers controlled, library raw)
Add the library to `mix.exs` as `only: :test`. Workers run as
`Lockstep.Task.async/1` (controlled), call into the library, and
record into `Lockstep.History`. The library itself runs un-rewritten
on the real BEAM scheduler.
You get:
- Real-world concurrent behavior of the library (real scheduler).
- Lockstep manages worker scheduling and history recording.
- Linearizability checking against your model.
You don't get:
- PCT scheduling *into* the library's GenServer mailbox dispatch.
Internal races aren't actively explored — you're hoping the BEAM
scheduler hits them naturally.
Use this when the library is too heavy to compile-rewrite (heavy
macros, many transitive deps, kernel-level features). **Cachex** and
**Phoenix.PubSub** fit here.
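The `mix.exs` side of Path A is an ordinary test-scoped dependency
(version requirements here are illustrative):

```elixir
# In mix.exs — the SUT is pulled in only for the :test env.
defp deps do
  [
    {:lockstep, "~> 0.1", only: :test},
    {:cachex, "~> 3.6", only: :test}
  ]
end
```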
### Path B: compile-rewrite via `Lockstep.MixCompiler` (full pipeline)
Clone the source, run it through `Lockstep.MixCompiler.compile/1`,
and `Code.compile_file/1` the rewritten output. The library's
internal `:ets`, `:atomics`, `Process.send_after`, `GenServer.call`,
`Process.monitor`, and friends all become Lockstep sync points.
You get:
- PCT (or any other strategy) scheduling *into* the library's
internals — handle_call dispatch ordering, monitor cleanup
ordering, message arrival sequencing.
- A genuinely stronger correctness signal: bugs at depth 3-6 inside
  the SUT are findable.
You pay for it with:
- Source must be cloneable and compilable. Heavy macro libraries
(Cachex's record-style state, Hammer's `__before_compile__`)
may need partial workarounds.
- Some OTP idioms aren't reachable: `{:via, Registry, ...}` was
blocked until [we added rewriting](lib/lockstep/rewriter.ex)
for the via tuple itself; `:"$ancestors"` propagation is also
needed for libs that read `Process.get(:"$ancestors")`.
Use this when the library is small-medium (≤ 2k LOC, few transitive
deps) and you actually want to find concurrency bugs in *its* logic.
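At its smallest, Path B is two calls — the appendix at the end of
this doc shows the full scaffolding with load ordering and error
handling:

```elixir
# Rewrite the cloned source, then load the rewritten modules.
# Source and output paths are illustrative.
{:ok, rewritten} =
  Lockstep.MixCompiler.compile(%{
    paths: ["/tmp/mylib/lib/mylib.ex"],
    output: "/tmp/mylib_lockstep"
  })

Enum.each(rewritten, &Code.compile_file/1)
```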
## The model-based linearizability template
The reusable shape for any read/write API:
```elixir
defmodule Lockstep.MyLibTest do
use ExUnit.Case, async: false
setup_all do
# Compile through Lockstep.MixCompiler if you want path B.
# Otherwise add the lib as a test dep and skip this.
...
end
defmodule MyModel do
@behaviour Lockstep.Model
alias Lockstep.History.Event
@impl true
def init, do: %{} # whatever shape your reference state is
@impl true
def step(state, %Event{f: :get, value: k}, %Event{type: :ok, value: v}) do
if Map.get(state, k) == v do
{:ok, state}
else
{:error, "stale read"}
end
end
def step(state, %Event{f: :put, value: {k, v}}, %Event{type: :ok}) do
{:ok, Map.put(state, k, v)}
end
# ... one clause per (op kind, completion type)
# Generic completion catch-alls (always include these):
def step(state, _, %Event{type: :fail}), do: {:ok, state}
def step(state, _, %Event{type: :info}), do: {:ok, state}
end
test "linearizability" do
Lockstep.Runner.run(
&body/0,
iterations: 200,
strategy: :pct,
max_steps: 5_000,
seed: 1,
iter_timeout: 30_000,
progress: 25,
suite: "mylib_lin"
)
end
defp body do
# 1. Set up a fresh SUT instance (unique name per iteration).
{:ok, sut} = MyLib.start_link(...)
# 2. Start a History recorder.
history = Lockstep.History.start_link!()
# 3. Spawn workers, each pulling from a Generator.
n_workers = 4
ops_per_worker = 6
workers =
for w <- 1..n_workers do
Lockstep.Task.async(fn ->
gen =
Lockstep.Generator.weighted(
[
{{:get, "k"}, 4},
{{:put, "k", w * 100}, 2},
# ... your op pool ...
],
seed: w * 1_000_003
)
|> Lockstep.Generator.take(ops_per_worker)
Lockstep.Generator.each(gen, fn op -> apply_op(history, sut, op) end)
end)
end
Lockstep.Task.await_many(workers, 30_000)
# 4. Check the history against the model.
events = Lockstep.History.events(history)
case Lockstep.Checker.Linearizable.check(events, MyModel) do
{:ok, _} -> :ok
{:error, info} ->
msg = Lockstep.Checker.Linearizable.format_violation(info)
raise "non-linearizable history found:\n\n#{msg}"
end
end
defp apply_op(history, sut, {:get, k}) do
Lockstep.History.op(history, :get, k, fn -> MyLib.get(sut, k) end)
end
defp apply_op(history, sut, {:put, k, v}) do
Lockstep.History.op(history, :put, {k, v}, fn -> MyLib.put(sut, k, v) end)
end
end
```
`Lockstep.History.op/4` records an `:invoke` before the function, an
`:ok` (with the function's return value) on normal return, or an
`:info` (with the exception message) on raise.
`Lockstep.Checker.Linearizable` then searches for a serial order
consistent with both real-time precedence and the model.
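For orientation, a tiny recorded history might look like the
following. The `:f`, `:value`, and `:type` fields are the ones the
model clauses match on; the `:proc` worker-id field shown here is an
assumption about the rest of the struct's shape:

```elixir
alias Lockstep.History.Event

# Worker 2's :get overlaps worker 1's :put in real time, so the
# checker may place it before or after the put when searching for a
# serial order that satisfies the model.
[
  %Event{proc: 1, type: :invoke, f: :put, value: {"k", 100}},
  %Event{proc: 2, type: :invoke, f: :get, value: "k"},
  %Event{proc: 1, type: :ok, f: :put, value: :ok},
  %Event{proc: 2, type: :ok, f: :get, value: 100}
]
```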
## Writing the model
A model is a sequential reference implementation. Its job is to
answer: "given the recorded operations applied in *some* serial
order, is this completion consistent?"
### One clause per (op kind, completion shape) pair
```elixir
def step(state, %Event{f: :put, value: {k, v}}, %Event{type: :ok, value: :ok}) do
{:ok, Map.put(state, k, v)}
end
def step(state, %Event{f: :insert_new, value: {k, v}}, %Event{type: :ok, value: :ok}) do
if Map.has_key?(state, k) do
{:error, "insert_new returned :ok but key already exists"}
else
{:ok, Map.put(state, k, v)}
end
end
def step(state, %Event{f: :insert_new, value: {k, _v}}, %Event{type: :ok, value: {:error, :already_exists}}) do
if Map.has_key?(state, k) do
{:ok, state}
else
{:error, "insert_new returned :already_exists but key didn't exist"}
end
end
```
Each clause encodes both branches: "what state updates if this
completion is consistent" and "what's the proof it's *not*
consistent." The Linearizability checker uses this to backtrack
through serial orderings.
### Always include the catch-alls
```elixir
def step(state, _invoke, %Event{type: :fail}), do: {:ok, state}
def step(state, _invoke, %Event{type: :info}), do: {:ok, state}
```
`:fail` is an explicit "didn't apply" — the op had no effect on
state. `:info` is an unknown outcome (the worker crashed
mid-call) — for now, treat as no-apply, which is the conservative
default. (A more sophisticated checker would branch and try both.)
### Be careful with return values
Probe the library's actual return shapes before writing the model.
We had several "compile fails" or "function clause" errors traced
back to incorrect assumptions:
- `Cachex.get/2` returns `{:ok, value}`, not `value | nil`.
- `Cachex.get_and_update/3` returns `{:commit, new_v}` directly,
*not* `{:ok, {:commit, new_v}}`.
- `Mutex.lock/2` returns `{:ok, %Mutex.Lock{}}` or `{:error, :busy}`.
- `Mutex.await/3` returns `%Mutex.Lock{}` directly (or raises).
Run a quick `mix run --eval 'IO.inspect(MyLib.foo(...))'` first.
### State scope: single-key first, multi-key second
Start with all ops on the same key. The contention is highest, the
model is simplest, and most race patterns surface there. Once that
passes, expand to multi-key — the model is just `%{key => value}`
instead of `value`.
## Picking a strategy
Lockstep ships five strategies. Rule of thumb:
| Strategy | Best at | When to skip |
|----------|---------|--------------|
| `:pct` (default) | Classic data races (lost-update, TOCTOU) at depth ≤ 3. | When you need long runs of one process picked consecutively (POS is better). |
| `:pos` | Bugs that need the strategy to consistently pick the *same* proc over many sync points (e.g., async replication patterns). | When PCT depth-bounded probability is what you need. |
| `:fair_pct` | Long-running tests where infrastructure (heartbeats, timers) would otherwise dominate. | When you need full PCT for the whole run. |
| `:idpct` | Unknown bug depth + many iterations (≥ 1000). Cycles depth in `[d_min, d_max]` per iteration. | When iteration count is small and you want predictable depth. |
| `:random` | Sanity check or cheap stress. | When you want adversarial scheduling. |
**Workflow**: start with `:pct` (default depth 3). If 200-500
iterations don't surface the bug, try `:pos`. If neither, switch to
`:idpct` and run thousands of iterations. If still nothing, the
suspected bug probably doesn't exist.
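Escalation in code is just the `strategy:` option on the same runner
call, with everything else held fixed (options as used in the
template above):

```elixir
# Pass 1: PCT at the default depth — cheap, catches most shallow races.
Lockstep.Runner.run(&body/0, iterations: 500, strategy: :pct, seed: 1)

# Pass 2: POS — favors repeatedly picking the same process.
Lockstep.Runner.run(&body/0, iterations: 500, strategy: :pos, seed: 1)

# Pass 3: IDPCT — cycles bug depth per iteration; give it volume.
Lockstep.Runner.run(&body/0, iterations: 5_000, strategy: :idpct, seed: 1)
```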
The Hammer bug was found by `:pct` at depth 3 with 5000 iterations
on a 4-worker `1 inc + 3 hit` workload. ConCache's lock manager and
nimble_pool's checkout queue have been tested to ≥100 iterations of
mixed workloads under `:pct` with no findings.
## Iteration counts and tuning
PCT's bug-finding probability per iteration is at least
`1 / (n · k^(d-1))`, where `n` is the process count, `k` is the
step count, and `d` is the bug depth.
A rough sizing table for a typical 5-worker test (`n = 5`, `k ≈ 20`
sync points per iteration):
| Bug depth | Per-iter probability | Iterations for 99% confidence |
|-----------|----------------------|-------------------------------|
| 2 | ~1% | ~500 |
| 3 | ~0.05% | ~10,000 |
| 4 | ~0.0025% | ~200,000 |
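The iteration column is plain geometric-trial arithmetic. A quick
helper for sizing your own workload — `iterations_for_99/3` is a
hypothetical name, not part of Lockstep:

```elixir
# Iterations needed to hit a depth-d bug with 99% confidence, given
# the PCT bound above: p = 1 / (n · k^(d-1)).
defp iterations_for_99(n_procs, k_steps, depth) do
  p = 1 / (n_procs * :math.pow(k_steps, depth - 1))
  # Solve (1 - p)^iters <= 0.01  =>  iters >= ln(0.01) / ln(1 - p)
  ceil(:math.log(0.01) / :math.log(1 - p))
end

# iterations_for_99(5, 20, 3)  #=> 9209, the table's ~10,000.
```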
In practice:
- Start at **100 iterations** during development. If your test is
flaky, fix the *test* before scaling iterations.
- Push to **1,000 iterations** for a real correctness check.
- For confidence at depth 3+, **5,000-10,000 iterations**. Each
takes ~1ms wall-time for typical workloads, so 10k is ~10s.
- If you've gone past 50,000 and still nothing, the bug almost
certainly isn't in your concurrency surface. Move on.
## Beyond linearizability: fault injection, time-skew
The Linearizability template covers ~80% of useful tests. The
remaining 20% need targeted directed tests.
### Fault injection (kill a process holding a resource)
```elixir
defp fault_body do
{:ok, mutex} = Mutex.start_link([])
parent = self()
ref = make_ref()
# Worker A: acquires the lock, signals, then exits without releasing.
Lockstep.spawn(fn ->
{:ok, _lock} = Mutex.lock(mutex, "k")
Lockstep.send(parent, {ref, :a_acquired})
# No release. fn returns → A exits :normal → :DOWN fires.
end)
Lockstep.recv_first(fn {^ref, :a_acquired} -> true; _ -> false end)
# B (us) awaits — should block until A's :DOWN, then acquire.
lock = Mutex.await(mutex, "k", :infinity)
:ok = Mutex.release(mutex, lock)
end
```
Lockstep's controller delivers `:DOWN` automatically when a managed
process exits, and treats monitor-observed deaths as expected (no
`:bug` flag). PCT explores orderings of: A's exit, the `:DOWN`
delivery to B, B's pending `await` recv, B's retry.
### Time-skew (force a timer to fire mid-operation)
Lockstep's virtual time advances when no process is runnable.
`Lockstep.sleep/1` parks the calling process and lets virtual time
tick. Combined with aggressive timer config in the SUT (e.g.,
`expiry: 0`, `cleanup_interval: 1`), you can force timer events to
interleave with concurrent ops:
```elixir
defp time_skew_body do
# Example: a circuit-breaker style state machine with periodic cleanup.
{:ok, _} = MyBreaker.start_link(cleanup_interval: 1)
{:ok, _} = MyBreaker.register("api", max_attempts: 3, expiry: 0)
# Auto-open with 3 failures.
MyBreaker.failure("api")
MyBreaker.failure("api")
MyBreaker.failure("api")
# Concurrent: cleanup tick fires + worker does another failure.
Lockstep.spawn(fn -> MyBreaker.failure("api") end)
Lockstep.sleep(10) # advances virtual time, fires cleanup tick
# Final state must be valid (state machine consistent under interleaving).
{:ok, info} = MyBreaker.status("api")
if info.state not in [:open, :half_open], do: raise(...)
end
```
Note: `System.system_time/1` is *not* rewritten by Lockstep. Real
wall-clock time still ticks. If your library's expiry comparison
uses `System.system_time`, you need very low `expiry` values for the
comparison to fire under fast iterations.
## What Lockstep can and cannot reach
### Reaches
- All `Lockstep.*` wrapped primitives: `:ets.*`, `:atomics.*`,
`:persistent_term.*`, `GenServer.{call, cast}`, `Process.{monitor,
demonitor, send_after, link, unlink, flag, alive?}`, `spawn`,
`spawn_link`, `Lockstep.send`, `Lockstep.recv*`.
- OTP `{:via, Registry, ...}` registration (rewriter transforms the
via tuple to `Lockstep.Registry`).
- `Process.get(:"$ancestors")` / `:"$ancestors"` is propagated on
every spawn.
### Doesn't reach
- BEAM kernel messages: `:"ETS-TRANSFER"` (heir mechanism), `:nodedown`,
port messages. These bypass the controller's mailbox. Libraries
that depend on these (e.g., `Eternal`'s ETS HEIR) won't work
cleanly under Lockstep without additional wrapping.
- Macro-generated code in user modules. The rewriter operates on
source files passed to `Lockstep.MixCompiler`. If a library uses
`__before_compile__` to inject functions into the user's module
(Hammer does this), the user's module isn't rewritten by default.
Workaround: bypass the macro and call the underlying algorithm
modules directly (which *are* rewritten).
- `System.system_time/1`, `:erlang.system_time/1`, `:os.timestamp/0`.
Real wall-clock is real wall-clock. If a library's correctness
depends on time progression, it interacts oddly with Lockstep's
fast iteration loop.
- NIFs other than the wrapped set. If a lib calls into a custom NIF,
Lockstep treats it as opaque.
### Library-side compile quirks we've hit
- **Compile-time `File.read!("README.md")`**: nimble_pool's
  `@moduledoc` reads the README at compile time. The rewritten file
  lives in a different cwd, so `File.read!` fails. Pre-process the
  rewritten source to replace the line (see the sketch after this
  list).
- **Macro-heavy state records**: Cachex uses Erlang record syntax via
macros. Compiling all of Cachex through Lockstep.MixCompiler is a
rabbit hole; we use Path A (test-only Hex dep) instead.
- **Compile-time `Code.ensure_loaded!`**: some libraries assert
another module is loaded at compile time. The rewritten file is
loaded fresh, so this can fail unless dependencies load first.
Order rewritten files by leaf-to-root in your `setup_all`.
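For the compile-time `File.read!` quirk above, the pre-processing
pass can be a plain string replace on the rewritten file before
compiling it. A sketch — the match pattern is an assumption you'd
adjust to the library's actual `@moduledoc` line:

```elixir
# Neutralize a compile-time README read in the rewritten source.
defp patch_readme_read!(path) do
  patched =
    path
    |> File.read!()
    |> String.replace(~s|File.read!("README.md")|, ~s|"moduledoc stripped"|)

  File.write!(path, patched)
end
```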
## Anti-patterns
### Strict equality assertions where the API returns floats
We've seen rate-limiter smoke tests assert `{:allow, 3.0} = ...` —
which fails ~25% of the time because `(now - last) / interval *
refill` produces `3.0000001` if even a microsecond of real time
elapsed. Use `assert_in_delta` or `is_number/1` for float-shaped
returns.
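The robust version, sketched against a hypothetical
`MyLimiter.hit/1`:

```elixir
# Brittle: exact float equality fails whenever real time leaks in.
#   {:allow, 3.0} = MyLimiter.hit("k")

# Robust: assert the shape, then bound the float.
assert {:allow, count} = MyLimiter.hit("k")
assert_in_delta count, 3.0, 0.01
```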
### Single-iteration tests for known-deep bugs
Running with `iterations: 1` means PCT picks one schedule and stops.
That's fine for a directed test of a specific known interleaving
(e.g., shrinking output) but not for hunting unknown races. Default
to `iterations: 200` minimum.
### Ignoring `:info` completions in the model
The catch-all `def step(state, _, %Event{type: :info}), do: {:ok,
state}` treats unknown outcomes as no-applies. This is fine for
correctness (conservative), but if you want to detect bugs that
manifest *only* under crashed operations (e.g., the operation
applied but the caller never saw the response), you need to branch.
### Sharing state across iterations
If you `Cachex.start_link(:my_cache, ...)` once in `setup_all`, all
iterations write to the same cache. State leaks. Use a per-iteration
unique name: `name = String.to_atom("c_#{:erlang.unique_integer([:positive])}")`.
## Case studies
### Cachex (1.5M downloads): linearizability confirmed
Test: 200 iterations × 6 workers × 8 mixed ops + 500 iterations of
get_and_update R-M-W + 200 iterations multi-key. All linearizable.
Run un-rewritten (Path A) due to its 5 transitive deps. Cachex's
basic ops are ETS-atomic; `get_and_update` serializes through a
per-cache locksmith Queue GenServer.
### lud/mutex: linearizability + fault-injection both pass
Test: 200 iterations linearizability on 5 workers × 4 lock/release
cycles + 100 iterations fault injection (kill holder, verify monitor
cleanup). Full pipeline. The fault test exercises the
`Process.monitor` cleanup path — Lockstep delivers `:DOWN`
automatically when a managed process exits.
### ConCache (~600k downloads): linearizability confirmed
Test: 100 iterations × 4 workers × 6 mixed ops on single key, with
PCT scheduling *into* ConCache.Lock's GenServer (the lock manager +
waiter queue + monitor-based cleanup). All linearizable. Required
two Lockstep core fixes:
1. AST rewriter handles `{:via, Registry, term}` → `{:via,
Lockstep.Registry, term}` (OTP's `:via` dispatch otherwise routes
to the real Registry which isn't started under Lockstep).
2. `Lockstep.spawn` propagates `:"$ancestors"` (ConCache.Owner reads
it via `hd(Process.get(:"$ancestors"))` to find its parent).
### nimble_pool (used by Finch, Postgrex): clean
Test: 100 iterations × 5 client tasks × 3 checkout cycles on
pool_size 2. Full pipeline. All cycles complete. Tests the
`checkout!` queueing, `Process.monitor` on clients, and worker
selection logic.
### Hammer (rate limiter): **bug found**
Test: 4-worker `1 inc + 3 hit` workload, 5000 iterations under PCT.
Found a real lost-update race in `Hammer.Atomic.FixWindow.inc/4`:
```elixir
case :ets.lookup(table, full_key) do
[{_, atomic}] -> ...
[] ->
:ets.insert(table, {full_key, :atomics.new(2, signed: false)}) # OVERWRITES
inc(table, key, scale, increment)
end
```
`:ets.insert` overwrites whereas `hit/5` correctly uses
`:ets.insert_new`. Under interleaved hit+inc on a fresh key,
increments to the first atomic can be clobbered when the second
process's `inc` overwrites with a fresh atomic.
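The shape of a fix — a sketch, not Hammer's actual patch — is to
mirror `hit/5`'s insert-if-absent discipline:

```elixir
[] ->
  # insert_new returns false if another process won the race; either
  # way, recursing re-reads whichever atomic actually landed in the
  # table instead of clobbering it.
  :ets.insert_new(table, {full_key, :atomics.new(2, signed: false)})
  inc(table, key, scale, increment)
```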
Hammer was the first of the five libraries tested with this template
to surface a real concurrency bug. The other four are correctly
designed; Hammer's mismatch between `hit/5` and `inc/4` is an
implementation gap, not a documented design choice.
---
## Appendix: minimal test scaffolding for compile-rewrite (Path B)
```elixir
defmodule Lockstep.MyLibTest do
use ExUnit.Case, async: false
@src "/tmp/mylib"
@rewritten_dir "/tmp/mylib_lockstep"
setup_all do
cond do
not File.exists?(@src) ->
IO.puts(:stderr, "\n[mylib] skipped: clone to #{@src}")
{:ok, available: false}
true ->
File.rm_rf!(@rewritten_dir)
source_paths = [
Path.join(@src, "lib/mylib.ex")
# ... add all .ex files in dependency order
]
case Lockstep.MixCompiler.compile(%{paths: source_paths, output: @rewritten_dir}) do
{:ok, rewritten} ->
# Sort leaf-to-root to satisfy module-level deps.
ordered = Enum.sort_by(rewritten, &order_key/1)
for path <- ordered do
try do
Code.compile_file(path)
IO.puts(:stderr, "[mylib] compiled #{Path.basename(path)}")
rescue
e -> IO.puts(:stderr, "compile FAILED: #{Exception.message(e)}")
end
end
{:ok, available: Code.ensure_loaded?(MyLib)}
_ ->
{:ok, available: false}
end
end
end
defp order_key(path) do
case Path.basename(path) do
"leaf_module.ex" -> 0
"mid_module.ex" -> 1
"main_module.ex" -> 2
_ -> 99
end
end
test "lin", ctx do
if not ctx.available, do: assert(true), else: lin_test()
end
defp lin_test do
Lockstep.Runner.run(&body/0,
iterations: 200,
strategy: :pct,
max_steps: 5_000,
seed: 1,
iter_timeout: 30_000,
progress: 25,
suite: "mylib_lin"
)
end
defp body, do: ...
end
```