# Lockstep testing methodology — the playbook
How to apply Lockstep to a real BEAM library to find (or rule out)
concurrency bugs. This is the field guide we wrote while testing
Cachex, ConCache, lud/mutex, nimble_pool, and Hammer.
The bug-finding rate so far: **1 in 5 libraries**. Hammer's
`Hammer.Atomic.FixWindow.inc/4` had a real lost-update race against
`hit/5`. The other four are concurrency-correct under their public
APIs — which is itself useful information.
This doc is the recipe for getting to that signal.
## Contents
1. [When to reach for Lockstep](#when-to-reach-for-lockstep)
2. [Two paths: compile-rewrite vs test-only dep](#two-paths-compile-rewrite-vs-test-only-dep)
3. [The model-based linearizability template](#the-model-based-linearizability-template)
4. [Writing the model](#writing-the-model)
5. [Picking a strategy](#picking-a-strategy)
6. [Iteration counts and tuning](#iteration-counts-and-tuning)
7. [Beyond linearizability: fault injection, time-skew](#beyond-linearizability-fault-injection-time-skew)
8. [What Lockstep can and cannot reach](#what-lockstep-can-and-cannot-reach)
9. [Anti-patterns](#anti-patterns)
10. [Case studies](#case-studies)
## When to reach for Lockstep
Lockstep makes sense when you're testing code where:
- Multiple processes interact through shared mutable state (ETS,
atomics, persistent_term, registered names).
- You need a *reproducible* failing schedule — vanilla concurrent
ExUnit tests find races but throw away the trace.
- You suspect a race exists at non-trivial depth (3+ scheduling
decisions away from a clear "first send wins").
It's overkill when:
- The code is sequential or has only one writer.
- You just want stress testing — `Task.async_stream` in a loop is
  cheaper (a sketch follows this list).
- The race is already easy to catch (tight `:erlang.spawn` loops
  catch lost-updates 99% of the time; see the README's "Why bother").
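For contrast, here is what that cheaper loop looks like — a sketch
against the hypothetical `MyLib` used throughout this doc. It can
catch shallow lost-updates, but it keeps no trace and cannot replay
a failure:

```elixir
test "naive stress (no Lockstep)" do
  {:ok, sut} = MyLib.start_link([])

  # Hammer one key from 8 concurrent tasks.
  1..1_000
  |> Task.async_stream(fn i -> MyLib.put(sut, "k", i) end, max_concurrency: 8)
  |> Stream.run()

  # Only final-state assertions are possible; the interleaving is lost.
  assert MyLib.get(sut, "k") in 1..1_000
end
```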
## Two paths: compile-rewrite vs test-only dep
There are two ways to point Lockstep at a library, with very
different tradeoffs.
### Path A: test-only Hex dep (workers controlled, library raw)
Add the library to `mix.exs` as `only: :test`. Workers run as
`Lockstep.Task.async/1` (controlled), call into the library, and
record into `Lockstep.History`. The library itself runs un-rewritten
on the real BEAM scheduler.
You get:
- Real-world concurrent behavior of the library (real scheduler).
- Lockstep manages worker scheduling and history recording.
- Linearizability checking against your model.
You don't get:
- PCT scheduling *into* the library's GenServer mailbox dispatch.
Internal races aren't actively explored — you're hoping the BEAM
scheduler hits them naturally.
Use this when the library is too heavy to compile-rewrite (heavy
macros, many transitive deps, kernel-level features). **Cachex** and
**Phoenix.PubSub** fit here.
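The `mix.exs` side of Path A is an ordinary test-scoped dependency
(version requirements here are illustrative):

```elixir
# In mix.exs — the SUT is pulled in only for the :test env.
defp deps do
  [
    {:lockstep, "~> 0.1", only: :test},
    {:cachex, "~> 3.6", only: :test}
  ]
end
```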
### Path B: compile-rewrite via `Lockstep.MixCompiler` (full pipeline)
Clone the source, run it through `Lockstep.MixCompiler.compile/1`,
and `Code.compile_file/1` the rewritten output. The library's
internal `:ets`, `:atomics`, `Process.send_after`, `GenServer.call`,
`Process.monitor`, and friends all become Lockstep sync points.
You get:
- PCT (or any other strategy) scheduling *into* the library's
internals — handle_call dispatch ordering, monitor cleanup
ordering, message arrival sequencing.
- A genuinely stronger correctness signal: bugs at depth 3-6 inside
  the SUT are findable.
You pay for it with:
- Source must be cloneable and compilable. Heavy macro libraries
(Cachex's record-style state, Hammer's `__before_compile__`)
may need partial workarounds.
- Some OTP idioms aren't reachable: `{:via, Registry, ...}` was
blocked until [we added rewriting](lib/lockstep/rewriter.ex)
for the via tuple itself; `:"$ancestors"` propagation is also
needed for libs that read `Process.get(:"$ancestors")`.
Use this when the library is small-medium (≤ 2k LOC, few transitive
deps) and you actually want to find concurrency bugs in *its* logic.
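At its smallest, Path B is two calls — the appendix at the end of
this doc shows the full scaffolding with load ordering and error
handling:

```elixir
# Rewrite the cloned source, then load the rewritten modules.
# Source and output paths are illustrative.
{:ok, rewritten} =
  Lockstep.MixCompiler.compile(%{
    paths: ["/tmp/mylib/lib/mylib.ex"],
    output: "/tmp/mylib_lockstep"
  })

Enum.each(rewritten, &Code.compile_file/1)
```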
## The model-based linearizability template
The reusable shape for any read/write API:
```elixir
defmodule Lockstep.MyLibTest do
use ExUnit.Case, async: false
setup_all do
# Compile through Lockstep.MixCompiler if you want path B.
# Otherwise add the lib as a test dep and skip this.
...
end
defmodule MyModel do
@behaviour Lockstep.Model
alias Lockstep.History.Event
@impl true
def init, do: %{} # whatever shape your reference state is
@impl true
def step(state, %Event{f: :get, value: k}, %Event{type: :ok, value: v}) do
if Map.get(state, k) == v do
{:ok, state}
else
{:error, "stale read"}
end
end
def step(state, %Event{f: :put, value: {k, v}}, %Event{type: :ok}) do
{:ok, Map.put(state, k, v)}
end
# ... one clause per (op kind, completion type)
# Generic completion catch-alls (always include these):
def step(state, _, %Event{type: :fail}), do: {:ok, state}
def step(state, _, %Event{type: :info}), do: {:ok, state}
end
test "linearizability" do
Lockstep.Runner.run(
&body/0,
iterations: 200,
strategy: :pct,
max_steps: 5_000,
seed: 1,
iter_timeout: 30_000,
progress: 25,
suite: "mylib_lin"
)
end
defp body do
# 1. Set up a fresh SUT instance (unique name per iteration).
{:ok, sut} = MyLib.start_link(...)
# 2. Start a History recorder.
history = Lockstep.History.start_link!()
# 3. Spawn workers, each pulling from a Generator.
n_workers = 4
ops_per_worker = 6
workers =
for w <- 1..n_workers do
Lockstep.Task.async(fn ->
gen =
Lockstep.Generator.weighted(
[
{{:get, "k"}, 4},
{{:put, "k", w * 100}, 2},
# ... your op pool ...
],
seed: w * 1_000_003
)
|> Lockstep.Generator.take(ops_per_worker)
Lockstep.Generator.each(gen, fn op -> apply_op(history, sut, op) end)
end)
end
Lockstep.Task.await_many(workers, 30_000)
# 4. Check the history against the model.
events = Lockstep.History.events(history)
case Lockstep.Checker.Linearizable.check(events, MyModel) do
{:ok, _} -> :ok
{:error, info} ->
msg = Lockstep.Checker.Linearizable.format_violation(info)
raise "non-linearizable history found:\n\n#{msg}"
end
end
defp apply_op(history, sut, {:get, k}) do
Lockstep.History.op(history, :get, k, fn -> MyLib.get(sut, k) end)
end
defp apply_op(history, sut, {:put, k, v}) do
Lockstep.History.op(history, :put, {k, v}, fn -> MyLib.put(sut, k, v) end)
end
end
```
`Lockstep.History.op/4` records an `:invoke` before the function, an
`:ok` (with the function's return value) on normal return, or an
`:info` (with the exception message) on raise.
`Lockstep.Checker.Linearizable` then searches for a serial order
consistent with both real-time precedence and the model.
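For orientation, a tiny recorded history might look like the
following. The `:f`, `:value`, and `:type` fields are the ones the
model clauses match on; the `:proc` worker-id field shown here is an
assumption about the rest of the struct's shape:

```elixir
alias Lockstep.History.Event

# Worker 2's :get overlaps worker 1's :put in real time, so the
# checker may place it before or after the put when searching for a
# serial order that satisfies the model.
[
  %Event{proc: 1, type: :invoke, f: :put, value: {"k", 100}},
  %Event{proc: 2, type: :invoke, f: :get, value: "k"},
  %Event{proc: 1, type: :ok, f: :put, value: :ok},
  %Event{proc: 2, type: :ok, f: :get, value: 100}
]
```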
## Writing the model
A model is a sequential reference implementation. Its job is to
answer: "given the recorded operations applied in *some* serial
order, is this completion consistent?"
### One clause per (op kind, completion shape) pair
```elixir
def step(state, %Event{f: :put, value: {k, v}}, %Event{type: :ok, value: :ok}) do
{:ok, Map.put(state, k, v)}
end
def step(state, %Event{f: :insert_new, value: {k, v}}, %Event{type: :ok, value: :ok}) do
if Map.has_key?(state, k) do
{:error, "insert_new returned :ok but key already exists"}
else
{:ok, Map.put(state, k, v)}
end
end
def step(state, %Event{f: :insert_new, value: {k, _v}}, %Event{type: :ok, value: {:error, :already_exists}}) do
if Map.has_key?(state, k) do
{:ok, state}
else
{:error, "insert_new returned :already_exists but key didn't exist"}
end
end
```
Each clause encodes both branches: "what state updates if this
completion is consistent" and "what's the proof it's *not*
consistent." The Linearizability checker uses this to backtrack
through serial orderings.
### Always include the catch-alls
```elixir
def step(state, _invoke, %Event{type: :fail}), do: {:ok, state}
def step(state, _invoke, %Event{type: :info}), do: {:ok, state}
```
`:fail` is an explicit "didn't apply" — the op had no effect on
state. `:info` is an unknown outcome (the worker crashed
mid-call) — for now, treat as no-apply, which is the conservative
default. (A more sophisticated checker would branch and try both.)
### Be careful with return values
Probe the library's actual return shapes before writing the model.
We had several "compile fails" or "function clause" errors traced
back to incorrect assumptions:
- `Cachex.get/2` returns `{:ok, value}`, not `value | nil`.
- `Cachex.get_and_update/3` returns `{:commit, new_v}` directly,
*not* `{:ok, {:commit, new_v}}`.
- `Mutex.lock/2` returns `{:ok, %Mutex.Lock{}}` or `{:error, :busy}`.
- `Mutex.await/3` returns `%Mutex.Lock{}` directly (or raises).
Run a quick `mix run --eval 'IO.inspect(MyLib.foo(...))'` first.
### State scope: single-key first, multi-key second
Start with all ops on the same key. The contention is highest, the
model is simplest, and most race patterns surface there. Once that
passes, expand to multi-key — the model is just `%{key => value}`
instead of `value`.
## Picking a strategy
Lockstep ships five strategies. Rule of thumb:
| Strategy | Best at | When to skip |
|----------|---------|--------------|
| `:pct` (default) | Classic data races (lost-update, TOCTOU) at depth ≤ 3. | When you need long runs of one process picked consecutively (POS is better). |
| `:pos` | Bugs that need the strategy to consistently pick the *same* proc over many sync points (e.g., async replication patterns). | When PCT depth-bounded probability is what you need. |
| `:fair_pct` | Long-running tests where infrastructure (heartbeats, timers) would otherwise dominate. | When you need full PCT for the whole run. |
| `:idpct` | Unknown bug depth + many iterations (≥ 1000). Cycles depth in `[d_min, d_max]` per iteration. | When iteration count is small and you want predictable depth. |
| `:random` | Sanity check or cheap stress. | When you want adversarial scheduling. |
**Workflow**: start with `:pct` (default depth 3). If 200-500
iterations don't surface the bug, try `:pos`. If neither, switch to
`:idpct` and run thousands of iterations. If still nothing, the
suspected bug probably doesn't exist.
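Escalation in code is just the `strategy:` option on the same runner
call, with everything else held fixed (options as used in the
template above):

```elixir
# Pass 1: PCT at the default depth — cheap, catches most shallow races.
Lockstep.Runner.run(&body/0, iterations: 500, strategy: :pct, seed: 1)

# Pass 2: POS — favors repeatedly picking the same process.
Lockstep.Runner.run(&body/0, iterations: 500, strategy: :pos, seed: 1)

# Pass 3: IDPCT — cycles bug depth per iteration; give it volume.
Lockstep.Runner.run(&body/0, iterations: 5_000, strategy: :idpct, seed: 1)
```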
The Hammer bug was found by `:pct` at depth 3 with 5000 iterations
on a 4-worker `1 inc + 3 hit` workload. ConCache's lock manager and
nimble_pool's checkout queue have been tested to ≥100 iterations of
mixed workloads under `:pct` with no findings.
## Iteration counts and tuning
PCT's bug-finding probability per iteration is at least
`1 / (n · k^(d-1))`, where `n` is the process count, `k` is the
step count, and `d` is the bug depth.
A rough sizing table for a typical 5-worker test (`n = 5`, `k ≈ 20`
sync points per iteration):
| Bug depth | Per-iter probability | Iterations for 99% confidence |
|-----------|----------------------|-------------------------------|
| 2 | ~1% | ~500 |
| 3 | ~0.05% | ~10,000 |
| 4 | ~0.0025% | ~200,000 |
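The iteration column is plain geometric-trial arithmetic. A quick
helper for sizing your own workload — `iterations_for_99/3` is a
hypothetical name, not part of Lockstep:

```elixir
# Iterations needed to hit a depth-d bug with 99% confidence, given
# the PCT bound above: p = 1 / (n · k^(d-1)).
defp iterations_for_99(n_procs, k_steps, depth) do
  p = 1 / (n_procs * :math.pow(k_steps, depth - 1))
  # Solve (1 - p)^iters <= 0.01  =>  iters >= ln(0.01) / ln(1 - p)
  ceil(:math.log(0.01) / :math.log(1 - p))
end

# iterations_for_99(5, 20, 3)  #=> 9209, the table's ~10,000.
```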
In practice:
- Start at **100 iterations** during development. If your test is
flaky, fix the *test* before scaling iterations.
- Push to **1,000 iterations** for a real correctness check.
- For confidence at depth 3+, **5,000-10,000 iterations**. Each
takes ~1ms wall-time for typical workloads, so 10k is ~10s.
- If you've gone past 50,000 and still nothing, the bug almost
certainly isn't in your concurrency surface. Move on.
## Beyond linearizability: fault injection, time-skew
The Linearizability template covers ~80% of useful tests. The
remaining 20% need targeted directed tests.
### Fault injection (kill a process holding a resource)
```elixir
defp fault_body do
{:ok, mutex} = Mutex.start_link([])
parent = self()
ref = make_ref()
# Worker A: acquires the lock, signals, then exits without releasing.
Lockstep.spawn(fn ->
{:ok, _lock} = Mutex.lock(mutex, "k")
Lockstep.send(parent, {ref, :a_acquired})
# No release. fn returns → A exits :normal → :DOWN fires.
end)
Lockstep.recv_first(fn {^ref, :a_acquired} -> true; _ -> false end)
# B (us) awaits — should block until A's :DOWN, then acquire.
lock = Mutex.await(mutex, "k", :infinity)
:ok = Mutex.release(mutex, lock)
end
```
Lockstep's controller delivers `:DOWN` automatically when a managed
process exits, and treats monitor-observed deaths as expected (no
`:bug` flag). PCT explores orderings of: A's exit, the `:DOWN`
delivery to B, B's pending `await` recv, B's retry.
### Time-skew (force a timer to fire mid-operation)
Lockstep's virtual time advances when no process is runnable.
`Lockstep.sleep/1` parks the calling process and lets virtual time
tick. Combined with aggressive timer config in the SUT (e.g.,
`expiry: 0`, `cleanup_interval: 1`), you can force timer events to
interleave with concurrent ops:
```elixir
defp time_skew_body do
# Example: a circuit-breaker style state machine with periodic cleanup.
{:ok, _} = MyBreaker.start_link(cleanup_interval: 1)
{:ok, _} = MyBreaker.register("api", max_attempts: 3, expiry: 0)
# Auto-open with 3 failures.
MyBreaker.failure("api")
MyBreaker.failure("api")
MyBreaker.failure("api")
# Concurrent: cleanup tick fires + worker does another failure.
Lockstep.spawn(fn -> MyBreaker.failure("api") end)
Lockstep.sleep(10) # advances virtual time, fires cleanup tick
# Final state must be valid (state machine consistent under interleaving).
{:ok, info} = MyBreaker.status("api")
if info.state not in [:open, :half_open], do: raise(...)
end
```
Note: `System.system_time/1` is *not* rewritten by Lockstep. Real
wall-clock time still ticks. If your library's expiry comparison
uses `System.system_time`, you need very low `expiry` values for the
comparison to fire under fast iterations.
## What Lockstep can and cannot reach
### Reaches
- All `Lockstep.*` wrapped primitives: `:ets.*`, `:atomics.*`,
`:persistent_term.*`, `GenServer.{call, cast}`, `Process.{monitor,
demonitor, send_after, link, unlink, flag, alive?}`, `spawn`,
`spawn_link`, `Lockstep.send`, `Lockstep.recv*`.
- OTP `{:via, Registry, ...}` registration (rewriter transforms the
via tuple to `Lockstep.Registry`).
- `Process.get(:"$ancestors")` / `:"$ancestors"` is propagated on
every spawn.
### Doesn't reach
- BEAM kernel messages: `:"ETS-TRANSFER"` (heir mechanism), `:nodedown`,
port messages. These bypass the controller's mailbox. Libraries
that depend on these (e.g., `Eternal`'s ETS HEIR) won't work
cleanly under Lockstep without additional wrapping.
- Macro-generated code in user modules. The rewriter operates on
source files passed to `Lockstep.MixCompiler`. If a library uses
`__before_compile__` to inject functions into the user's module
(Hammer does this), the user's module isn't rewritten by default.
Workaround: bypass the macro and call the underlying algorithm
modules directly (which *are* rewritten).
- `System.system_time/1`, `:erlang.system_time/1`, `:os.timestamp/0`.
Real wall-clock is real wall-clock. If a library's correctness
depends on time progression, it interacts oddly with Lockstep's
fast iteration loop.
- NIFs other than the wrapped set. If a lib calls into a custom NIF,
Lockstep treats it as opaque.
### Library-side compile quirks we've hit
- **Compile-time `File.read!("README.md")`**: nimble_pool's
  `@moduledoc` reads the README at compile time. The rewritten file
  lives in a different cwd, so `File.read!` fails. Pre-process the
  rewritten source to replace the line (see the sketch after this
  list).
- **Macro-heavy state records**: Cachex uses Erlang record syntax via
macros. Compiling all of Cachex through Lockstep.MixCompiler is a
rabbit hole; we use Path A (test-only Hex dep) instead.
- **Compile-time `Code.ensure_loaded!`**: some libraries assert
another module is loaded at compile time. The rewritten file is
loaded fresh, so this can fail unless dependencies load first.
Order rewritten files by leaf-to-root in your `setup_all`.
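For the compile-time `File.read!` quirk above, the pre-processing
pass can be a plain string replace on the rewritten file before
compiling it. A sketch — the match pattern is an assumption you'd
adjust to the library's actual `@moduledoc` line:

```elixir
# Neutralize a compile-time README read in the rewritten source.
defp patch_readme_read!(path) do
  patched =
    path
    |> File.read!()
    |> String.replace(~s|File.read!("README.md")|, ~s|"moduledoc stripped"|)

  File.write!(path, patched)
end
```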
## Anti-patterns
### Strict equality assertions where the API returns floats
We've seen rate-limiter smoke tests assert `{:allow, 3.0} = ...` —
which fails ~25% of the time because `(now - last) / interval *
refill` produces `3.0000001` if even a microsecond of real time
elapsed. Use `assert_in_delta` or `is_number/1` for float-shaped
returns.
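The robust version, sketched against a hypothetical
`MyLimiter.hit/1`:

```elixir
# Brittle: exact float equality fails whenever real time leaks in.
#   {:allow, 3.0} = MyLimiter.hit("k")

# Robust: assert the shape, then bound the float.
assert {:allow, count} = MyLimiter.hit("k")
assert_in_delta count, 3.0, 0.01
```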
### Single-iteration tests for known-deep bugs
Running with `iterations: 1` means PCT picks one schedule and stops.
That's fine for a directed test of a specific known interleaving
(e.g., shrinking output) but not for hunting unknown races. Default
to `iterations: 200` minimum.
### Ignoring `:info` completions in the model
The catch-all `def step(state, _, %Event{type: :info}), do: {:ok,
state}` treats unknown outcomes as no-applies. This is fine for
correctness (conservative), but if you want to detect bugs that
manifest *only* under crashed operations (e.g., the operation
applied but the caller never saw the response), you need to branch.
### Sharing state across iterations
If you `Cachex.start_link(:my_cache, ...)` once in `setup_all`, all
iterations write to the same cache. State leaks. Use a per-iteration
unique name: `name = String.to_atom("c_#{:erlang.unique_integer([:positive])}")`.
## Case studies
### Cachex (1.5M downloads): linearizability confirmed
Test: 200 iterations × 6 workers × 8 mixed ops + 500 iterations of
get_and_update R-M-W + 200 iterations multi-key. All linearizable.
Run un-rewritten (Path A) due to its 5 transitive deps. Cachex's
basic ops are ETS-atomic; `get_and_update` serializes through a
per-cache locksmith Queue GenServer.
### lud/mutex: linearizability + fault-injection both pass
Test: 200 iterations linearizability on 5 workers × 4 lock/release
cycles + 100 iterations fault injection (kill holder, verify monitor
cleanup). Full pipeline. The fault test exercises the
`Process.monitor` cleanup path — Lockstep delivers `:DOWN`
automatically when a managed process exits.
### ConCache (~600k downloads): linearizability confirmed
Test: 100 iterations × 4 workers × 6 mixed ops on single key, with
PCT scheduling *into* ConCache.Lock's GenServer (the lock manager +
waiter queue + monitor-based cleanup). All linearizable. Required
two Lockstep core fixes:
1. AST rewriter handles `{:via, Registry, term}` → `{:via,
Lockstep.Registry, term}` (OTP's `:via` dispatch otherwise routes
to the real Registry which isn't started under Lockstep).
2. `Lockstep.spawn` propagates `:"$ancestors"` (ConCache.Owner reads
it via `hd(Process.get(:"$ancestors"))` to find its parent).
### nimble_pool (used by Finch, Postgrex): clean
Test: 100 iterations × 5 client tasks × 3 checkout cycles on
pool_size 2. Full pipeline. All cycles complete. Tests the
`checkout!` queueing, `Process.monitor` on clients, and worker
selection logic.
### Hammer (rate limiter): **bug found**
Test: 4-worker `1 inc + 3 hit` workload, 5000 iterations under PCT.
Found a real lost-update race in `Hammer.Atomic.FixWindow.inc/4`:
```elixir
case :ets.lookup(table, full_key) do
[{_, atomic}] -> ...
[] ->
:ets.insert(table, {full_key, :atomics.new(2, signed: false)}) # OVERWRITES
inc(table, key, scale, increment)
end
```
`:ets.insert` overwrites whereas `hit/5` correctly uses
`:ets.insert_new`. Under interleaved hit+inc on a fresh key,
increments to the first atomic can be clobbered when the second
process's `inc` overwrites with a fresh atomic.
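The shape of a fix — a sketch, not Hammer's actual patch — is to
mirror `hit/5`'s insert-if-absent discipline:

```elixir
[] ->
  # insert_new returns false if another process won the race; either
  # way, recursing re-reads whichever atomic actually landed in the
  # table instead of clobbering it.
  :ets.insert_new(table, {full_key, :atomics.new(2, signed: false)})
  inc(table, key, scale, increment)
```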
Hammer was the first of the five libraries tested with this template
to surface a real concurrency bug. The other four are correctly
designed; Hammer's mismatch between `hit/5` and `inc/4` is an
implementation gap, not a documented design choice.
---
## Appendix: minimal test scaffolding for compile-rewrite (Path B)
```elixir
defmodule Lockstep.MyLibTest do
use ExUnit.Case, async: false
@src "/tmp/mylib"
@rewritten_dir "/tmp/mylib_lockstep"
setup_all do
cond do
not File.exists?(@src) ->
IO.puts(:stderr, "\n[mylib] skipped: clone to #{@src}")
{:ok, available: false}
true ->
File.rm_rf!(@rewritten_dir)
source_paths = [
Path.join(@src, "lib/mylib.ex")
# ... add all .ex files in dependency order
]
case Lockstep.MixCompiler.compile(%{paths: source_paths, output: @rewritten_dir}) do
{:ok, rewritten} ->
# Sort leaf-to-root to satisfy module-level deps.
ordered = Enum.sort_by(rewritten, &order_key/1)
for path <- ordered do
try do
Code.compile_file(path)
IO.puts(:stderr, "[mylib] compiled #{Path.basename(path)}")
rescue
e -> IO.puts(:stderr, "compile FAILED: #{Exception.message(e)}")
end
end
{:ok, available: Code.ensure_loaded?(MyLib)}
_ ->
{:ok, available: false}
end
end
end
defp order_key(path) do
case Path.basename(path) do
"leaf_module.ex" -> 0
"mid_module.ex" -> 1
"main_module.ex" -> 2
_ -> 99
end
end
test "lin", ctx do
if not ctx.available, do: assert(true), else: lin_test()
end
defp lin_test do
Lockstep.Runner.run(&body/0,
iterations: 200,
strategy: :pct,
max_steps: 5_000,
seed: 1,
iter_timeout: 30_000,
progress: 25,
suite: "mylib_lin"
)
end
defp body, do: ...
end
```