---
name: lockstep
description: |
  Use this skill when the user wants to find or verify concurrency
  bugs in BEAM code (Elixir, Erlang, Gleam) using Lockstep, the
  controlled-scheduling test framework. Triggers include: "test for
  races", "verify this is concurrency-safe", "find the schedule that
  causes this flaky test", "why does this fail occasionally", or any
  mention of GenServer/ETS/atomics race conditions.
---
# Lockstep — controlled concurrency testing for the BEAM
Lockstep runs an ExUnit test body many times with different
message-passing schedules. When it finds a bug, the schedule is
deterministic and replayable — same `seed` + same `iterations`
always produce the same trace.
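To make the idea of seed-driven, replayable schedules concrete, here is a toy model in plain Elixir. It is only a sketch under stated assumptions: `ToySchedule` and all of its functions are hypothetical names invented for illustration, and nothing here reflects how Lockstep is actually implemented. Two logical processes each read a shared counter and then write back the value plus one; a seeded random generator decides the interleaving.

```elixir
defmodule ToySchedule do
  # Hypothetical toy model, not Lockstep's API or implementation.
  # Two logical processes, :a and :b, each run two steps against a shared
  # counter: first read it, then write back the value read plus one.

  # Derive an interleaving from a seed by repeatedly picking one of the
  # processes that still has steps left.
  def schedule(seed) do
    rng = :rand.seed_s(:exsss, {seed, seed, seed})
    pick(%{a: 2, b: 2}, rng, [])
  end

  defp pick(left, _rng, acc) when map_size(left) == 0, do: Enum.reverse(acc)

  defp pick(left, rng, acc) do
    ids = left |> Map.keys() |> Enum.sort()
    {n, rng} = :rand.uniform_s(length(ids), rng)
    id = Enum.at(ids, n - 1)

    left =
      case Map.fetch!(left, id) do
        1 -> Map.delete(left, id)
        k -> Map.put(left, id, k - 1)
      end

    pick(left, rng, [id | acc])
  end

  # Execute an interleaving and return the final counter value.
  def run(schedule) do
    {counter, _pending} =
      Enum.reduce(schedule, {0, %{}}, fn id, {counter, pending} ->
        case pending do
          # Second step for this process: write the stale value plus one.
          %{^id => seen} -> {seen + 1, Map.delete(pending, id)}
          # First step: remember what the counter was when we read it.
          _ -> {counter, Map.put(pending, id, counter)}
        end
      end)

    counter
  end

  # Try seeds in order and return the first whose schedule loses an update.
  def find_bug(seeds) do
    Enum.find(seeds, fn seed -> run(schedule(seed)) != 2 end)
  end
end

seed = ToySchedule.find_bug(1..200)
# Replaying the same seed rebuilds the same interleaving and the same failure.
replayed = ToySchedule.run(ToySchedule.schedule(seed))
```

Because the schedule is a pure function of the seed, the failing interleaving can be regenerated at will, which is the property the real tool relies on for replay.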
## When to use Lockstep
| Situation | Use Lockstep? |
|-----------|---------------|
| "This test fails 1 in 50 runs in CI" | **Yes**: Lockstep gives you the schedule |
| "I'm worried about a TOCTTOU race" | **Yes**: the POS strategy is good at these |
| "Two GenServers race over shared state" | **Yes** |
| "Is this single pure function correct?" | No: use property tests |
| "I have a logic bug" | No: use careful code review |
| "Test under network partition" | **Yes**: `Lockstep.Cluster.partition/3` |

Lockstep's strength is **schedule-dependent bugs** that standard
testing finds rarely or not at all. It gives you a reproducible
counterexample identified by a `seed`; re-running with the same seed
always reproduces the failure.
## Adding to a project
```elixir
# mix.exs
defp deps do
  [{:lockstep, "~> 0.1.0", only: :test}]
end
```
## Writing your first Lockstep test
The simplest pattern: take an existing ExUnit test, rewrite the
body to use Lockstep wrappers, then `use Lockstep.Test` in the test
module and declare the test with `ctest`.
```elixir
defmodule MyApp.RaceTest do
  use Lockstep.Test

  defmodule Counter do
    use Lockstep.GenServer  # NOTE: Lockstep.GenServer, not GenServer

    def start_link, do: Lockstep.GenServer.start_link(__MODULE__, 0)
    def value(pid), do: Lockstep.GenServer.call(pid, :value)
    def set(pid, n), do: Lockstep.GenServer.call(pid, {:set, n})

    def init(state), do: {:ok, state}
    def handle_call(:value, _, n), do: {:reply, n, n}
    def handle_call({:set, n}, _, _), do: {:reply, :ok, n}
  end

  ctest "two clients adding 1 each end at 2" do
    {:ok, pid} = Counter.start_link()
    parent = self()

    for _ <- 1..2 do
      Lockstep.spawn(fn ->
        # Deliberately buggy read-modify-write: read the value, then write
        # back value + 1 in a separate call. If both clients read 0 before
        # either writes, both write 1 and one update is lost.
        v = Counter.value(pid)
        Counter.set(pid, v + 1)
        Lockstep.send(parent, :done)
      end)
    end

    for _ <- 1..2, do: Lockstep.recv_first(fn :done -> true; _ -> false end)
    final = Counter.value(pid)
    if final != 2, do: raise "lost update; counter is #{final}"
  end
end
```
Run with:
```sh
mix test path/to/race_test.exs
```
If Lockstep finds a bug, you'll see:
```
** (Lockstep.BugFound)
Lockstep found a concurrency bug on iteration 4.
seed: 1
strategy: :pct
trace path: traces/<test-name>-iter4-seed1.lockstep
Schedule:
step 1 hello P0(root)
step 2 spawn P0(root) -> P1
...
step 14 exit P0(root) reason={...} <-- FAILED HERE
Replay with:
mix lockstep.replay --trace traces/<test-name>-iter4-seed1.lockstep
```
## Strategy choice
- **`:pct`** (default): Probabilistic Concurrency Testing. Best
  for coarse-grained interleaving exploration.
- **`:pos`**: Partial Order Sampling. Best for tight
  read-modify-write races on shared atomics/ETS.
- **`:fair_pct`**: PCT, then random scheduling; protects against
  starvation in spin loops.
- **`:random`**: Pure random scheduling. Baseline.
```elixir
ctest "race", strategy: :pos, iterations: 1000 do
  # ...
end
```
## OTP wrappers — drop-in replacements
| OTP module | Lockstep equivalent |
|------------|---------------------|
| `GenServer` | `Lockstep.GenServer` |
| `:gen_statem` | `Lockstep.GenStatem` |
| `Agent`, `Task`, `Task.Supervisor` | `Lockstep.{Agent,Task,Task.Supervisor}` |
| `Registry`, `Supervisor` | `Lockstep.{Registry,Supervisor}` |
| `send/2`, `spawn/1`, `Process.send_after/3` | `Lockstep.{send,spawn,send_after}` |
| `:ets.{insert,lookup,update_counter}` | `Lockstep.ETS.*` |
| `:atomics.*`, `:persistent_term.*` | `Lockstep.{Atomics,PersistentTerm}.*` |

The semantics of every wrapper are identical to the underlying OTP
function; Lockstep just inserts a sync point so the strategy can
interleave between operations.
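As a rough illustration of what "inserting a sync point" can mean, the sketch below wraps `Kernel.send/2` so that each worker parks until a scheduler releases it. This is an assumption-laden toy: `ToySync`, `sync_send/3`, and `run/1` are hypothetical names, and the real wrappers may work quite differently.

```elixir
defmodule ToySync do
  # Hypothetical sketch, not Lockstep's implementation.

  # Worker side: announce a sync point, wait for the scheduler's go-ahead,
  # then perform the real send.
  def sync_send(scheduler, dest, msg) do
    Kernel.send(scheduler, {:sync, self()})

    receive do
      :go -> Kernel.send(dest, msg)
    end
  end

  # Scheduler side: wait until both workers are parked at their sync point,
  # then release them one at a time in the order chosen by `order_fun`.
  # Returns the tags in the order their messages were delivered.
  def run(order_fun) do
    me = self()

    workers =
      for tag <- [:a, :b] do
        {tag, spawn(fn -> sync_send(me, me, {:msg, tag}) end)}
      end

    for {_tag, pid} <- workers do
      receive do
        {:sync, ^pid} -> :ok
      end
    end

    for {_tag, pid} <- order_fun.(workers) do
      Kernel.send(pid, :go)

      receive do
        {:msg, tag} -> tag
      end
    end
  end
end

reversed = ToySync.run(&Enum.reverse/1)
in_order = ToySync.run(& &1)
```

Here the caller decides the delivery order, so `reversed` is `[:b, :a]` and `in_order` is `[:a, :b]`; the wrapped operation itself is still an ordinary send.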
## Replay + shrink
When Lockstep finds a bug, it writes a `.lockstep` trace file. You
can:
```sh
# Re-execute the exact failing schedule (deterministic)
mix lockstep.replay --trace traces/<bug>.lockstep
# Minimize the trace to the smallest reproducing schedule
mix lockstep.shrink --trace traces/<bug>.lockstep
```
Replay lets you attach a debugger or add `IO.inspect` calls and step
through the race. Shrinking can reduce, for example, a 5000-step
trace to 12 steps.
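The general idea behind shrinking can be sketched with a greedy loop that drops one step at a time and keeps the shorter trace whenever the failure still reproduces. This is a generic minimization sketch, not the algorithm `mix lockstep.shrink` uses (the source does not describe it); `ToyShrink` and the event tuples are hypothetical. The stepped range syntax assumes Elixir 1.12 or later.

```elixir
defmodule ToyShrink do
  # Hypothetical greedy shrinker: remove any single step whose removal
  # still reproduces the failure, and repeat until no step can be removed.
  def shrink(trace, fails?) do
    smaller =
      Enum.find_value(0..(length(trace) - 1)//1, fn i ->
        candidate = List.delete_at(trace, i)
        if fails?.(candidate), do: candidate
      end)

    if smaller, do: shrink(smaller, fails?), else: trace
  end
end

# Failure predicate for a lost update: both reads happen before either write.
lost_update? = fn trace ->
  at = fn event -> Enum.find_index(trace, &(&1 == event)) end
  [ra, rb, wa, wb] = Enum.map([{:read, :a}, {:read, :b}, {:write, :a}, {:write, :b}], at)
  Enum.all?([ra, rb, wa, wb]) and max(ra, rb) < min(wa, wb)
end

trace = [:tick, {:read, :a}, :tick, {:read, :b}, {:write, :a}, :tick, {:write, :b}]
minimal = ToyShrink.shrink(trace, lost_update?)
```

In this example the three unrelated `:tick` steps are removed and the four steps that make up the race remain.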
## Multi-node testing (`Lockstep.Cluster`)
For testing distributed systems:
```elixir
ctest "partition + heal" do
  [a, b, c] = Lockstep.Cluster.start_nodes([:a, :b, :c])
  Lockstep.Cluster.run(a, fn -> MyService.start_link() end)
  Lockstep.Cluster.run(b, fn -> MyService.start_link() end)
  Lockstep.Cluster.run(c, fn -> MyService.start_link() end)
  Lockstep.Cluster.partition([a, b], [c], mode: :defer)
  # ... do work in each partition ...
  Lockstep.Cluster.heal()
  # Verify convergence
end
```
Also: `Lockstep.Cluster.stop_node/1` and `start_node/1` for
crash/recovery scenarios.
## What Lockstep does NOT do
- It doesn't find logic bugs visible by reading source. Use code
review.
- It doesn't replace property-based testing. They're complementary.
- It doesn't simulate disk fsync, network packet drops at the
byte-level, or OS-level kill -9 (use chaos engineering for those).
- It doesn't model wall-clock-tight latency requirements.
## Common patterns to recognize
These are the bug shapes Lockstep finds well:
### TOCTTOU (read-then-act)
```elixir
def try_acquire({ref, limit}) do
  current = :atomics.get(ref, 1)   # time of check
  if current < limit do
    :atomics.add(ref, 1, 1)        # time of use; another caller can squeeze in between
    :ok
  else
    {:error, :limit_exceeded}
  end
end
```
→ test 4+ concurrent callers; under POS, found at iteration ~1.
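One possible fix for the check-then-act shape above is to fold the check and the increment into a single atomic operation and roll back on overshoot. The sketch uses plain `:atomics` rather than the Lockstep wrappers, and `ToyLimiter` is a hypothetical module name; treat it as one option under those assumptions, not the recommended remedy.

```elixir
defmodule ToyLimiter do
  # Hypothetical example module using the standard :atomics API.
  def new(limit), do: {:atomics.new(1, signed: true), limit}

  def try_acquire({ref, limit}) do
    # add_get/3 increments and returns the new value in one atomic step,
    # so no other caller can run between the check and the update.
    if :atomics.add_get(ref, 1, 1) <= limit do
      :ok
    else
      # Overshot the limit: undo our own increment.
      :atomics.sub(ref, 1, 1)
      {:error, :limit_exceeded}
    end
  end
end

limiter = ToyLimiter.new(10)

results =
  1..100
  |> Enum.map(fn _ -> Task.async(fn -> ToyLimiter.try_acquire(limiter) end) end)
  |> Task.await_many()

granted = Enum.count(results, &(&1 == :ok))
```

With no releases in this example, exactly `limit` callers are granted regardless of interleaving, so `granted` is 10.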
### Lost update on read-modify-write
```elixir
v = Counter.value(pid)
Counter.set(pid, v + 1)
```
→ test multiple concurrent processes; under POS/PCT, found at
iteration 1-3.
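For contrast, the usual way to remove this kind of lost update is to run the whole read-modify-write inside the process that owns the state. The sketch below uses a plain `Agent` from the standard library (not the Lockstep wrapper) purely to show the shape; the variable names are illustrative.

```elixir
{:ok, counter} = Agent.start_link(fn -> 0 end)

# Racy shape (for contrast): the read and the write are separate calls, so
# another process can run between them.
#   v = Agent.get(counter, & &1)
#   Agent.update(counter, fn _ -> v + 1 end)

# Safe shape: the whole read-modify-write runs inside the Agent process,
# which handles one request at a time.
1..50
|> Enum.map(fn _ -> Task.async(fn -> Agent.update(counter, &(&1 + 1)) end) end)
|> Task.await_many()

final = Agent.get(counter, & &1)
```

All 50 increments are applied here, so `final` is 50.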
### Message-ordering race
A `NeighborReply` arrives at the same mailbox as a `connection_lost`
signal. Whichever is processed first determines the outcome.

→ Look for `Lockstep.send` + `Process.monitor` + `handle_info` patterns.
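A small stand-alone sketch of this shape, using a plain `GenServer` rather than the Lockstep wrappers: the same two messages produce different final states depending on which one is handled first. `ToyPeer` and its message formats are hypothetical, chosen only to mirror the description above.

```elixir
defmodule ToyPeer do
  # Hypothetical module for illustration; not part of Lockstep or any library.
  use GenServer

  def init(_), do: {:ok, %{connected?: true, neighbors: []}}

  # A reply is only recorded while the connection is still up.
  def handle_info({:neighbor_reply, n}, %{connected?: true} = state),
    do: {:noreply, %{state | neighbors: [n | state.neighbors]}}

  def handle_info({:neighbor_reply, _n}, state), do: {:noreply, state}
  def handle_info(:connection_lost, state), do: {:noreply, %{state | connected?: false}}
end

# Deliver the messages in the given order and return the final state.
final_state = fn messages ->
  {:ok, pid} = GenServer.start_link(ToyPeer, nil)
  Enum.each(messages, &send(pid, &1))
  # :sys.get_state/1 is a synchronous call from the same sender, so it is
  # handled after the messages sent above.
  :sys.get_state(pid)
end

reply_first = final_state.([{:neighbor_reply, :n1}, :connection_lost])
lost_first = final_state.([:connection_lost, {:neighbor_reply, :n1}])
```

When the reply is handled first the neighbor is recorded; when the disconnect is handled first the reply is silently dropped. In real code the two messages come from different senders, so the order is up to the scheduler.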
### Linearizability violations
Use `Lockstep.Checker.Linearizable`:
```elixir
ctest "registry is linearizable" do
  history = run_workload(...)
  assert :ok = Lockstep.Checker.Linearizable.check(history, model)
end
```
## Reading traces
Trace output uses pid aliases for readability:
```
P0(root) = #PID<0.123.0>
P1 = #PID<0.124.0>
P2 = #PID<0.125.0>
Schedule:
step 1 hello P0(root)
step 2 spawn P0(root) -> P1
step 3 send P0(root) -> P1 {:hello}
step 4 recv P1 {:hello}
step 5 exit P1 reason={:exception, :error, ...} <-- FAILED HERE
```
The `<-- FAILED HERE` marker shows where the assertion or
invariant fired. Read the trace bottom-up to understand causality.
## Causal slice
Lockstep automatically slices traces to show only events causally
related to the failure. Set `LOCKSTEP_NO_CAUSAL_SLICE=1` to disable.
For long traces, the slice is ~5-20% of the original.
## LLM-explained counterexamples
If you have an Anthropic API key:
```sh
export ANTHROPIC_API_KEY=sk-ant-...
mix test # Failures will be explained in plain English
```
Set `LOCKSTEP_LLM_OFF=1` to disable.
## Documentation
- **Overview + tutorials**: `README.md`
- **Methodology**: `METHODOLOGY.md` — playbook for testing real
Hex packages
- **Bug-finding case studies**: `docs/design/BUG_FINDINGS.md`
- **Design history + strategy**: `docs/design/`
## Programmatic API
For non-ExUnit callers:
```elixir
Lockstep.Runner.run(
  fn -> my_test_body() end,
  iterations: 1000,
  strategy: :pos,
  max_steps: 5_000,
  seed: 1,
  iter_timeout: 30_000,
  suite: "my_suite"
)
```
Returns `:ok` or raises `Lockstep.BugFound` with iteration, seed,
strategy, and trace path.
## When to escalate
If a bug reproduces with `seed: N` but not with other seeds, **the
seed matters**: file the bug with that exact seed. Maintainers can
reproduce it with the same seed, so the fix-and-verify cycle is
deterministic.

If a test ALWAYS reports a "bug" at iteration 1 but you don't
believe it's a real bug, the test scenario is likely too aggressive
(forcing the race deterministically rather than testing whether the
race exists). Restructure the test to reflect more natural usage.