---
name: lockstep
description: |
  Use this skill when the user wants to find or verify concurrency
  bugs in BEAM code (Elixir, Erlang, Gleam) using Lockstep, the
  controlled-scheduling test framework. Triggers include: "test for
  races", "verify this is concurrency-safe", "find the schedule that
  causes this flaky test", "why does this fail occasionally", or any
  mention of GenServer/ETS/atomics race conditions.
---

# Lockstep — controlled concurrency testing for the BEAM

Lockstep runs an ExUnit test body many times with different
message-passing schedules. When it finds a bug, the schedule is
deterministic and replayable — same `seed` + same `iterations`
always produce the same trace.

## When to use Lockstep

| Situation | Use Lockstep? |
|-----------|---------------|
| "This test fails 1 in 50 runs in CI" | **Yes** — Lockstep gives you the schedule |
| "I'm worried about a TOCTTOU race" | **Yes** — POS strategy is good at these |
| "Two GenServers race over shared state" | **Yes** |
| "Is this single-pure-function correct?" | No — use property tests |
| "I have a logic bug" | No — use careful code review |
| "Test under network partition" | **Yes** — `Lockstep.Cluster.partition/3` |

Lockstep's strength is **schedule-dependent bugs**, which standard
testing finds rarely or not at all. When it finds one, it gives you
a reproducible counterexample identified by a `seed`: re-running
with the same seed (and the same `iterations`) reproduces the failure.
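
To make the idea of a seeded, replayable schedule concrete, here is a minimal toy sketch in plain Elixir. It is not how Lockstep is implemented; `ToyScheduler` and its `run/2` function are hypothetical names used only for illustration. Each simulated process performs a read step and then a write step, and a seeded random generator decides which process moves next.

```elixir
defmodule ToyScheduler do
  # Toy model only: every process reads the counter, then writes read + 1.
  # The seed fully determines which process takes each step.
  def run(seed, procs \\ [:p1, :p2]) do
    rand = :rand.seed_s(:exsss, {seed, seed, seed})
    pending = Map.new(procs, &{&1, [:read, :write]})
    step(pending, %{}, 0, rand, [])
  end

  # No process has steps left: return the final counter and the schedule.
  defp step(pending, _regs, counter, _rand, trace) when map_size(pending) == 0 do
    {counter, Enum.reverse(trace)}
  end

  defp step(pending, regs, counter, rand, trace) do
    runnable = pending |> Map.keys() |> Enum.sort()
    {i, rand} = :rand.uniform_s(length(runnable), rand)
    pid = Enum.at(runnable, i - 1)
    [op | rest] = Map.fetch!(pending, pid)

    pending =
      if rest == [], do: Map.delete(pending, pid), else: Map.put(pending, pid, rest)

    case op do
      :read ->
        # Remember the value this process saw.
        step(pending, Map.put(regs, pid, counter), counter, rand, [{pid, :read} | trace])

      :write ->
        # Write back the remembered value + 1, possibly clobbering another write.
        step(pending, regs, Map.fetch!(regs, pid) + 1, rand, [{pid, :write} | trace])
    end
  end
end
```

Calling `ToyScheduler.run(seed)` twice with the same seed returns the same `{counter, trace}` pair, and scanning a range of seeds (for example with `Enum.find/2`) surfaces a schedule where the final counter is 1 instead of 2.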

## Adding to a project

```elixir
# mix.exs
defp deps do
  [{:lockstep, "~> 0.1.0", only: :test}]
end
```

## Writing your first Lockstep test

The simplest pattern: take an existing ExUnit test and rewrite the
body to use Lockstep wrappers, then wrap it with `Lockstep.Test`.

```elixir
defmodule MyApp.RaceTest do
  use Lockstep.Test

  defmodule Counter do
    use Lockstep.GenServer  # NOTE: Lockstep.GenServer, not GenServer

    def start_link, do: Lockstep.GenServer.start_link(__MODULE__, 0)
    def value(pid),    do: Lockstep.GenServer.call(pid, :value)
    def set(pid, n),   do: Lockstep.GenServer.call(pid, {:set, n})

    def init(state), do: {:ok, state}
    def handle_call(:value, _, n), do: {:reply, n, n}
    def handle_call({:set, n}, _, _old), do: {:reply, :ok, n}
  end

  ctest "two clients adding 1 each end at 2" do
    {:ok, pid} = Counter.start_link()
    parent = self()

    for _ <- 1..2 do
      Lockstep.spawn(fn ->
        # The buggy read-modify-write: read the value, then write value + 1
        # in a separate call. If both processes read 0 before either writes,
        # both write 1 and one update is lost (final value 1, not 2).
        v = Counter.value(pid)
        Counter.set(pid, v + 1)
        Lockstep.send(parent, :done)
      end)
    end

    for _ <- 1..2, do: Lockstep.recv_first(fn :done -> true; _ -> false end)

    final = Counter.value(pid)
    if final != 2, do: raise "lost update; counter is #{final}"
  end
end
```

Run with:

```sh
mix test path/to/race_test.exs
```

If Lockstep finds a bug, you'll see:

```
** (Lockstep.BugFound)
Lockstep found a concurrency bug on iteration 4.
  seed: 1
  strategy: :pct
  trace path: traces/<test-name>-iter4-seed1.lockstep

Schedule:
  step 1  hello   P0(root)
  step 2  spawn   P0(root) -> P1
  ...
  step 14 exit    P0(root) reason={...} <-- FAILED HERE

Replay with:
  mix lockstep.replay --trace traces/<test-name>-iter4-seed1.lockstep
```

## Strategy choice

- **`:pct`** (default): Probabilistic Concurrency Testing. Best
  for coarse-grained interleaving exploration.
- **`:pos`**: Partial Order Sampling. Best for tight
  read-modify-write races on shared atomics/ETS.
- **`:fair_pct`**: PCT, then random; protects against starvation in
  spin loops.
- **`:random`**: Pure random scheduling. Baseline.

```elixir
ctest "race", strategy: :pos, iterations: 1000 do
  # ...
end
```

## OTP wrappers — drop-in replacements

| OTP module | Lockstep equivalent |
|------------|---------------------|
| `GenServer` | `Lockstep.GenServer` |
| `:gen_statem` | `Lockstep.GenStatem` |
| `Agent`, `Task`, `Task.Supervisor` | `Lockstep.{Agent,Task,Task.Supervisor}` |
| `Registry`, `Supervisor` | `Lockstep.{Registry,Supervisor}` |
| `send/2`, `spawn/1`, `Process.send_after/3` | `Lockstep.{send,spawn,send_after}` |
| `:ets.{insert,lookup,update_counter}` | `Lockstep.ETS.*` |
| `:atomics.*`, `:persistent_term.*` | `Lockstep.{Atomics,PersistentTerm}.*` |

The semantics of every wrapper are identical to those of the
underlying OTP function; Lockstep just inserts a sync point so the
strategy can interleave between operations.

## Replay + shrink

When Lockstep finds a bug, it writes a `.lockstep` trace file. You
can:

```sh
# Re-execute the exact failing schedule (deterministic)
mix lockstep.replay --trace traces/<bug>.lockstep

# Minimize the trace to the smallest reproducing schedule
mix lockstep.shrink --trace traces/<bug>.lockstep
```

Replay lets you attach a debugger or add `IO.inspect` calls and step
through the race. Shrinking can turn, for example, a 5000-step trace
into a 12-step one.

## Multi-node testing (`Lockstep.Cluster`)

For testing distributed systems:

```elixir
ctest "partition + heal" do
  [a, b, c] = Lockstep.Cluster.start_nodes([:a, :b, :c])

  Lockstep.Cluster.run(a, fn -> MyService.start_link() end)
  Lockstep.Cluster.run(b, fn -> MyService.start_link() end)
  Lockstep.Cluster.run(c, fn -> MyService.start_link() end)

  Lockstep.Cluster.partition([a, b], [c], mode: :defer)
  # ... do work in each partition ...
  Lockstep.Cluster.heal()

  # Verify convergence
end
```

Also: `Lockstep.Cluster.stop_node/1` and `start_node/1` for
crash/recovery scenarios.

## What Lockstep does NOT do

- It doesn't find logic bugs visible by reading source. Use code
  review.
- It doesn't replace property-based testing. They're complementary.
- It doesn't simulate disk fsync, network packet drops at the
  byte level, or an OS-level `kill -9` (use chaos engineering for those).
- It doesn't model wall-clock-tight latency requirements.

## Common patterns to recognize

These are the bug shapes Lockstep finds well:

### TOCTTOU (read-then-act)

```elixir
def try_acquire({ref, limit}) do
  current = :atomics.get(ref, 1)        # time of check
  if current < limit do
    :atomics.add(ref, 1, 1)             # time of use; another caller can squeeze in between
    :ok
  else
    {:error, :limit_exceeded}
  end
end
```

→ test 4+ concurrent callers; under POS, found at iteration ~1.
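
Once a test exposes this race, one possible fix is to merge the check and the update into a single atomic step. The sketch below uses plain `:atomics` with a compare-and-swap retry loop; `Limiter` and `new/1` are hypothetical names. In a Lockstep test the calls would go through `Lockstep.Atomics` as listed in the wrapper table, and whether that module wraps `compare_exchange/4` is an assumption to verify.

```elixir
defmodule Limiter do
  # One signed counter slot; `limit` is the maximum number of holders.
  def new(limit), do: {:atomics.new(1, signed: true), limit}

  # Check and increment atomically: the increment only succeeds if the
  # counter still holds the value that passed the limit check.
  def try_acquire({ref, limit} = limiter) do
    current = :atomics.get(ref, 1)

    cond do
      current >= limit ->
        {:error, :limit_exceeded}

      :atomics.compare_exchange(ref, 1, current, current + 1) == :ok ->
        :ok

      true ->
        # Another caller changed the counter between get and CAS; retry.
        try_acquire(limiter)
    end
  end
end

# 20 concurrent callers compete for 3 slots.
limiter = Limiter.new(3)

results =
  1..20
  |> Enum.map(fn _ -> Task.async(fn -> Limiter.try_acquire(limiter) end) end)
  |> Task.await_many()

granted = Enum.count(results, &(&1 == :ok))
```

With the compare-and-swap in place, `granted` cannot exceed the limit regardless of how the callers interleave.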

### Lost update on read-modify-write

```elixir
v = Counter.value(pid)
Counter.set(pid, v + 1)
```

→ test multiple concurrent processes; under POS/PCT, found at
iteration 1-3.
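
The usual remedy is to move the read-modify-write into the server, so that it happens inside a single `handle_call`. A minimal sketch with plain `GenServer` follows; `SafeCounter` and `increment/1` are illustrative names, and in a Lockstep test you would swap in `Lockstep.GenServer` as described in the wrapper table.

```elixir
defmodule SafeCounter do
  use GenServer

  def start_link(initial \\ 0), do: GenServer.start_link(__MODULE__, initial)
  def value(pid), do: GenServer.call(pid, :value)

  # The whole read-modify-write runs inside the server process, so no
  # other client can interleave between the read and the write.
  def increment(pid), do: GenServer.call(pid, :increment)

  @impl true
  def init(n), do: {:ok, n}

  @impl true
  def handle_call(:value, _from, n), do: {:reply, n, n}
  def handle_call(:increment, _from, n), do: {:reply, :ok, n + 1}
end

{:ok, pid} = SafeCounter.start_link()

1..10
|> Enum.map(fn _ -> Task.async(fn -> SafeCounter.increment(pid) end) end)
|> Task.await_many()

final = SafeCounter.value(pid)
```

Because the server handles one message at a time, every increment is applied to the latest state.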

### Message-ordering race

A `NeighborReply` arrives at the same mailbox as a `connection_lost`
signal. Whichever is processed first determines the outcome.

→ `Lockstep.send` + `Process.monitor` + `handle_info` patterns.
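
As a rough illustration of why the order matters, the plain-OTP sketch below delivers the two messages in both orders. The `Peer` module and the message shapes `{:neighbor_reply, list}` and `:connection_lost` are hypothetical stand-ins, not taken from any real codebase.

```elixir
defmodule Peer do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, %{status: :connected, neighbors: nil})
  def state(pid), do: GenServer.call(pid, :state)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:state, _from, state), do: {:reply, state, state}

  @impl true
  def handle_info({:neighbor_reply, list}, %{status: :connected} = state) do
    {:noreply, %{state | neighbors: list}}
  end

  # A reply that arrives after the connection is gone is dropped.
  def handle_info({:neighbor_reply, _list}, state), do: {:noreply, state}

  def handle_info(:connection_lost, state) do
    {:noreply, %{state | status: :disconnected}}
  end
end

# Order 1: the reply is processed first, so the neighbor list is kept.
{:ok, a} = Peer.start_link()
send(a, {:neighbor_reply, [:n1]})
send(a, :connection_lost)
reply_first = Peer.state(a)

# Order 2: the disconnect is processed first, so the reply is dropped.
{:ok, b} = Peer.start_link()
send(b, :connection_lost)
send(b, {:neighbor_reply, [:n1]})
lost_first = Peer.state(b)
```

Here a single sender fixes the order by hand. In a real system the two messages come from different processes, and the scheduler decides which one lands first.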

### Linearizability violations

Use `Lockstep.Checker.Linearizable`:

```elixir
ctest "registry is linearizable" do
  history = run_workload(...)
  assert :ok = Lockstep.Checker.Linearizable.check(history, model)
end
```
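
For intuition about what such a check decides, here is a small brute-force sketch for a single register, independent of Lockstep. The `history` and `model` formats that `Lockstep.Checker.Linearizable.check/2` expects are not shown in this document, so the map shape below (`call`, `ret`, `op`, `result`) and the `ToyLinearizable` module are assumptions for illustration only.

```elixir
defmodule ToyLinearizable do
  # ops: list of %{call: t, ret: t, op: {:write, v} | :read, result: term}.
  # Returns true if some total order that respects real time explains
  # every observed result on a sequential register.
  def linearizable?(ops, init \\ nil), do: search(ops, init)

  defp search([], _state), do: true

  defp search(ops, state) do
    Enum.any?(ops, fn op ->
      expected = op.result

      can_go_first?(op, ops) and
        case apply_op(op.op, state) do
          {^expected, next} -> search(List.delete(ops, op), next)
          _ -> false
        end
    end)
  end

  # An op may be ordered first only if no remaining op finished
  # strictly before this op was invoked.
  defp can_go_first?(op, ops), do: Enum.all?(ops, fn other -> other.ret >= op.call end)

  # Sequential model of a register: {result, next_state}.
  defp apply_op({:write, v}, _state), do: {:ok, v}
  defp apply_op(:read, state), do: {state, state}
end

write = %{call: 0, ret: 1, op: {:write, 1}, result: :ok}

# A read that starts after the write returned must see 1.
fresh = ToyLinearizable.linearizable?([write, %{call: 2, ret: 3, op: :read, result: 1}])
stale = ToyLinearizable.linearizable?([write, %{call: 2, ret: 3, op: :read, result: nil}])

# A read that overlaps the write may legitimately see the old value.
overlapping =
  ToyLinearizable.linearizable?([
    %{call: 0, ret: 3, op: {:write, 1}, result: :ok},
    %{call: 1, ret: 2, op: :read, result: nil}
  ])
```

The search is exponential in the number of operations, so this only suits tiny histories; it is meant to show the property being checked, not a practical checker.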

## Reading traces

Trace output uses pid aliases for readability:

```
P0(root) = #PID<0.123.0>
P1 = #PID<0.124.0>
P2 = #PID<0.125.0>

Schedule:
  step 1  hello   P0(root)
  step 2  spawn   P0(root) -> P1
  step 3  send    P0(root) -> P1  {:hello}
  step 4  recv    P1            {:hello}
  step 5  exit    P1 reason={:exception, :error, ...}  <-- FAILED HERE
```

The `<-- FAILED HERE` marker shows where the assertion or
invariant fired. Read the trace bottom-up to understand causality.

## Causal slice

Lockstep automatically slices traces to show only events causally
related to the failure. Set `LOCKSTEP_NO_CAUSAL_SLICE=1` to disable.
For long traces, the slice is ~5-20% of the original.

## LLM-explained counterexamples

If you have an Anthropic API key:

```sh
export ANTHROPIC_API_KEY=sk-ant-...
mix test  # Failures will be explained in plain English
```

Set `LOCKSTEP_LLM_OFF=1` to disable.

## Documentation

- **Overview + tutorials**: `README.md`
- **Methodology**: `METHODOLOGY.md` — playbook for testing real
  Hex packages
- **Bug-finding case studies**: `docs/design/BUG_FINDINGS.md`
- **Design history + strategy**: `docs/design/`

## Programmatic API

For non-ExUnit callers:

```elixir
Lockstep.Runner.run(
  fn -> my_test_body() end,
  iterations: 1000,
  strategy: :pos,
  max_steps: 5_000,
  seed: 1,
  iter_timeout: 30_000,
  suite: "my_suite"
)
```

Returns `:ok` or raises `Lockstep.BugFound` with iteration, seed,
strategy, and trace path.

## When to escalate

If a bug is reproducible with `seed: N` but doesn't reproduce with
other seeds, **the seed matters**: file the bug with that exact
seed. Maintainers can verify with the same seed, and the fix cycle
is deterministic.

If a test ALWAYS produces a "bug" at iteration 1 but you don't
believe it's a real bug, the test scenario is likely too aggressive
(forcing the race deterministically rather than testing whether the
race exists). Restructure the test to be more "natural" usage.