docs/guides/subagent-compaction.md

Select File
docs/guides/subagent-compaction.md

# Context Compaction

Pressure-triggered context compaction trims older turns from a multi-turn agent's
LLM-input message list once turn count or estimated token usage crosses a threshold.
Recent turns stay intact so the agent always sees the raw breadcrumb trail of what
it most recently did.

Compaction is **opt-in**. The library default is `compaction: false`.

## When to enable it

You probably don't need compaction unless you're seeing one of:

- A multi-turn agent that runs long enough to push toward the model's context window.
- LLM cost dominated by repeatedly re-sending earlier turn history.
- Agents that stall because old context is drowning out the recent error trail.

For short multi-turn runs (under ~8 turns or under a few thousand tokens), the cost
of compaction outweighs the benefit. Leave it off.

## Quick start

```elixir
SubAgent.run(prompt, llm: llm, max_turns: 20, compaction: true)
```

`compaction: true` selects sensible defaults — the `:trim` strategy with
`trigger: [turns: 8]`, `keep_recent_turns: 3`, `keep_initial_user: true`.

## Explicit configuration

```elixir
SubAgent.run(prompt,
  llm: llm,
  max_turns: 20,
  compaction: [
    strategy: :trim,
    trigger: [turns: 8, tokens: 12_000],
    keep_recent_turns: 3,
    keep_initial_user: true,
    token_counter: nil
  ]
)
```

| Option | Default | Meaning |
|--------|---------|---------|
| `strategy` | `:trim` | The only Phase 1 strategy. Custom modules and `:summarize` are deferred. |
| `trigger` | `[turns: 8]` | Fires when `state.turn > N` (turns) or estimated total tokens `>= N` (tokens). Set both for OR semantics. |
| `keep_recent_turns` | `3` | The most recent `N × 2` messages stay verbatim. |
| `keep_initial_user` | `true` | Keep the first user message (the original prompt) at the head of the trimmed list. |
| `token_counter` | `nil` (uses default) | 1-arity function from message content to estimated token count. |

## What `:trim` does

When pressure is detected, `:trim`:

1. Optionally keeps the first user message (when `keep_initial_user: true` and the
   head of the list actually has role `:user`).
2. Keeps the last `keep_recent_turns × 2` messages.
3. Drops everything in between.

If slicing would produce an `:assistant`-leading recent slice (e.g. odd boundaries),
`:trim` drops one more message from the front so the slice begins with `:user`.

## What you'll see in `step.usage.compaction`

Triggered:

```elixir
%{
  enabled: true,
  triggered: true,
  strategy: "trim",
  reason: :turn_pressure,        # | :token_pressure
  messages_before: 13,
  messages_after: 7,
  estimated_tokens_before: 31_200,
  estimated_tokens_after: 12_400,
  kept_initial_user?: true,
  kept_recent_turns: 3,
  over_budget?: false
}
```

Not triggered (compaction was active but didn't fire):

```elixir
%{
  enabled: true,
  triggered: false,
  strategy: "trim",
  reason: nil,
  messages_before: 4,
  messages_after: 4,
  estimated_tokens_before: 320,
  estimated_tokens_after: 320,
  kept_initial_user?: false,
  kept_recent_turns: 3,
  over_budget?: false
}
```

The shape is the same in both cases — every field is always present, so consumers
can read `usage.compaction.messages_before` without `Map.has_key?` guards.

`over_budget?: true` flags the case where a single retained message exceeds the
configured `trigger[:tokens]` budget. `:trim` does not split content; you'll need
to handle this at a higher level (smaller messages, larger budget, or Phase 2's
summarization once it ships).

## Token estimation

The default counter approximates tokens as `max(1, String.length(content) / 4)` —
a pressure heuristic, not a model-accurate count. If you need adapter-specific
counting, supply your own:

```elixir
compaction: [
  token_counter: fn content -> :tiktoken.encode(content) |> length() end
]
```

The counter must be a 1-arity function returning a non-negative integer.

## Limits

- **`:trim` only.** No summarization in Phase 1.
- **No custom strategy modules.** Phase 2 will add a behaviour for that.
- **Not applied to text mode.** `output: :text` rejects compaction at validation time.
- **Not applied to single-shot or single-shot+retry.** Gate is `agent.max_turns > 1`.
- **Token counter is a heuristic.** Adapter-aware counting is deferred to Phase 2.

For the deferred items, see the Phase 2 design stub at
`docs/plans/pressure-triggered-context-compaction-phase-2.md` in the repo.

## See Also

- [Observability](subagent-observability.md) — finding compaction stats in traces
- [Troubleshooting](subagent-troubleshooting.md) — when context is the bottleneck
- `PtcRunner.SubAgent.Compaction` — module reference