guides/compaction_pipeline.md

Select File:
guides/compaction_pipeline.md

# Compaction pipeline (v0.4)

When the conversation's estimated tokens crosses `compact_at` (default
60% of the provider's `max_tokens`), the loop runs the compaction
pipeline before the next iteration.

The default pipeline mirrors Claude Code's five-layer design from the
[paper](https://arxiv.org/abs/2604.14228): cheap deterministic stages
run first, the expensive LLM-summary stage runs only when those
couldn't get the conversation under target.

```
state.messages
  ↓
[BudgetReduction] → drop oversized tool-result bodies
  ↓
[Snip]            → drop stale tool-result bodies (already integrated)
  ↓
[Microcompact]    → collapse runs of 3+ adjacent same-tool results
  ↓
[ContextCollapse] → non-destructive view-time projection
  ↓
[Summary]         → LLM summary of the middle of history
  ↓
state.messages (or state.meta[:compact_view])
```

Each stage is a module implementing `ExAthena.Compactor.Stage` with
`compact_stage/2` and `name/0`. The pipeline orchestrator
(`ExAthena.Compactor.Pipeline`) walks the list with `Enum.reduce_while`
and short-circuits once estimated tokens fall below target. Every
stage runs inside its own
`[:ex_athena, :compaction, <:stage_name>, :start | :stop]` telemetry
span.

## Built-in stages

### `Compactors.BudgetReduction`

Cheap, deterministic. For each tool-result message whose content
exceeds `:per_tool_result_max_chars` (default 16k), replace the body
with `[truncated; full=N chars; ref=<id>]` and persist the original
to `state.meta[:tool_result_archive]` keyed by `ref`.

This single stage often gets the conversation under target on its own
when one outlier tool produced a giant response (a `Read` of a 100KB
file, a `Bash` `grep` over a huge tree).

### `Compactors.Snip`

Cheap, deterministic. Drops stale tool-result bodies older than
`:snip_age_iterations` turns (default 4) whose paired assistant turn
already happened. Each is replaced with a `<snipped: stale tool-result
for call <id>>` marker — pairing-by-id stays intact.

Memory + skill messages and the live suffix are never touched.

### `Compactors.Microcompact`

Medium cost, deterministic. Runs of 3+ adjacent tool-result messages
collapse into a single elided summary tagged `name: "microcompact"`.
The summary lists each call's id + first 200 chars of its result.

### `Compactors.ContextCollapse`

Medium cost, *non-destructive*. Builds a projected message list for
the *next* request only — the authoritative `state.messages` is never
mutated. The projection collapses two patterns:

- **Superseded reads**: a `Read` whose file was later edited collapses
  to a `<read superseded by later edit: <path>>` stub.
- **Repeated identical tool calls**: same tool name + same args
  consecutive (skip-tool-results-aware) get a `(repeat)` marker on the
  follow-up call.

Stored at `state.meta[:compact_view]`. Resume / replay / rewind read
the original `state.messages`, so they stay correct.

### `Compactors.Summary`

Expensive — runs an LLM call. The middle of the message list is
summarised into a single assistant message tagged `name:
"compactor_summary"`. Cost counts against the run's budget. Only fires
when the deterministic stages couldn't get the conversation under
target.

## Reactive recovery

When a mode returns `{:error, :error_prompt_too_long}` (e.g. the
provider explicitly said "context too long"), the loop runs the
pipeline with `force: true` — every stage attempts compaction
unconditionally, regardless of estimated tokens. The same iteration is
then retried once. If still over (or compaction itself errors), the
run terminates with a typed `:error_prompt_too_long` capacity
termination (PR0 finish-reason).

Gated by `:reactive_compact` (default `true`):

```elixir
ExAthena.run("explore the repo", reactive_compact: false)  # opt out
```

## Configuration

### Per-call

```elixir
ExAthena.run("…",
  compact_at: 0.5,                  # trigger at 50% instead of 60%
  per_tool_result_max_chars: 8_000,
  snip_age_iterations: 8,
  microcompact_run_threshold: 5,
  compaction_pipeline: [
    ExAthena.Compactors.BudgetReduction,
    ExAthena.Compactors.Summary    # skip the others
  ])
```

### Global

```elixir
config :ex_athena, :compactor,
  compact_at: 0.6,
  pinned_prefix_count: 1,
  live_suffix_count: 6,
  per_tool_result_max_chars: 16_000,
  snip_age_iterations: 4
```

### Custom stages

Implement `ExAthena.Compactor.Stage`:

```elixir
defmodule MyApp.Compactors.DropImages do
  @behaviour ExAthena.Compactor.Stage

  @impl true
  def name, do: :drop_images

  @impl true
  def compact_stage(%ExAthena.Loop.State{} = state, estimate) do
    new_messages =
      Enum.map(state.messages, fn
        %{role: :user, content: content} = msg when is_binary(content) ->
          if String.contains?(content, "<image>"),
            do: %{msg | content: "<image elided>"},
            else: msg

        msg ->
          msg
      end)

    if new_messages == state.messages do
      :skip
    else
      {:ok,
       %{state | messages: new_messages},
       %{estimate | tokens: ExAthena.Compactor.estimate_tokens(new_messages)}}
    end
  end
end

ExAthena.run("...",
  compaction_pipeline: [
    MyApp.Compactors.DropImages,
    ExAthena.Compactors.BudgetReduction,
    ExAthena.Compactors.Summary
  ])
```

Stages should be **idempotent** — the reactive-recovery path may run
the pipeline a second time with `force: true`. Returning `:skip` on a
second pass is the canonical way to be idempotent.

## Hooks

- `:PreCompact` — fires before the pipeline runs at all.
- `:PreCompactStage` — fires before each individual stage. Payload:
  `%{stage: atom(), estimate: %{tokens, max_tokens}}`.
- `:PostCompact` — fires after a successful compaction. Payload:
  `%{metadata: %{before, after, dropped_count, stages_applied, reason}}`.

## Pinning rules

The compactor never drops:

- Messages in the **pinned prefix** — `:pinned_prefix_count` slots
  (default 1) at the front, *plus* any memory + pre-loaded skill
  messages (PR1).
- Messages in the **live suffix** — `:live_suffix_count` slots
  (default 6) at the tail.

Memory + skill messages tagged `name: "memory"` / `name: "skill:<name>"`
are pinned by default. Hosts can pin custom messages by extending
`pinned_prefix_count`.

## See also

- [`ExAthena.Compactor.Pipeline`](https://hexdocs.pm/ex_athena/ExAthena.Compactor.Pipeline.html)
- [`ExAthena.Compactor.Stage`](https://hexdocs.pm/ex_athena/ExAthena.Compactor.Stage.html)
- [Hooks reference](hooks_reference.md) — `PreCompact*` / `PostCompact`
  payload shapes.
- [Memory + skills](memory_and_skills.md) — what gets pinned.