CHANGELOG.md

# Changelog

All notable changes to this project will be documented in this file.

The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
and ExAthena adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v0.4.4 — Adapter-boundary message logging (Claude Code-style)

### Added

- `ExAthena.Providers.ReqLLM` now emits structured Logger messages at
  the adapter boundary, mirroring the Claude Code SDK's `[SDK]` log
  style for parity with consumers that already grep for that prefix:
  - `[ExAthena.ReqLLM] →query|→stream model=… msgs=N tools=K base_url=… backend=…`
    (info level, one line per request).
  - `[ExAthena.ReqLLM] →query system_prompt=…\n  msg[i] role: <preview>`
    (debug level, full message body previews with whitespace collapsed
    and capped at 200B/message).
  - `[ExAthena.ReqLLM] ←text_delta NB: <preview>` per streamed chunk
    (debug level).
  - `[ExAthena.ReqLLM] ←tool_call name=… args=…` per tool call
    (debug level).
  - `[ExAthena.ReqLLM] ←meta finish_reason=…` and `←usage …`
    (debug level).
  - `[ExAthena.ReqLLM] ←done finish_reason=… text_chars=N tool_calls=K usage=…`
    on completion (info level).
  - `[ExAthena.ReqLLM] ←error …` when req_llm returns an error
    (warning level).
- All debug-level lines wrap their message construction in
  `Logger.debug(fn -> … end)` so log assembly is skipped at higher
  log levels — zero overhead in production.

### Why

Until 0.4.4 the adapter forwarded request/response data to req_llm
without leaving any breadcrumb in the host application's log. When a
streaming run stalled or the LLM returned unexpected content, callers
had no way to see what was sent or what came back without attaching a
telemetry handler. The new lines give the same kind of visibility
ClaudeCode's SDK provides, so debugging Ollama/OpenAI/llama.cpp flows
is a `tail -f phoenix_output.log | grep '\[ExAthena.ReqLLM\]'` away.

## v0.4.3 — Convert tools to %ReqLLM.Tool{} structs at the adapter boundary

### Fixed

- `ExAthena.Providers.ReqLLM.build_opts/2` now converts the modes' tool
  list (OpenAI-format maps from `Tools.describe_for_provider/1`) into
  `%ReqLLM.Tool{}` structs before forwarding to req_llm. Without this,
  req_llm 1.10's openai adapter raised
  `no function clause matching in ReqLLM.Tool.to_schema/2` while building
  the streaming request — `to_schema/2` only matches `%ReqLLM.Tool{}`.
  The conversion sets a stub callback because ex_athena executes tools
  server-side via the loop, not via req_llm's client-side dispatch.
  Already-formed `%ReqLLM.Tool{}` structs and string-keyed maps are
  passed through unchanged for forward compatibility.

## v0.4.2 — Ollama tagged model spec follow-up

### Fixed

- `ExAthena.Providers.ReqLLM.resolve_model/2` no longer skips the
  provider-tag prefix when the model id contains a colon. Ollama model
  ids legitimately use `:` as the version separator
  (`qwen2.5-coder:14b`, `qwen3-coder:30b`), so the previous "model
  contains `:` ⇒ already tagged" heuristic shipped bare names like
  `"qwen2.5-coder:14b"` to req_llm, which then split on the first
  colon and tried to validate `"qwen2.5-coder"` as a provider name —
  failing the `^[a-z0-9_-]+$` regex (rejects `.`) with
  `{:error, :bad_provider}`. Now always prepends the tag unless the
  model already starts with `"<tag>:"` (caller passed a fully-formed
  spec).

## v0.4.1 — Ollama via OpenAI-compatible adapter

### Fixed

- `provider: :ollama` now talks to local Ollama through `req_llm`'s OpenAI
  adapter instead of looking up an `:ollama` provider in `llm_db`'s
  catalog. `llm_db` 2026.4.x removed first-class local-Ollama support
  (it only catalogues `:ollama_cloud` now), so `"ollama:<model>"` model
  specs were rejected with `{:error, :unknown_provider}` from
  `LLMDB.Spec`. The fix routes `:ollama` (and `:llamacpp`) through the
  `"openai:<model>"` tag and threads
  `openai_compatible_backend: :ollama` so `req_llm` 1.10's openai
  adapter tolerates the missing API key on unauthenticated local
  deployments. Mirrors the recipe in `req_llm/guides/ollama.md`.
- `base_url` for Ollama now auto-appends `/v1` when callers pass the
  bare host (`http://localhost:11434`) — req_llm's openai adapter
  expects the prefix to already include `/v1`.
- A placeholder `api_key` (`"ollama"`) is substituted when the
  `:ollama` backend marker is set and no key was supplied — Ollama
  ignores the Authorization header but `req_llm`'s HTTP layer still
  emits one, so a non-nil value is required.

### Internal

- `ExAthena.Config.@req_llm_provider_tag[:ollama]` now resolves to
  `"openai"` (was `"ollama"`).
- New `@local_openai_compatible_backends` map drives
  `openai_compatible_backend` injection in `Config.pop_provider!/1`.
- `ExAthena.Providers.ReqLLM.build_opts/2` reads the backend marker,
  normalises base_url, and falls back to the placeholder api_key.

## v0.4.0 — operational harness (memory, skills, hooks, modes, agents, storage)

The "1.6% reasoning, 98.4% harness" upgrade. Where v0.3 perfected the
loop kernel, v0.4 builds the *operational harness* the [Claude Code paper](https://arxiv.org/abs/2604.14228)
calls out as the bulk of a production agent's value: file-based memory
+ skills, a five-stage compaction pipeline with reactive recovery, a
14-event hook surface with `{:inject, msg}` / `{:transform, prompt}`
returns, two new permission modes, structured tool results, custom
agent definitions with optional git-worktree isolation, and append-only
session storage with file-checkpointing + `/rewind`.

Landed as seven tightly-scoped commits (PR0 → PR5) — each one keeps the
existing test suite passing and adds focused new tests on top.

### PR0 — Foundation

#### Added

- `:session_id` and `:parent_session_id` plumbed through `Loop.State`,
  `Loop.run/2` opts, the resulting `ToolContext`, and the `SessionStart`
  hook payload. The `Session` GenServer auto-generates a stable id at
  start_link, reuses it on every turn, and refuses to let per-call
  `extra_opts` redirect mid-conversation. PR4 + PR5 read these.
- `:error_prompt_too_long` finish-reason in `Loop.Terminations` with
  category `:capacity`. Modes signal context-window-exceeded uniformly;
  PR2's reactive compaction switches on this.
- Doctest in `Permissions.check/4` documenting and locking the
  deny-first ordering (`:disallowed_tools` survives `:bypass_permissions`,
  `:allowed_tools` survives a permissive callback).

### PR1 — Memory + Skills (file-based context)

#### Added — `ExAthena.Memory`

- Loads `AGENTS.md` (preferred) / `CLAUDE.md` from a 3-level hierarchy:
  user (`~/.config/ex_athena/`) → project (`<cwd>/`) → local override
  (`<cwd>/AGENTS.local.md`).
- Each file becomes a single user-role message tagged `name: "memory"`
  placed at the front of the conversation. The Claude Code paper notes
  Claude Code uses user-context (not system) for probabilistic
  compliance — we copy the pattern.
- `AGENTS.md` wins over `CLAUDE.md` at the same level (matches opencode).

#### Added — `ExAthena.Skills`

- Claude Code-style progressive disclosure. `SKILL.md` files have YAML
  frontmatter (`name`, `description`, `disable-model-invocation`,
  `allowed-tools`) plus a markdown body. The frontmatter is auto-injected
  into the system prompt as a `## Available Skills` catalog (~50 tokens
  per skill); bodies stay on disk until needed.
- Two activation paths: a `[skill: name]` sentinel the model writes in
  its response, or the new `:preload_skills` opt for hosts that know
  up-front what's needed.
- Loaded from `~/.config/ex_athena/skills/<name>/SKILL.md` and
  `<cwd>/.exathena/skills/<name>/SKILL.md`. Project overrides user.

#### Added — `Loop.run/2` options

- `:memory` — `:auto` (default), `false`, or explicit message list.
- `:skills` — `:auto` (default), `false`, or explicit map.
- `:preload_skills` — list of skill names to activate up-front.

#### Changed

- `Compactors.Summary` extends its effective pinned-prefix by
  `meta[:memory_count] + meta[:preloaded_skill_count]` so memory + pre-loaded
  skills survive every compaction cycle.

### PR2 — Five-layer compaction pipeline + reactive recovery

#### Added — pipeline architecture

- `ExAthena.Compactor.Pipeline` — the new default compactor. Walks a
  configurable list of `Compactor.Stage` modules cheapest-first, short-
  circuiting once the conversation falls below target. Each stage runs
  inside its own `[:ex_athena, :compaction, <:stage_name>, :start | :stop]`
  telemetry span.
- `ExAthena.Compactor.Stage` behaviour with `compact_stage/2` + `name/0`
  callbacks. Existing `Compactors.Summary` keeps its legacy
  `Compactor.compact/2` callback AND now implements `Stage` via a thin
  adapter — fully backward-compatible for direct callers.

#### Added — five built-in stages

1. **`Compactors.BudgetReduction`** — replaces oversized tool-result
   bodies (>16k chars by default) with a `[truncated; ref=<id>]` pointer.
   Full payload moves to `state.meta[:tool_result_archive]`. Pure-Elixir.
2. **`Compactors.Snip`** — drops stale tool-result bodies older than
   `:snip_age_iterations` whose paired assistant turn already happened,
   replacing each with a `<snipped: stale tool-result for call …>` marker.
3. **`Compactors.Microcompact`** — collapses runs of 3+ adjacent
   tool-result messages into a single elided summary tagged
   `name: "microcompact"`. Pure-Elixir.
4. **`Compactors.ContextCollapse`** — non-destructive view-time
   projection. Detects superseded reads (file later edited) and
   consecutive duplicate tool calls; writes the projection to
   `state.meta[:compact_view]` for the next request to consume. The
   authoritative `state.messages` is never mutated, so resume / replay
   / rewind (PR5) stay correct.
5. **`Compactors.Summary`** — existing LLM summary stage, refactored.

#### Added — reactive recovery

- When a mode returns `{:error, :error_prompt_too_long}` (PR0
  finish-reason), the loop runs the pipeline with `force: true`
  unconditionally and retries the same iteration once. Gated by
  `:reactive_compact` opt (default `true`).

#### Configuration

- `:compaction_pipeline` — host-overridable stage list. Default is
  `[BudgetReduction, Snip, Microcompact, ContextCollapse, Summary]`.

### PR3a — Hooks expansion + permission modes

#### Added — hook events (14 total)

- `Hooks.events/0` exposes the catalog: `SessionStart`, `SessionEnd`,
  `UserPromptSubmit`, `ChatParams`, `Stop`, `StopFailure`, `PreToolUse`,
  `PostToolUse`, `PostToolUseFailure`, `PermissionRequest`,
  `PermissionDenied`, `SubagentStart`, `SubagentStop`, `PreCompact`,
  `PreCompactStage`, `PostCompact`, `Notification`.
- New return values for hook callbacks:
  - `{:inject, message_or_messages}` — append context to the conversation.
    opencode's `experimental.chat.system.transform` pattern.
  - `{:transform, prompt}` — only meaningful from `UserPromptSubmit`;
    rewrites the user prompt before it enters the loop.
- `run_lifecycle_with_outputs/3` returns `%{halt:, injects:, transform:}`
  for callers that need the richer outputs. `run_lifecycle/3` keeps its
  `:ok | {:halt, _}` shape.

#### Newly fired events

- `Stop` / `StopFailure` / `SessionEnd` from `to_result/2`.
- `UserPromptSubmit` from `build_initial_state/2`.
- `ChatParams` from `Modes.ReAct.iterate/1`, just before each provider call.
- `PostToolUseFailure` when a tool returns `{:error, _}`.
- `PermissionDenied` whenever the gate decides `{:deny, _}`.
- `SubagentStart` / `SubagentStop` from `Tools.SpawnAgent`.
- `PreCompactStage` / `PostCompact` from the compaction pipeline.

#### Added — permission modes

- `:accept_edits` — auto-allow Read/Glob/Grep/WebFetch + Edit/Write/TodoWrite
  + plan_mode/spawn_agent. Bash + custom tools still consult `can_use_tool`.
- `:trusted` — skip the `can_use_tool` callback for every tool. Still
  respects the denylist by default; pass `respect_denylist: false` to
  disable that. The `:auto` name is reserved for the future ML safety
  classifier.

`:bypass_permissions` continues to respect the denylist (deny-first
invariant from PR0's doctest is preserved).

### PR3b — Tool-result split (LLM content + UI payload) ⚠️ Breaking

Tools may now return a 3-tuple `{:ok, llm, ui}` in addition to the
existing `{:ok, text}`. The `llm` is the LLM-facing string the model
sees on the next iteration; `ui` is a `%{kind:, payload:}` map hosts
(TUIs, Phoenix LiveView frontends) can render as rich content
(diffs, file previews, process output, match lists) without parsing
the text. This is the Pi-style split adapted to Elixir's pattern-match
idiom.

#### Added

- `Messages.ToolResult` grows `ui_payload :: %{kind:, payload:} | nil`.
- `Loop.Events` adds `{:tool_ui, %{tool_call_id:, kind:, payload:}}`.
- New event emitted after `:tool_result` for any tool result carrying a payload.

#### Built-in payload shapes

- `Read` → `:file` { path, content, line_range }
- `Edit` → `:diff` { path, before, after, replacements }
- `Bash` → `:process` { command, exit_code, stdout, duration_ms }
- `Glob` → `:matches` { pattern, count, items }
- `Grep` → `:matches` { pattern, count, items }
- `WebFetch` → `:webpage` { url, status, truncated? }
- `Write`, `TodoWrite`, `PlanMode` — text-only, unchanged.
- `SpawnAgent` (PR4) → `:subagent` { iterations, cost_usd, isolation, … }

#### Breaking change — direct tool callers

The 6 builtins listed above (`Read`, `Edit`, `Bash`, `Glob`, `Grep`,
`WebFetch`) now return `{:ok, text, ui}` 3-tuples instead of the
`{:ok, text}` 2-tuple. Callers using these tools through the loop are
unaffected — `Result.text` still surfaces the LLM-facing string. Code
that calls these tools' `execute/2` directly needs to update its
pattern matches. The `{:ok, text}` 2-tuple remains a fully supported
return shape for custom and third-party tools.

### PR4 — Subagents v2 (Agents.md + worktrees + sidechains)

#### Added — `ExAthena.Agents`

- File-based agent definitions in markdown + YAML frontmatter, loaded
  from a 3-level hierarchy (builtin → user → project). Frontmatter
  fields: `name`, `description`, `model`, `provider`, `tools`,
  `permissions`, `mode`, `isolation`. Body becomes a system-prompt
  addendum.
- Builtin definitions shipped in `priv/agents/`:
  - `general` — full-tool default (matches the prior SpawnAgent behaviour).
  - `explore` — read-only fast investigation.
  - `plan` — analysis only with writes restricted to `.exathena/plans/`.
- `Agents.apply_to_opts/2` merges definition fields into spawn opts.

#### Added — worktree isolation

- `ExAthena.Agents.Worktree.resolve/3` runs three safety checks before
  creating a git worktree (git on PATH, cwd inside repo, clean tree).
  If any check fails, the subagent transparently falls back to
  `:in_process`.
- Worktrees live under `~/.cache/ex_athena/worktrees/<sess>/<name>-<n>`,
  branched from `HEAD`. After the subagent finishes:
  - Changes left → worktree is kept; path + branch surface in the spawn
    result's `ui_payload` for review/merge.
  - Clean → `git worktree remove --force` cleans up.
- `ExAthena.Agents.WorktreeSweeper` is a one-shot at boot under the
  application supervisor that runs `git worktree prune` and removes
  cache entries older than 7 days.
- All internal git invocations bypass the parent's permission gate via
  `System.cmd/3` directly — otherwise a parent in `:plan` mode could
  never spawn a worktree-isolated subagent.

#### Added — sidechain transcripts

- `ExAthena.Agents.Sidechain.write/1` persists each subagent's full
  transcript to `<cwd>/.exathena/sessions/<parent_session_id>/sidechains/<subagent_id>.jsonl`.
  Parent only sees the subagent's final `text`; the full conversation
  lives here.

#### `SpawnAgent` updates

- New `agent: "<name>"` arg resolves a named definition and applies its
  fields to the sub-loop opts.
- `SubagentStart` payload now includes `:agent` and `:isolation`.
- `SubagentStop` payload includes the finalized isolation state.
- Spawn returns the `{:ok, llm, ui}` 3-tuple from PR3b with a
  `:subagent` UI payload carrying iterations / tool_calls_made /
  cost_usd / duration_ms / isolation.

### PR5 — Append-only session storage + checkpointing + rewind

#### Added — `ExAthena.Sessions.Store`

- Behaviour for append-only event storage with `append/2`, `read/1`,
  `list/0`, `tail/2`. Each event carries an ISO 8601 timestamp + uuid.
- `Sessions.Stores.InMemory` — ETS-backed default. The application
  supervisor keeps a single named GenServer alive so the table is shared
  across the BEAM.
- `Sessions.Stores.Jsonl` — ETS-buffered, periodic flush (default 250ms).
  Hot-path appends never block on I/O. Files at
  `<root>/<session_id>.jsonl`. Synchronous `flush/1` for tests + clean
  shutdown.

#### Session integration

- `Session.start_link/1` accepts `:store` opt: `:in_memory` (default),
  `:jsonl`, or a custom module.
- On every `send_message/2`: emits `:user_message`, then walks
  `result.messages` after the loop and emits `:assistant_message` /
  `:tool_result` for new entries.
- `Session.resume/2` reads events back, filters to user/assistant
  messages, and returns the reconstructed message list. Permissions
  deliberately don't survive resume (Claude Code's pattern: trust is
  re-established per session).

#### Added — `ExAthena.Checkpoint`

- File-history backups before each `Tools.Edit` / `Tools.Write` at
  `<cwd>/.exathena/file-history/<session_id>/<sha>/<version>.bin`.
  SHA-256 of the absolute path; versions are 0-indexed and idempotent.
  Tombstones (`<v>.tombstone`) mark "this file didn't exist at
  checkpoint time".
- `Checkpoint.rewind/3` modes:
  - `:code_and_history` — restore each file to its version-0 snapshot
    AND truncate the JSONL session log to the chosen `to_uuid`.
  - `:history_only` — only truncate the JSONL.
- `ExAthena.Checkpoint.Sweeper` — startup task that GCs file-history
  directories older than 30 days.

### Distribution

- `mix.exs` `:files` now includes `priv/` so the builtin agent
  definitions ship with the package on Hex.

### Tests

- 248 tests + 2 doctests, 0 failures (was 147 + 0 in v0.3.1).
- Backward-compatible by design: existing v0.3 tests untouched, except
  for the 6 builtin tools whose return shape changed (PR3b — tightened
  to `{:ok, text, ui}`).

## v0.3.1 — per-token streaming in the ReAct mode

### Added

- `Modes.ReAct` now dispatches to `provider_mod.stream/3` (instead of
  `query/2`) whenever the caller registered an `on_event` callback on
  `Loop.run/2`. Every `%Streaming.Event{type: :text_delta, data: ...}`
  produced by the provider is forwarded to `on_event` in real time, so
  consumers (e.g. a LiveView chat UI) get character-level deltas again
  without having to drive streaming themselves.
- When no `on_event` is set the behaviour is unchanged — the mode uses
  the cheaper one-shot `query/2` path.
- When the provider module does not implement `stream/3` (it is an
  optional callback) the mode transparently falls back to `query/2`.

### Changed

- Docstring on `Modes.ReAct` now reflects the stream/query dispatch.

## v0.3.0 — PR 4 (observability) landed; Phase 4 closed

### PR 4 — Observability

#### Added — OpenTelemetry GenAI semconv telemetry

- `ExAthena.Telemetry` — emits `:telemetry`-library events shaped to the
  OpenTelemetry GenAI semantic conventions. Consumers bridge to OTel
  via `opentelemetry_telemetry` (no direct OTel dep). Events:
  - `[:ex_athena, :loop, :start | :stop | :exception]`
  - `[:ex_athena, :chat, :start | :stop]`
  - `[:ex_athena, :tool, :start | :stop]`
  - `[:ex_athena, :compaction, :stop]`
  - `[:ex_athena, :subagent, :spawn | :stop]`
  - `[:ex_athena, :structured_retry]`
- GenAI semconv metadata keys: `gen_ai_operation_name`,
  `gen_ai_provider_name`, `gen_ai_request_model`, `gen_ai_agent_id`,
  `gen_ai_conversation_id`, `gen_ai_tool_name`, `gen_ai_tool_call_id`,
  `gen_ai_usage_input_tokens`, `gen_ai_usage_output_tokens`,
  `gen_ai_response_finish_reasons`.
- New `:conversation_id` / `:agent_id` opts on `Loop.run/2` — threaded
  into every emitted event's metadata so OTel traces can stitch across
  turns.
- `Telemetry.span/3` helper wraps arbitrary work in a start/stop pair
  with duration measurement + exception re-raising.

#### Released

- Version bump `0.3.0-dev` → `0.3.0`. Ready for Hex publish.

## v0.3.0-dev — PR 3 landed

### PR 3 — Reliability + intelligence

No additional breaking changes. New capabilities layer on top of PR 2.

#### Added — context compaction

- `ExAthena.Compactor` — behaviour for context-window reduction. Called
  by the kernel before each iteration when the token estimate crosses
  `:compact_at` (default 60% of the provider's `max_tokens`). Preserves
  a pinned prefix (system prompt + rules) and a live suffix (recent
  turns) while substituting the middle with a summary.
- `ExAthena.Compactors.Summary` — default implementation. Uses the
  session's own provider to generate a terse summary and replaces the
  dropped messages with a single assistant message tagged
  `name: "compactor_summary"`. Cost counts against the run's budget.
- New options: `:compact_at` (default 0.6), `:pinned_prefix_count`
  (default 1), `:live_suffix_count` (default 6), `:compactor` (override
  module).
- New events: `{:compaction, metadata}` fires after a successful
  compaction with before/after token counts and dropped count.
- New termination: `:error_compaction_failed` when compaction errors.
- New hook: `:PreCompact` fires with `%{estimate: …}` before each
  compaction attempt.

#### Added — budget accounting from provider metadata

- `extract_cost/1` in `ExAthena.Modes.ReAct` pulls `:total_cost` (or
  `:input_cost + :output_cost`) from provider usage metadata and folds
  it into the run's Budget. req_llm's `models.dev`-backed cost data
  flows straight through.
- `ExAthena.Result.cost_usd` is populated when the provider reports
  cost; `nil` otherwise.
- `:max_budget_usd` (introduced as a knob in PR 2) now genuinely trips
  `:error_max_budget_usd` when cumulative cost crosses the cap.

#### Added — structured-output repair loop (instructor-style)

- `ExAthena.Structured.extract/2` now retries on validation failure by
  appending the failed response + a user message carrying the validation
  error and re-prompting. Default `:max_retries: 2`.
- After retries exhaust, returns
  `{:error, {:error_max_structured_output_retries, last_validation_error}}`.
- New events: `{:structured_retry, %{attempt:, error:}}` fires on each
  retry.

#### Added — Plan-and-Solve mode

- `ExAthena.Modes.PlanAndSolve` — two-phase mode. First iteration is
  **planning-only** (no tools, plain-text plan following a structured
  prompt). Subsequent iterations delegate to `ReAct`.
- Rationale: smaller / local models produce better tool-calling
  behaviour when they articulate a plan first.

#### Added — Reflexion mode

- `ExAthena.Modes.Reflexion` — after each ReAct iteration, injects a
  short self-critique pass and adds it to the conversation history.
  Capped at 3 reflections (per research — beyond that,
  degeneration-of-thought kicks in).
- Triples per-loop cost; best reserved for correctness-sensitive tasks.

#### Added — subagent supervision upgrade

- `ExAthena.Tools.SpawnAgent` now runs sub-loops under
  `Task.Supervisor.async_nolink` (supervisor name `ExAthena.Tasks`,
  registered by `ExAthena.Application`). Sub-agent crashes no longer
  propagate to the parent; timeouts are enforceable.
- New events: `{:subagent_spawn, %{id:, prompt:}}` and
  `{:subagent_result, %{id:, text:}}` fire around sub-loop execution.
- New optional arg: `timeout_ms` (default 300_000).
- New error subtypes from SpawnAgent: `{:sub_agent_crashed, reason}`,
  `{:sub_agent_timeout, ms}`.

### Tests

- **140 total** (up from 126 in PR 2). 14 new cover compaction
  (threshold detection, middle-replacement, error surfacing), budget
  caps (cost-based termination, `cost_usd` accumulation, nil fallback),
  structured repair loop (retry success, retry exhaustion, retry
  events), Plan-and-Solve (planning turn assertion, execution-phase
  tool use), and Reflexion (reflection cap, history injection).

### PR 2 — Kernel rewrite (**breaking changes**)

**The return type of `ExAthena.Loop.run/2` is now `{:ok, %Result{}}`
instead of the v0.2 `{:ok, map()}`.** Consumers pattern-matching on the
old map shape must update.

#### Added — pluggable Mode behaviour

- `ExAthena.Loop.Mode` — behaviour with `init/1` + `iterate/1`. Drives
  the turn-by-turn control flow. Kernel handles caps, budget, hooks,
  counters, events, and Result construction.
- `ExAthena.Modes.ReAct` — default mode. ReAct cycle (reason → act →
  observe) with parallel tool execution, mistake counter, and typed
  terminations.
- `ExAthena.Modes.PlanAndSolve` + `ExAthena.Modes.Reflexion` — stubs
  returning `:not_implemented`. Full implementations land in PR 3.
- `ExAthena.Loop.Mode.resolve/1` translates atom shortcuts (`:react`,
  `:plan_and_solve`, `:reflexion`) to modules.

#### Added — reliability knobs

- `:max_consecutive_mistakes` (default 3) — trips
  `:error_consecutive_mistakes` after N consecutive tool errors. A
  successful tool call resets the counter. Prevents runaway loops
  (Cline pattern).
- `:max_budget_usd` — trips `:error_max_budget_usd` when the budget
  accumulator crosses the cap. PR 3 wires cost computation from provider
  metadata.
- `:tool_timeout_ms` (default 60_000) — per-call timeout for parallel
  execution.
- `:max_concurrency` (default 4) — `Task.async_stream` concurrency cap.

#### Added — parallel tool execution

- `ExAthena.Loop.Parallel` — classifies a single iteration's tool calls
  into parallel-safe (read-only) and serial (mutating) groups. Runs
  mutating calls first in order, then parallel-safe calls concurrently
  via `Task.async_stream/3`. Result order always matches input call
  order so the model sees aligned results.
- `ExAthena.Tool.parallel_safe?/0` — optional behaviour callback.
  Defaults to `false`.
- Read-only builtins (`Read`, `Glob`, `Grep`, `WebFetch`) declare
  `parallel_safe?: true`. Mutating builtins default to `false`.

#### Changed — event shape (**breaking change**)

v0.2's `%ExAthena.Streaming.Event{type:, data:, index:}` struct is
replaced by flat pattern-matchable tuples modelled on `ash_ai`'s
`ToolLoop.stream/2`:

    {:content, text}
    {:tool_call, ToolCall.t()}
    {:tool_result, ToolResult.t()}
    {:iteration, integer()}
    {:usage, usage_map}
    {:error, term()}
    {:done, Result.t()}

Consumers subscribing via `:on_event` need to update their handlers.
OTel span emission in PR 4 consumes the same tuples.

#### Changed — error handling

Tool errors use the `is_error: true` tool-result convention (Cline
pattern). The model sees its mistake and self-corrects; the mistake
counter advances; a streak hits the cap.

Unknown tools + parse failures flow as error tool-results rather than
halting the run. Hook-driven halts produce `:error_halted`. Provider
errors produce `:error_during_execution`.

#### Tests

126 total (up from 116 in PR 1). 10 new cover Result shape, termination
subtypes, max_iterations → `:error_max_turns`, mistake counter + reset,
parallel tool ordering, flat event tuples, Mode resolve/1.

### PR 1 — Foundation (already landed, unchanged)
PR 1 lays the foundation: canonical types, typed terminations, budget
accounting, and a single req_llm-backed provider adapter that replaces the
three hand-written provider modules.

### Added — Result, Terminations, Budget

- `ExAthena.Result` — canonical run outcome struct. Every run (success or
  error) returns a `%Result{}` carrying final text, message history,
  finish_reason, iterations, tool_calls_made, aggregated usage, cost in
  USD, duration, model, provider, and telemetry metadata. Replaces the
  loose map v0.2 returned.
- `ExAthena.Loop.Terminations` — typed finish_reason subtypes inspired by
  the Claude Agent SDK. Each run ends with exactly one of:
  `:stop`, `:error_max_turns`, `:error_max_budget_usd`,
  `:error_during_execution`, `:error_max_structured_output_retries`,
  `:error_consecutive_mistakes`, `:error_halted`, `:error_compaction_failed`.
  `Terminations.category/1` classifies each as `:success | :retryable |
  :capacity | :fatal` for retry-decision logic.
- `ExAthena.Budget` — usage + cost accumulator. Aggregates token usage
  across iterations, computes cost from provider metadata (req_llm +
  models.dev), and supports `:max_budget_usd` caps.

### Added — req_llm provider adapter

- `ExAthena.Providers.ReqLLM` — single adapter that delegates to
  `req_llm`'s 18+ providers (OpenAI, Anthropic, Ollama, OpenRouter, Groq,
  Together, DeepInfra, Vercel, LM Studio, vLLM, llama.cpp, Mistral, Gemini,
  Cohere, Bedrock, …). Model names resolve through the `models.dev`
  registry for cost + context-window metadata.
- `ExAthena.Config.pop_provider!/1` now threads a `req_llm_provider_tag`
  key through opts so bare `model: "llama3.1"` + `provider: :ollama`
  auto-expands to the full `"ollama:llama3.1"` spec req_llm expects.
- `Config.req_llm_provider_tag/1` — translate an ExAthena provider atom
  into the req_llm `"tag:model-id"` prefix.

### Removed — hand-written provider modules

- `ExAthena.Providers.Ollama`
- `ExAthena.Providers.OpenAICompatible`
- `ExAthena.Providers.Claude`
  All three were direct HTTP clients (Ollama + OpenAICompatible) or SDK
  wrappers (Claude). req_llm does this work across more providers and
  maintains the catalogs. The provider atoms `:ollama`, `:openai`,
  `:openai_compatible`, `:llamacpp`, `:claude`, `:anthropic` continue to
  work — they now all resolve to `ExAthena.Providers.ReqLLM`.

### Added — dep

- `{:req_llm, "~> 1.10"}`.

### Breaking change — none yet (visible)

Consumer-visible API unchanged in this PR. Every existing call
(`ExAthena.query/2`, `ExAthena.stream/3`, `ExAthena.Loop.run/2`,
`ExAthena.Session.start_link/1`) works identically. The provider-module
change is internal.

Breaking API changes land in PR 2 (Kernel) alongside the new Mode
behaviour and the new stream event shape.

### Tests

- 116 tests passing (up from 91 baseline). 25 new covering Terminations,
  Result, Budget, and the req_llm adapter routing.

## v0.2.0 — unreleased

Phase 2 of the agent-loop roadmap: ex_athena is now feature-complete for
multi-turn tool-using work. Drop-in replacement for the Claude Code SDK.

### Added — Agent loop

- `ExAthena.Loop` — multi-turn loop. Infer → parse tool calls → permissions →
  PreToolUse hooks → execute → PostToolUse hooks → replay → repeat. Bounded
  by `:max_iterations` (default 25). Auto-falls-back between native and
  text-tagged tool-call protocols via `ExAthena.ToolCalls.extract/2`.
- `ExAthena.Session` — GenServer owning multi-turn conversation state.
  Appends to message history on every turn, resumable, supervised.
- `ExAthena.run/2` + `ExAthena.extract_structured/2` on the facade.

### Added — Tool behaviour + builtins

- `ExAthena.Tool` behaviour (`name`, `description`, `schema`, `execute`).
- `ExAthena.ToolContext` — `:cwd`, `:phase`, `:session_id`, `:tool_call_id`,
  `:assigns`, plus `resolve_path/2` that rejects traversal + null bytes.
- `ExAthena.Tools` registry — resolves user tool lists and constructs the
  provider-facing + prompt-facing schemas.
- Ten builtin tools:
  - `Read` (with line numbering + offset/limit)
  - `Glob` (wildcard listing with max cap)
  - `Grep` (`rg` when available, pure-Elixir fallback)
  - `Write` (creates parent dirs)
  - `Edit` (strict exact-string replacement, ambiguity-rejecting)
  - `Bash` (port-based, configurable timeout, kills on timeout)
  - `WebFetch` (http/https only, 1 MB cap)
  - `TodoWrite` (validates statuses, optional notifier callback via `assigns`)
  - `PlanMode` (phase transition request — loop consumes the sentinel)
  - `SpawnAgent` (synchronous sub-loop, inherits ctx, filters meta-tools)

### Added — Permissions

- `ExAthena.Permissions` with three modes (`:plan`, `:default`,
  `:bypass_permissions`), `allowed_tools`/`disallowed_tools` lists, and a
  `can_use_tool` callback for interactive approval.
- `:plan` mode blocks mutation tools (`write`, `edit`, `bash`, `todo_write`)
  by default; read-only tools always permitted.

### Added — Hooks

- `ExAthena.Hooks` lifecycle matching Claude Code's shape: `PreToolUse`,
  `PostToolUse`, `Stop`, `Notification`, `PreCompact`, `SessionStart`,
  `SessionEnd`. Matcher groups (regex or string) select which tools fire.
  Hook crashes are caught and become `:halt` returns.

### Added — Structured extraction

- `ExAthena.Structured.extract/2` — one-shot JSON extraction with schema
  validation. Uses JSON mode when the provider supports it; falls back to a
  fenced `~~~json` block for providers that don't. `:validator` opt for
  custom validation.

### Test surface

- 95 tests (up from 43 in Phase 1). Coverage per tool, permission modes,
  hook lifecycle, loop end-to-end driven by the Mock provider, structured
  extraction both JSON-mode and fenced.

### Phase 3 roadmap (next PR)

Start migrating `udin_code` off direct `claude_code` calls. Route ticket
work (`SdkRunner`, `GenericRunner`, `Orchestrator`) through `ExAthena.Session`
so picking `:ollama` in the `ModelProvider` UI begins actually running tasks
on Ollama.

## v0.1.0 — unreleased

Initial public release. Phase 1 of the agent-loop roadmap: pure inference
across any provider, with the canonical message/request/response shapes and
tool-call parsing infrastructure in place for Phase 2's agent loop.

### Added — Core API

- `ExAthena.query/2` — one-shot inference.
- `ExAthena.stream/3` — streaming inference with per-event callback.
- `ExAthena.capabilities/1` — static provider-capability lookup.
- `ExAthena.Config` — tiered resolver (per-call → provider env → top-level
  env → default).
- `ExAthena.Error` — canonical error struct with `:kind` atoms
  (`:unauthorized`, `:not_found`, `:rate_limited`, `:timeout`,
  `:context_length_exceeded`, `:bad_request`, `:server_error`, `:transport`,
  `:capability`, `:unknown`).

### Added — Canonical shapes

- `ExAthena.Request` — normalised inference request consumed by every provider.
- `ExAthena.Response` — normalised response with `:text`, `:tool_calls`,
  `:finish_reason`, `:usage`, `:model`, `:provider`, `:raw`.
- `ExAthena.Messages.Message` / `.ToolCall` / `.ToolResult` — conversation
  primitives. `Messages.from_map/1` tolerates both atom and string keys for
  easy interop with provider JSON.
- `ExAthena.Streaming.Event` — canonical streaming events
  (`:start`, `:text_delta`, `:tool_call_start`, `:tool_call_delta`,
  `:tool_call_end`, `:usage`, `:stop`, `:error`).

### Added — Provider contract

- `ExAthena.Provider` behaviour with `query/2`, `stream/3` (optional),
  `capabilities/0`.
- `ExAthena.Capabilities` type declaring features a provider supports.

### Added — Providers

- `ExAthena.Providers.Ollama` — local Ollama via `/api/chat` (native tool-calls
  on supported models, SSE-style newline-delimited streaming).
- `ExAthena.Providers.OpenAICompatible` — `/v1/chat/completions` for OpenAI,
  OpenRouter, LM Studio, vLLM, llama.cpp server, Together, Groq, etc. SSE
  streaming.
- `ExAthena.Providers.Claude` — wraps the `claude_code` SDK. `claude_code`
  is declared optional so consumers that don't use Claude aren't forced to
  install it. (Streaming via this provider lands in Phase 2 with sessions.)
- `ExAthena.Providers.Mock` — in-memory test double with scripted responses
  and event lists.

### Added — Tool-call parsing

- `ExAthena.ToolCalls.Native` — parses OpenAI-style `tool_calls` and Claude
  `tool_use` blocks. Tolerant of atom/string keys and JSON-string arguments.
- `ExAthena.ToolCalls.TextTagged` — parses `~~~tool_call` fenced blocks out
  of assistant prose for models without native tool-call support.
- `ExAthena.ToolCalls.extract/2` — dispatch-and-fallback between the two
  protocols based on provider capabilities.
- `ExAthena.ToolCalls.augment_system_prompt/2` — appends text-tagged
  instructions to a system prompt for non-native-capable providers.

### Added — Igniter installer

- `mix ex_athena.install` — writes sensible `config :ex_athena` defaults,
  idempotent. Picks Ollama as the default provider. Requires the `igniter`
  dep (declared optional).

### Phase 2 roadmap

Still to land: `ExAthena.Tool` behaviour + builtins (Read, Glob, Grep, Write,
Edit, Bash, WebFetch, TodoWrite, PlanMode, SpawnAgent), `ExAthena.Loop`
(multi-turn agent loop), `ExAthena.Session` GenServer, `ExAthena.Hooks`
(PreToolUse/PostToolUse/Stop lifecycle), `ExAthena.Permissions`
(`:plan` / `:default` / `:bypass` + `can_use_tool` callback), and
`ExAthena.extract_structured/2` (JSON-schema-validated output).

### Phase 3+ roadmap

Migrate `udin_code` off the `claude_code` direct dep: route every call through
`ExAthena.*`, delete `UdinCode.Claude.GenericRunner`, make picking `:ollama`
in the ModelProvider UI actually run the whole task lifecycle on Ollama.