# Changelog
All notable changes to this project will be documented in this file.
The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
and ExAthena adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## v0.3.1 — per-token streaming in the ReAct mode
### Added
- `Modes.ReAct` now dispatches to `provider_mod.stream/3` (instead of
`query/2`) whenever the caller registered an `on_event` callback on
`Loop.run/2`. Every `%Streaming.Event{type: :text_delta, data: ...}`
produced by the provider is forwarded to `on_event` in real time, so
consumers (e.g. a LiveView chat UI) get character-level deltas again
without having to drive streaming themselves.
- When no `on_event` is set the behaviour is unchanged — the mode uses
the cheaper one-shot `query/2` path.
- When the provider module does not implement `stream/3` (it is an
optional callback) the mode transparently falls back to `query/2`.
### Changed
- Docstring on `Modes.ReAct` now reflects the stream/query dispatch.
## v0.3.0 — PR 4 (observability) landed; Phase 4 closed
### PR 4 — Observability
#### Added — OpenTelemetry GenAI semconv telemetry
- `ExAthena.Telemetry` — emits `:telemetry`-library events shaped to the
OpenTelemetry GenAI semantic conventions. Consumers bridge to OTel
via `opentelemetry_telemetry` (no direct OTel dep). Events:
- `[:ex_athena, :loop, :start | :stop | :exception]`
- `[:ex_athena, :chat, :start | :stop]`
- `[:ex_athena, :tool, :start | :stop]`
- `[:ex_athena, :compaction, :stop]`
- `[:ex_athena, :subagent, :spawn | :stop]`
- `[:ex_athena, :structured_retry]`
- GenAI semconv metadata keys: `gen_ai_operation_name`,
`gen_ai_provider_name`, `gen_ai_request_model`, `gen_ai_agent_id`,
`gen_ai_conversation_id`, `gen_ai_tool_name`, `gen_ai_tool_call_id`,
`gen_ai_usage_input_tokens`, `gen_ai_usage_output_tokens`,
`gen_ai_response_finish_reasons`.
- New `:conversation_id` / `:agent_id` opts on `Loop.run/2` — threaded
into every emitted event's metadata so OTel traces can stitch across
turns.
- `Telemetry.span/3` helper wraps arbitrary work in a start/stop pair
with duration measurement + exception re-raising.
#### Released
- Version bump `0.3.0-dev` → `0.3.0`. Ready for Hex publish.
## v0.3.0-dev — PR 3 landed
### PR 3 — Reliability + intelligence
No additional breaking changes. New capabilities layer on top of PR 2.
#### Added — context compaction
- `ExAthena.Compactor` — behaviour for context-window reduction. Called
by the kernel before each iteration when the token estimate crosses
`:compact_at` (default 60% of the provider's `max_tokens`). Preserves
a pinned prefix (system prompt + rules) and a live suffix (recent
turns) while substituting the middle with a summary.
- `ExAthena.Compactors.Summary` — default implementation. Uses the
session's own provider to generate a terse summary and replaces the
dropped messages with a single assistant message tagged
`name: "compactor_summary"`. Cost counts against the run's budget.
- New options: `:compact_at` (default 0.6), `:pinned_prefix_count`
(default 1), `:live_suffix_count` (default 6), `:compactor` (override
module).
- New events: `{:compaction, metadata}` fires after a successful
compaction with before/after token counts and dropped count.
- New termination: `:error_compaction_failed` when compaction errors.
- New hook: `:PreCompact` fires with `%{estimate: …}` before each
compaction attempt.
#### Added — budget accounting from provider metadata
- `extract_cost/1` in `ExAthena.Modes.ReAct` pulls `:total_cost` (or
`:input_cost + :output_cost`) from provider usage metadata and folds
it into the run's Budget. req_llm's `models.dev`-backed cost data
flows straight through.
- `ExAthena.Result.cost_usd` is populated when the provider reports
cost; `nil` otherwise.
- `:max_budget_usd` (introduced as a knob in PR 2) now genuinely trips
`:error_max_budget_usd` when cumulative cost crosses the cap.
#### Added — structured-output repair loop (instructor-style)
- `ExAthena.Structured.extract/2` now retries on validation failure by
appending the failed response + a user message carrying the validation
error and re-prompting. Default `:max_retries: 2`.
- After retries exhaust, returns
`{:error, {:error_max_structured_output_retries, last_validation_error}}`.
- New events: `{:structured_retry, %{attempt:, error:}}` fires on each
retry.
#### Added — Plan-and-Solve mode
- `ExAthena.Modes.PlanAndSolve` — two-phase mode. First iteration is
**planning-only** (no tools, plain-text plan following a structured
prompt). Subsequent iterations delegate to `ReAct`.
- Rationale: smaller / local models produce better tool-calling
behaviour when they articulate a plan first.
#### Added — Reflexion mode
- `ExAthena.Modes.Reflexion` — after each ReAct iteration, injects a
short self-critique pass and adds it to the conversation history.
Capped at 3 reflections (per research — beyond that,
degeneration-of-thought kicks in).
- Triples per-loop cost; best reserved for correctness-sensitive tasks.
#### Added — subagent supervision upgrade
- `ExAthena.Tools.SpawnAgent` now runs sub-loops under
`Task.Supervisor.async_nolink` (supervisor name `ExAthena.Tasks`,
registered by `ExAthena.Application`). Sub-agent crashes no longer
propagate to the parent; timeouts are enforceable.
- New events: `{:subagent_spawn, %{id:, prompt:}}` and
`{:subagent_result, %{id:, text:}}` fire around sub-loop execution.
- New optional arg: `timeout_ms` (default 300_000).
- New error subtypes from SpawnAgent: `{:sub_agent_crashed, reason}`,
`{:sub_agent_timeout, ms}`.
### Tests
- **140 total** (up from 126 in PR 2). 14 new cover compaction
(threshold detection, middle-replacement, error surfacing), budget
caps (cost-based termination, `cost_usd` accumulation, nil fallback),
structured repair loop (retry success, retry exhaustion, retry
events), Plan-and-Solve (planning turn assertion, execution-phase
tool use), and Reflexion (reflection cap, history injection).
### PR 2 — Kernel rewrite (**breaking changes**)
**The return type of `ExAthena.Loop.run/2` is now `{:ok, %Result{}}`
instead of the v0.2 `{:ok, map()}`.** Consumers pattern-matching on the
old map shape must update.
#### Added — pluggable Mode behaviour
- `ExAthena.Loop.Mode` — behaviour with `init/1` + `iterate/1`. Drives
the turn-by-turn control flow. Kernel handles caps, budget, hooks,
counters, events, and Result construction.
- `ExAthena.Modes.ReAct` — default mode. ReAct cycle (reason → act →
observe) with parallel tool execution, mistake counter, and typed
terminations.
- `ExAthena.Modes.PlanAndSolve` + `ExAthena.Modes.Reflexion` — stubs
returning `:not_implemented`. Full implementations land in PR 3.
- `ExAthena.Loop.Mode.resolve/1` translates atom shortcuts (`:react`,
`:plan_and_solve`, `:reflexion`) to modules.
#### Added — reliability knobs
- `:max_consecutive_mistakes` (default 3) — trips
`:error_consecutive_mistakes` after N consecutive tool errors. A
successful tool call resets the counter. Prevents runaway loops
(Cline pattern).
- `:max_budget_usd` — trips `:error_max_budget_usd` when the budget
accumulator crosses the cap. PR 3 wires cost computation from provider
metadata.
- `:tool_timeout_ms` (default 60_000) — per-call timeout for parallel
execution.
- `:max_concurrency` (default 4) — `Task.async_stream` concurrency cap.
#### Added — parallel tool execution
- `ExAthena.Loop.Parallel` — classifies a single iteration's tool calls
into parallel-safe (read-only) and serial (mutating) groups. Runs
mutating calls first in order, then parallel-safe calls concurrently
via `Task.async_stream/3`. Result order always matches input call
order so the model sees aligned results.
- `ExAthena.Tool.parallel_safe?/0` — optional behaviour callback.
Defaults to `false`.
- Read-only builtins (`Read`, `Glob`, `Grep`, `WebFetch`) declare
`parallel_safe?: true`. Mutating builtins default to `false`.
#### Changed — event shape (**breaking change**)
v0.2's `%ExAthena.Streaming.Event{type:, data:, index:}` struct is
replaced by flat pattern-matchable tuples modelled on `ash_ai`'s
`ToolLoop.stream/2`:
{:content, text}
{:tool_call, ToolCall.t()}
{:tool_result, ToolResult.t()}
{:iteration, integer()}
{:usage, usage_map}
{:error, term()}
{:done, Result.t()}
Consumers subscribing via `:on_event` need to update their handlers.
OTel span emission in PR 4 consumes the same tuples.
#### Changed — error handling
Tool errors use the `is_error: true` tool-result convention (Cline
pattern). The model sees its mistake and self-corrects; the mistake
counter advances; a streak hits the cap.
Unknown tools + parse failures flow as error tool-results rather than
halting the run. Hook-driven halts produce `:error_halted`. Provider
errors produce `:error_during_execution`.
#### Tests
126 total (up from 116 in PR 1). 10 new cover Result shape, termination
subtypes, max_iterations → `:error_max_turns`, mistake counter + reset,
parallel tool ordering, flat event tuples, Mode resolve/1.
### PR 1 — Foundation (already landed, unchanged)
PR 1 lays the foundation: canonical types, typed terminations, budget
accounting, and a single req_llm-backed provider adapter that replaces the
three hand-written provider modules.
### Added — Result, Terminations, Budget
- `ExAthena.Result` — canonical run outcome struct. Every run (success or
error) returns a `%Result{}` carrying final text, message history,
finish_reason, iterations, tool_calls_made, aggregated usage, cost in
USD, duration, model, provider, and telemetry metadata. Replaces the
loose map v0.2 returned.
- `ExAthena.Loop.Terminations` — typed finish_reason subtypes inspired by
the Claude Agent SDK. Each run ends with exactly one of:
`:stop`, `:error_max_turns`, `:error_max_budget_usd`,
`:error_during_execution`, `:error_max_structured_output_retries`,
`:error_consecutive_mistakes`, `:error_halted`, `:error_compaction_failed`.
`Terminations.category/1` classifies each as `:success | :retryable |
:capacity | :fatal` for retry-decision logic.
- `ExAthena.Budget` — usage + cost accumulator. Aggregates token usage
across iterations, computes cost from provider metadata (req_llm +
models.dev), and supports `:max_budget_usd` caps.
### Added — req_llm provider adapter
- `ExAthena.Providers.ReqLLM` — single adapter that delegates to
`req_llm`'s 18+ providers (OpenAI, Anthropic, Ollama, OpenRouter, Groq,
Together, DeepInfra, Vercel, LM Studio, vLLM, llama.cpp, Mistral, Gemini,
Cohere, Bedrock, …). Model names resolve through the `models.dev`
registry for cost + context-window metadata.
- `ExAthena.Config.pop_provider!/1` now threads a `req_llm_provider_tag`
key through opts so bare `model: "llama3.1"` + `provider: :ollama`
auto-expands to the full `"ollama:llama3.1"` spec req_llm expects.
- `Config.req_llm_provider_tag/1` — translate an ExAthena provider atom
into the req_llm `"tag:model-id"` prefix.
### Removed — hand-written provider modules
- `ExAthena.Providers.Ollama`
- `ExAthena.Providers.OpenAICompatible`
- `ExAthena.Providers.Claude`
All three were direct HTTP clients (Ollama + OpenAICompatible) or SDK
wrappers (Claude). req_llm does this work across more providers and
maintains the catalogs. The provider atoms `:ollama`, `:openai`,
`:openai_compatible`, `:llamacpp`, `:claude`, `:anthropic` continue to
work — they now all resolve to `ExAthena.Providers.ReqLLM`.
### Added — dep
- `{:req_llm, "~> 1.10"}`.
### Breaking change — none yet (visible)
Consumer-visible API unchanged in this PR. Every existing call
(`ExAthena.query/2`, `ExAthena.stream/3`, `ExAthena.Loop.run/2`,
`ExAthena.Session.start_link/1`) works identically. The provider-module
change is internal.
Breaking API changes land in PR 2 (Kernel) alongside the new Mode
behaviour and the new stream event shape.
### Tests
- 116 tests passing (up from 91 baseline). 25 new covering Terminations,
Result, Budget, and the req_llm adapter routing.
## v0.2.0 — unreleased
Phase 2 of the agent-loop roadmap: ex_athena is now feature-complete for
multi-turn tool-using work. Drop-in replacement for the Claude Code SDK.
### Added — Agent loop
- `ExAthena.Loop` — multi-turn loop. Infer → parse tool calls → permissions →
PreToolUse hooks → execute → PostToolUse hooks → replay → repeat. Bounded
by `:max_iterations` (default 25). Auto-falls-back between native and
text-tagged tool-call protocols via `ExAthena.ToolCalls.extract/2`.
- `ExAthena.Session` — GenServer owning multi-turn conversation state.
Appends to message history on every turn, resumable, supervised.
- `ExAthena.run/2` + `ExAthena.extract_structured/2` on the facade.
### Added — Tool behaviour + builtins
- `ExAthena.Tool` behaviour (`name`, `description`, `schema`, `execute`).
- `ExAthena.ToolContext` — `:cwd`, `:phase`, `:session_id`, `:tool_call_id`,
`:assigns`, plus `resolve_path/2` that rejects traversal + null bytes.
- `ExAthena.Tools` registry — resolves user tool lists and constructs the
provider-facing + prompt-facing schemas.
- Ten builtin tools:
- `Read` (with line numbering + offset/limit)
- `Glob` (wildcard listing with max cap)
- `Grep` (`rg` when available, pure-Elixir fallback)
- `Write` (creates parent dirs)
- `Edit` (strict exact-string replacement, ambiguity-rejecting)
- `Bash` (port-based, configurable timeout, kills on timeout)
- `WebFetch` (http/https only, 1 MB cap)
- `TodoWrite` (validates statuses, optional notifier callback via `assigns`)
- `PlanMode` (phase transition request — loop consumes the sentinel)
- `SpawnAgent` (synchronous sub-loop, inherits ctx, filters meta-tools)
### Added — Permissions
- `ExAthena.Permissions` with three modes (`:plan`, `:default`,
`:bypass_permissions`), `allowed_tools`/`disallowed_tools` lists, and a
`can_use_tool` callback for interactive approval.
- `:plan` mode blocks mutation tools (`write`, `edit`, `bash`, `todo_write`)
by default; read-only tools always permitted.
### Added — Hooks
- `ExAthena.Hooks` lifecycle matching Claude Code's shape: `PreToolUse`,
`PostToolUse`, `Stop`, `Notification`, `PreCompact`, `SessionStart`,
`SessionEnd`. Matcher groups (regex or string) select which tools fire.
Hook crashes are caught and become `:halt` returns.
### Added — Structured extraction
- `ExAthena.Structured.extract/2` — one-shot JSON extraction with schema
validation. Uses JSON mode when the provider supports it; falls back to a
fenced `~~~json` block for providers that don't. `:validator` opt for
custom validation.
### Test surface
- 95 tests (up from 43 in Phase 1). Coverage per tool, permission modes,
hook lifecycle, loop end-to-end driven by the Mock provider, structured
extraction both JSON-mode and fenced.
### Phase 3 roadmap (next PR)
Start migrating `udin_code` off direct `claude_code` calls. Route ticket
work (`SdkRunner`, `GenericRunner`, `Orchestrator`) through `ExAthena.Session`
so picking `:ollama` in the `ModelProvider` UI begins actually running tasks
on Ollama.
## v0.1.0 — unreleased
Initial public release. Phase 1 of the agent-loop roadmap: pure inference
across any provider, with the canonical message/request/response shapes and
tool-call parsing infrastructure in place for Phase 2's agent loop.
### Added — Core API
- `ExAthena.query/2` — one-shot inference.
- `ExAthena.stream/3` — streaming inference with per-event callback.
- `ExAthena.capabilities/1` — static provider-capability lookup.
- `ExAthena.Config` — tiered resolver (per-call → provider env → top-level
env → default).
- `ExAthena.Error` — canonical error struct with `:kind` atoms
(`:unauthorized`, `:not_found`, `:rate_limited`, `:timeout`,
`:context_length_exceeded`, `:bad_request`, `:server_error`, `:transport`,
`:capability`, `:unknown`).
### Added — Canonical shapes
- `ExAthena.Request` — normalised inference request consumed by every provider.
- `ExAthena.Response` — normalised response with `:text`, `:tool_calls`,
`:finish_reason`, `:usage`, `:model`, `:provider`, `:raw`.
- `ExAthena.Messages.Message` / `.ToolCall` / `.ToolResult` — conversation
primitives. `Messages.from_map/1` tolerates both atom and string keys for
easy interop with provider JSON.
- `ExAthena.Streaming.Event` — canonical streaming events
(`:start`, `:text_delta`, `:tool_call_start`, `:tool_call_delta`,
`:tool_call_end`, `:usage`, `:stop`, `:error`).
### Added — Provider contract
- `ExAthena.Provider` behaviour with `query/2`, `stream/3` (optional),
`capabilities/0`.
- `ExAthena.Capabilities` type declaring features a provider supports.
### Added — Providers
- `ExAthena.Providers.Ollama` — local Ollama via `/api/chat` (native tool-calls
on supported models, SSE-style newline-delimited streaming).
- `ExAthena.Providers.OpenAICompatible` — `/v1/chat/completions` for OpenAI,
OpenRouter, LM Studio, vLLM, llama.cpp server, Together, Groq, etc. SSE
streaming.
- `ExAthena.Providers.Claude` — wraps the `claude_code` SDK. `claude_code`
is declared optional so consumers that don't use Claude aren't forced to
install it. (Streaming via this provider lands in Phase 2 with sessions.)
- `ExAthena.Providers.Mock` — in-memory test double with scripted responses
and event lists.
### Added — Tool-call parsing
- `ExAthena.ToolCalls.Native` — parses OpenAI-style `tool_calls` and Claude
`tool_use` blocks. Tolerant of atom/string keys and JSON-string arguments.
- `ExAthena.ToolCalls.TextTagged` — parses `~~~tool_call` fenced blocks out
of assistant prose for models without native tool-call support.
- `ExAthena.ToolCalls.extract/2` — dispatch-and-fallback between the two
protocols based on provider capabilities.
- `ExAthena.ToolCalls.augment_system_prompt/2` — appends text-tagged
instructions to a system prompt for non-native-capable providers.
### Added — Igniter installer
- `mix ex_athena.install` — writes sensible `config :ex_athena` defaults,
idempotent. Picks Ollama as the default provider. Requires the `igniter`
dep (declared optional).
### Phase 2 roadmap
Still to land: `ExAthena.Tool` behaviour + builtins (Read, Glob, Grep, Write,
Edit, Bash, WebFetch, TodoWrite, PlanMode, SpawnAgent), `ExAthena.Loop`
(multi-turn agent loop), `ExAthena.Session` GenServer, `ExAthena.Hooks`
(PreToolUse/PostToolUse/Stop lifecycle), `ExAthena.Permissions`
(`:plan` / `:default` / `:bypass` + `can_use_tool` callback), and
`ExAthena.extract_structured/2` (JSON-schema-validated output).
### Phase 3+ roadmap
Migrate `udin_code` off the `claude_code` direct dep: route every call through
`ExAthena.*`, delete `UdinCode.Claude.GenericRunner`, make picking `:ollama`
in the ModelProvider UI actually run the whole task lifecycle on Ollama.