CHANGELOG.md

Select File
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

- - -
## [v1.2.0](https://github.com/thetonymaster/normandy/compare/5dbc6ecf1c354015b8d49a2893de68d72f8231bc..v1.2.0) - 2026-06-20
#### Features
- (**agent_horde**) multi-agent multi-provider research pipeline + OpenAI-compatible adapter (#43) - ([5dbc6ec](https://github.com/thetonymaster/normandy/commit/5dbc6ecf1c354015b8d49a2893de68d72f8231bc)) - Antonio Cabrera

- - -

## [v1.1.1](https://github.com/thetonymaster/normandy/compare/c655a054b67a0931fcd8ab1434393fd648cb3d12..v1.1.1) - 2026-06-20
#### Bug Fixes
- (**telemetry**) emit agent.run span from run_with_tools/2 entry point (#42) - ([c655a05](https://github.com/thetonymaster/normandy/commit/c655a054b67a0931fcd8ab1434393fd648cb3d12)) - Antonio Cabrera

- - -

## [v1.1.0](https://github.com/thetonymaster/normandy/compare/931ce1418e0259cb66c563bce8ce019a16d6b77a..v1.1.0) - 2026-06-19
#### Features
- (**tools**) validate tool input at the Dispatch chokepoint before side effects (#40) - ([931ce14](https://github.com/thetonymaster/normandy/commit/931ce1418e0259cb66c563bce8ce019a16d6b77a)) - Antonio Cabrera
#### Continuous Integration
- add cocogitto autopublish pipeline (#41) - ([56b995c](https://github.com/thetonymaster/normandy/commit/56b995cf413c06731a030d8745ecce010a5c169c)) - Antonio Cabrera

- - -


## [1.0.0] - 2026-06-17

### Added

- **Phase 7 — distributed multi-node sessions (Tiers 0/1/2 + eager resume).**
  - `Normandy.Behaviours.SessionStore.Postgres` — durable session store over
    Ecto/Postgres (entries, opaque turn state, config template), with migrations
    and `resume_policy` / `config_template` columns. The Tier-1 durable store.
  - `Normandy.Behaviours.SessionRegistry.Horde` (`:via`, `members: :auto`) and
    `Normandy.Agents.Turn.Supervisor.Horde` — CRDT-backed distributed registry +
    dynamic supervisor that route to / own `Turn.Server`s across a cluster (Tier-2).
  - `Normandy.Agents.Turn.ResumeReaper` — selective **eager handoff** on
    `:nodedown`. Because `Horde.DynamicSupervisor` does not redistribute a dead
    node's children, the reaper restarts the eager, unregistered, non-terminal
    sessions whose server died with the lost node. Lazy rehydrate (route →
    `whereis` → rehydrate-on-demand) needs no reaper.
  - `Normandy.Behaviours.AgentTemplate` + a persisted **config template**
    (`Normandy.Agents.Turn.ConfigTemplate`): the non-secret config
    (model/temperature/behaviour refs/tools) needed to reconstruct an agent on
    rehydration; a `template_provider` resolves it. Credentials are never persisted.
  - `SessionStore` gained `save_config_template/3`, `load_config_template/2`, and
    `list_resumable/1` (eager session ids); `SessionRegistry` gained the optional
    `child_name/2` (`{:via, …}`) for atomic, supervisor-driven start that closes the
    start-time race. `InMemory` / `ETS` / `Native` impls were extended to match.
  - `Normandy.Cluster.child_specs/1` — one-call wiring of the Horde registry +
    supervisor + reaper (plus an optional `libcluster` `Cluster.Supervisor` when
    `:topologies` are supplied and `libcluster` is loaded).
  - Tier model: **Tier-0** in-memory/ETS single-node default (unchanged);
    **Tier-1** durable store + lazy rehydrate; **Tier-2** distributed
    registry/supervisor + eager reaper.
  - Drop-in backends behind the same `SessionStore` / `SessionRegistry` seam:
    `SessionStore.Mnesia` (OTP-native distributed store, transactional appends, no
    external DB), `SessionStore.Redis` (Redis Streams), `SessionRegistry.Redis`
    (`:via` registry using Redis as the name table), and the
    `Normandy.Cluster.setup_mnesia_store!/1` / `redis_child_specs/1` wiring helpers.

- **Guardrails — pre-charge admission, threaded context, fail-open, semantic scope.**
  - `Normandy.Agents.BaseAgent.admit/2,3` runs input guardrails as a **pre-charge
    filter** (no turn, memory, or circuit breaker), returning
    `:ok | {:block, violations}` instead of raising — reject disallowed input
    before paying for a turn.
  - `Normandy.Guardrails.run/3` threads a caller-supplied `context` map to guards
    implementing the optional `Guard.check/3` callback (`check/2`-only guards are
    unaffected) — host data a guard needs but the framework must not interpret
    (ids, locale, conversation history).
  - Per-guard `:on_error` policy: `:reraise` (default — a config bug stays a crash),
    `:open` (rescue the guard's raise and treat as a pass, for a guard fronting a
    flaky external service), `:closed` (rescue and turn it into a `:guard_error`
    violation). Only the `check` call is rescued; a malformed return always raises.
  - `Normandy.Guardrails.Builtins.SemanticScope` — a provider-agnostic hybrid scope
    guard: a cheap injected `fast_path` in front of an injected `classifier`
    (`(value, context) -> :allow | {:block, reason}`); the `:block` reason becomes
    the violation's machine-readable `:constraint`. (#31)

- **Phase 6 — AgentProcess durable turn engine (`:server` mode).**
  - `Normandy.Coordination.AgentProcess` opt-in `:server` mode (`turn_engine: :server`)
    routing turns through the durable `Turn.Session`/`Turn.Server` engine: approval
    parking, passivation, and persistence. `:inline` remains the default and is
    byte-for-byte unchanged.
  - `AgentProcess.approve/2` delivers human-approval decisions to a parked turn.
  - Non-blocking `:server` `run/3`/`cast/3`: the GenServer stays responsive while
    a turn is parked awaiting approval or passivated.
  - Store-authoritative `get_agent/1`: reconstructs agent (including conversation
    memory) from `SessionStore` in `:server` mode.
  - Template-only `update_agent/2` in `:server` mode: updates config template
    (model/temperature/behaviours/tools); memory mutations are ignored because
    `SessionStore` is authoritative.
  - Owned-or-supplied session infra: `:store`, `:registry`, `:supervisor` may be
    passed to `start_link`; if omitted, the process starts and owns in-memory
    defaults that terminate with it. `:subscriber`, `:handlers`,
    `:approval_timeout_ms`, and `:idle_timeout_ms` are forwarded to `Turn.Session`.

- **Phase 5 — compaction wiring (`:steering` boundary).**
  - `Normandy.Behaviours.Compactor` behaviour (+ `NoOp` default, opt-in
    `WindowManager` impl) invoked at the `:steering` turn boundary when the context
    window is exceeded; `compactor` slot on `Behaviours.Config`. (PR #32)

### Fixed

- Flaky `Turn.Supervisor.Horde` test: a `start_server` racing the `:via`
  registration could observe a transient `{:error, {:already_started, _}}`; the
  test now retries the start through the via race. (#36)
- `convert_turn_output/3` previously returned the empty output-schema struct for
  tool-using turns with non-`chat_message` output schemas, dropping the final-
  response content. Non-`chat_message`-schema agents using tools were affected.
- `Normandy.Context.TokenCounter` was unusable against the live API: every
  `count_message/2,3`, `count_conversation/2`, and `count_detailed/2` call sent
  `max_tokens` in the `/v1/messages/count_tokens` payload, which the endpoint
  rejects (`400 invalid_request_error: "max_tokens: Extra inputs are not
  permitted"`). The field is now omitted. The default model also moved off the
  retired `claude-3-5-sonnet-20241022` to `claude-haiku-4-5-20251001`. The
  previously-skipped token-counter tests are now enabled as `:integration` tests
  and pass against the live endpoint.

### Migration

- No action required: `:inline` is the default and is byte-for-byte unchanged.
- To adopt the durable engine:
  `AgentProcess.start_link(agent: config, turn_engine: :server)`, optionally
  passing shared `:store`/`:registry`/`:supervisor`.

## [0.9.0] - 2026-06-17

### Added

- **Phase 4a — approval core + chokepoint split (harness decomposition).**
  - `Normandy.Agents.Dispatch.classify/3` (registry → before-hooks → policy →
    verdict) and `Dispatch.execute/4` (budget → execute → record → after).
    `dispatch_one/3` is re-expressed as `classify ➞ execute`; its observable
    behavior is unchanged (the existing dispatch suite is the parity oracle).
  - `Normandy.Agents.Turn` core gains real human-approval parking: an
    `:awaiting_approval` state, `parked_calls`/`held_results` on `%Turn.State{}`,
    and the `{:needs_approval, held, parked}` → `{:approval, decisions}` →
    `{:approved_results, results}` event flow, with the batch-results logic
    factored into a shared `apply_tool_results/2` (one decrement per batch,
    API-order preserved). The synchronous inline path is unchanged — only the
    Phase 4b `:gen_statem` shell will exercise these transitions.

- **Phase 4b — `:gen_statem` Turn shell (harness decomposition).**
  - `Normandy.Agents.Turn.Server`: an opt-in asynchronous `:gen_statem` interpreter
    of the pure `Turn` FSM (the async analog of the inline `Driver`). Coarse
    lifecycle states (`:running`/`:awaiting_approval`/`:idle`) carry monitored
    Tasks for blocking effects, `state_timeout`s (approval expiry, passivation
    idle), persistence at suspend points, and mid-turn message postponement. Real
    human-approval parking: park on `:needs_approval`, resume via
    `Turn.Server.approve/2`, fail-closed on approval timeout.
  - `Normandy.Agents.Turn.Session` (router: whereis → route | rehydrate),
    `Normandy.Agents.Turn.Supervisor` (`DynamicSupervisor`, `restart: :transient`).
  - `Normandy.Behaviours.SessionRegistry` (`whereis/register/unregister`) + `Native`
    default over Elixir `Registry`; `session_registry` slot on `Behaviours.Config`.
  - `Normandy.Components.AgentMemory.from_entries/1` rebuilds memory from stored
    history for rehydration.
  - Four `BaseAgent` turn helpers exposed `@doc false` for shell reuse
    (`non_streaming_handlers/0`, `admit_turn_input/2`, `base_agent_pipeline/1`,
    `turn_response_model/1`) — visibility-only, no behavior change.
  - `BaseAgent.run/2`'s inline path is unchanged; `Turn.Server` is additive.

### Fixed

- Corrected the version stamp: the prior `1.0.0` (Phase 3) was never tagged and
  `1.0.0` is reserved for the final phase of the harness-decomposition milestone.
  Phase 3 is re-labeled `0.8.0` (a pre-1.0 breaking change from `0.7.0`).

## [0.8.0] - 2026-06-12

### Added

- **Branching session memory + SessionStore (Phase 3 of the harness
  decomposition).**
  - `Normandy.Components.AgentMemory` is now a struct of parent-linked
    `AgentMemory.Entry` records (`id` + `parent_id`) instead of a linear list.
    Branching is opt-in via `fork/2`; a linear conversation is a degenerate
    single-parent chain and `history/1` output is unchanged. New accessors:
    `fork/2`, `entries/1`, `get_entry/2`, `entry_chain/1`, `messages/1`,
    `latest_message/1`.
  - `Normandy.Behaviours.SessionStore` (`append_entry/3`, `history/2`, `fork/3`,
    `save_turn_state/3`, `load_turn_state/2`) with `InMemory` (default) and `ETS`
    impls sharing one contract suite. Both serialize per-session writes (the
    `InMemory` impl via its `Agent`, the `ETS` impl via a GenServer that owns a
    private table), so concurrent appends/forks to one session can't clobber each
    other — a guarantee the shared contract now exercises under concurrency. The
    turn-state half round-trips an opaque term; its consumer (suspendable turn /
    passivation) lands in Phase 4. Postgres is deferred.
  - `session_store` slot on `Normandy.Behaviours.Config` (default
    `{SessionStore.InMemory, []}`) — selectable per-agent, not on the dispatch
    pipeline, not yet consumed by the turn loop.

### Changed

- **BREAKING:** `AgentMemory`'s struct shape and `dump/1`/`load/1` JSON format
  changed (entry-based). The dump carries a `version` key (currently `1`) as a
  forward marker for future format detection — `load/1` does not yet branch on it.
  Code that read the old `%{history: [...]}` map shape must use the public API or
  the new accessors. The `dump/1`/`load/1` format is not backward-compatible with
  pre-1.0 dumps.
- `count_messages/1` now returns the total number of stored entries
  (`map_size(entries)`). For a linear conversation this is identical to the old
  active-chain length; after a `fork/2` with divergent appends it counts entries
  across all branches, not just the active one.

### Security

- `AgentMemory.load/1` no longer decodes dumps with `keys: :atoms`. A blanket
  atom decode interned every nested content key, so an untrusted/corrupt dump
  could exhaust the VM atom table. `load/1` now decodes with string keys and
  atomizes only known struct field names (via `to_existing_atom`, which never
  mints new atoms); raw content round-trips verbatim.

### Robustness

- `AgentMemory` graph walks are cycle-safe. Both the active-branch walk
  (`chain_newest_first/1`, behind `history/1`, `entry_chain/1`, `messages/1`) and
  the survivor-rewiring walk (`surviving_ancestor`, behind `delete_turn/2`) track
  visited ids, so a corrupt dump carrying a parent cycle terminates instead of
  looping forever.

### Notes

- Linear-conversation observable behavior is unchanged — the end-to-end suite is
  the parity oracle. Internal consumers (`base_agent` iteration counters,
  `window_manager`/`summarizer` memory rebuild) and white-box tests were migrated
  behavior-preservingly to the new accessors.

## [0.7.0] - 2026-06-01

### Added

- **Pluggable behaviours (Phase 2 of the harness decomposition).** The dispatch
  chokepoint's function slots are now backed by four Elixir `@behaviour`s, each
  with a default impl that preserves current behavior:
  - `Normandy.Behaviours.PolicyEngine` (`check/2`) — default `AllowAll`; plus a
    shipped `Ruleset` impl that evaluates ordered in-memory rules
    (`match` glob → `:allow | :deny | :require_approval`, first-match-wins,
    configurable `default_action`).
  - `Normandy.Behaviours.BudgetTracker` (`check/2`, `record/2`) — default `NoOp`.
  - `Normandy.Behaviours.CredentialProvider` (`get_token/2`) — default
    `FromClient` (extracts `api_key` from the client struct). Defined and
    defaulted; LLM-call consumption deferred.
  - `Normandy.Behaviours.ModelCatalog` (`get/1`, `supports?/2`,
    `context_window/1`) — default `Static`, now the single source of truth for
    `WindowManager`'s context-window limits.
- **`Normandy.Behaviours.Config`** bundle + `to_pipeline/1`, selectable per-agent
  via the new `BaseAgentConfig.behaviours` field. `before/after` hooks are now
  first-class, config-selectable function slots.

### Notes

- Additive and default-off: with the default bundle, observable behavior is
  unchanged. No migration required.

## [0.6.3] - 2026-05-12

### Added

- **`Normandy.LLM.JsonDeserializer` now supports opt-in recovery from a
  specific truncated-JSON failure mode**: when an LLM (notably
  Nemotron-Nano-12B-VL on DigitalOcean Inference) emits a response that
  ends inside an unclosed top-level string field — typically because the
  model entered a `\n`-escape runaway and ran out of output tokens —
  `parse_and_validate/3` and `deserialize_with_retry/8` now accept
  `recover_truncated_strings: true`. When the flag is on AND the strict
  decode fails AND the content looks like a single top-level object AND a
  byte scanner determines the unclosed string is at the outermost depth,
  Normandy truncates the string at the last position whose preceding
  bytes were not part of a `\n` escape, appends a closing `"`, and
  appends `}`/`]` closers derived from a tracked open-container stack.
  The recovered payload is re-decoded once through the adapter and run
  through the same cast pipeline as the happy path; a
  `[:normandy, :json_deserializer, :recovery]` telemetry event is
  emitted on success with `%{recovered: 1}` measurements and
  `%{strategy: :truncated_string, byte_size_before: _, byte_size_after: _}`
  metadata. Default is `false` — pre-existing callers see no behaviour
  change. Designed for vision-pipeline `page_text` transcription
  payloads where the alternative is an empty `%Output{}` and zero RAG
  indexing on customer-grade documents; not a general-purpose JSON
  repair. Nested-string truncation (e.g. `{"offerings":[{"name":"Paq`)
  explicitly does NOT recover, since manufacturing a closer there would
  produce a half-truthful inner record rather than empty top-level data.

## [0.6.2] - 2026-05-11

### Fixed

- **`Normandy.LLM.JsonDeserializer` now recovers from tool-use-style
  response envelopes**: some vision/instruction-following LLMs
  (Nemotron-Nano-12B-VL on DigitalOcean Inference, some Llama variants)
  wrap their JSON in `{"name": "...", "arguments": {...}}` even when
  given `response_format: {"type": "json_object"}` and a system prompt
  asking for the bare object shape. `parse_and_validate/3` previously
  cast such payloads to an all-defaults struct and returned `:ok`, so
  downstream consumers (notably `event_crew`'s vendor-doc vision
  extraction) silently dropped every populated field. `parse_and_populate/3`
  now retries the cast once against `parsed["arguments"]` when the outer
  attempt either succeeded with all-defaults OR returned a
  validation error, and the `"arguments"` value is itself a map. If the
  retry yields any populated field it wins; if it still yields
  all-defaults the original result is preserved so bare-shape responses
  see no behaviour change. One level only — `{"arguments": {"arguments":
  {...}}}` is not unwrapped. Inner cast errors are propagated when the
  inner map carries at least one permitted key (atom or string form), so
  e.g. `{"arguments":{"count":"not_a_number"}}` against `count: :integer`
  surfaces the validation error instead of returning an empty struct;
  inner errors are still suppressed when no permitted keys are present,
  so unrelated envelopes don't manufacture new failures. The
  `{:ok, struct}` / `{:error, reason}` contract is unchanged for every
  pre-existing shape (#22).
- **`get_required_fields/1` in `JsonDeserializer` now reads required
  fields from the correct source**: the helper was filtering
  `__specification__/0` entries as if each were a metadata map, but
  `Normandy.Schema` stores `{name, type}` tuples there — so the filter
  never matched and `validate_required/2` was being called with an empty
  list for every schema. It now reads `__schema__(:required)` (the
  source of truth) and falls back to the old scan for schemas that don't
  expose that callback. Required-field validation now actually fires for
  schemas declaring `field :foo, _, required: true` (#22).

## [0.6.1] - 2026-05-02

### Fixed

- **OpenTelemetry context propagation across `Normandy.Tools.Executor`
  spawn sites**: tool bodies run via `Task.async/1` in
  `execute_with_timeout/2` and `execute_parallel/3`. OTel context lives in
  the process dictionary, so spawned `Task`s started with an empty context
  — any span opened inside a tool's `run/1` became a root span in a fresh
  trace instead of nesting under `normandy.tool.execute`. Symptom in
  downstream apps: integration spans (e.g. external API calls, blob
  downloads) appeared as orphans in Tempo, and `normandy.tool.execute`
  was an opaque blob with no breakdown. The executor now captures the
  parent context before each spawn and re-attaches it inside the spawned
  function. Applied at both `Task.async` sites:
  `execute_with_timeout/2` (the primary fix) and `execute_parallel/3` (so
  the inner timeout call sees a non-empty context to propagate further).
  The capture/restore helpers (already present in `BaseAgent` for
  `Task.async_stream`) are extracted into a new internal
  `Normandy.Telemetry.OtelCtx` module and shared by both call sites; they
  no-op when `:opentelemetry` is not loaded, so consumers without OTel
  pay nothing (#20).

### Security

- **Secret redaction in `Inspect` output for `Normandy.LLM.ClaudioAdapter`
  and `Normandy.A2A.AgentTool`**: a live API key leaked through default
  error logging in a downstream project when a `Task` crashed and the
  BEAM error logger inspected closure args holding a `%ClaudioAdapter{}`.
  `Kernel.inspect/2` rendered the secret in plaintext. Both structs now
  carry `@derive {Inspect, except: [...]}` covering `:api_key`
  (`ClaudioAdapter`) and `:auth_token` (`AgentTool`).
  `Normandy.MCP.ServerConfig` already had this protection. Field access
  (dot syntax, `Map.get/2`, pattern matching) is unchanged; only the
  `Inspect` representation is affected. Locked in with regression tests
  asserting the secret value never appears in `inspect/1` output (#19).

### Changed

- **ExDoc warnings silenced**: `Normandy.Type.load/1` is now declared as
  an optional callback (the contract was already documented and
  exercised by custom-type implementations).
  `Normandy.ParameterizedType.embed_as/2` is also now declared as an
  optional callback (already in `defoverridable` with a default `:self`
  impl). No behavioural change; doc-build is now warning-clean (#18).

## [0.6.0] - 2026-05-01

### Added

- **Typed-struct cache control on multimodal content blocks**: each of
  `Normandy.Components.ContentBlock.{Text,Image,Document}` gains an optional
  `cache_control` field plus `with_cache/1` (ephemeral, the common case) and
  `with_cache/2` (caller-supplied map, e.g. `%{"type" => "ephemeral", "ttl"
  => "1h"}`). Atom keys are accepted and stringified at serialization time.
  `to_claudio/1` emits the `cache_control` key only when set, so existing
  callers see no wire-shape change. Closes the gap left in `0.5.1` where
  multimodal cache breakpoints required hand-built raw maps.
- **Conversation-breakpoint auto-cache strategy**: when
  `enable_caching: true`, `Normandy.LLM.ClaudioAdapter` now annotates the
  last block of the **last user message** with
  `cache_control: %{"type" => "ephemeral"}`, mirroring how Anthropic
  recommends placing prompt-cache breakpoints on chat conversations.
  Triggers only for list-form or single-`ContentBlock`-struct content —
  plain-string user messages keep their existing wire shape so chat-text
  callers see no behaviour change. Caller-set `cache_control` (via
  `with_cache/1-2` or hand-built atom/string-keyed `cache_control` on a raw
  map) is preserved; the adapter never overrides it. Earlier user messages
  in the history are not annotated.
- **List-form system prompt caching**: the system clause of
  `add_single_message/3` previously short-circuited
  `enable_caching: true` for list-form content because Claudio's
  `set_system_with_cache/2` only wraps strings. The adapter now annotates
  the last block of a list-form system prompt and routes it through
  `set_system/2` with pre-shaped wire blocks. Symmetric with the existing
  string-system caching path.
- **Normandy.Components.ContentBlock.CacheControl** (`@moduledoc false`):
  internal helper that string-normalizes top-level cache_control keys and
  raises `ArgumentError` when an atom and string version of the same key
  collide post-normalization, so caller intent is never silently lost.

### Changed

- **`dispatch_multimodal/3` named-helper patterns now require
  `cache_control: nil` on both blocks**. Claudio's
  `add_message_with_image`, `add_message_with_image_url`, and
  `add_message_with_document` take raw args and rebuild blocks internally —
  any `cache_control` on the source `ContentBlock` struct would have been
  silently dropped on the wire. With this change, cache-annotated blocks
  always go through the raw-list fallback path that preserves block fields.
- **Multimodal system prompt with `enable_caching: true`** now emits
  `cache_control` on the last system block. Previously this combination
  was a documented opt-out — the adapter ignored `enable_caching` for
  list-form system content and required callers to hand-build annotated
  block maps. Wire-shape change for callers that hit this exact combination
  in `0.5.x`.
- **Claudio dependency** bumped to `~> 0.5.0`.

## [0.5.1] - 2026-04-29

### Added

- **Multimodal user input via list-shaped content blocks**: agents can now
  receive a list of content blocks (e.g. `[%{"type" => "text", ...}, %{"type"
  => "image", ...}]`) through `MyAgent.run/2`, `MyAgent.run/3`, and
  `MyAgent.run_with_tools/2`. The list flows through `prepare_input/1`,
  `AgentMemory`, and the Claudio adapter unchanged, where
  `add_single_message/3` already dispatches it through the existing
  multimodal path. Two minimal upstream changes make this work:
  `Normandy.Components.BaseIOSchema` now has a `for: List` impl whose
  `to_json/1` returns the list verbatim (mirrors the four-callback shape of
  the existing `BitString`/`Map` impls), and Normandy.DSL.Agent.prepare_input/1
  passes lists through unchanged. Strings continue to wrap into
  `%{chat_message: ...}` and maps continue to pass through (unchanged).
  Callers that need prompt-cache breakpoints inside multimodal user content
  can hand-build raw block maps with a `"cache_control"` key — the adapter's
  raw-list path preserves them verbatim. Typed-struct caching support on
  `Normandy.Components.ContentBlock.{Text,Image,Document}` is deferred to a
  future release.

## [0.5.0] - 2026-04-29

### Added

- **Per-agent `max_tool_concurrency` (bounded parallel tool execution)**:
  `BaseAgentConfig` gains a `max_tool_concurrency` field (default `1`). The
  tool loop in `BaseAgent` now wraps each per-call worker through
  `Task.async_stream(ordered: true, max_concurrency: config.max_tool_concurrency,
   timeout: :infinity, on_timeout: :kill_task)` in both the non-streaming and
  streaming branches. Default `1` preserves pre-0.5.0 sequential behaviour
  (modulo the worker-process semantics noted under *Changed* below). Values
  `> 1` opt the agent into parallel tool execution — each tool call runs in
  its own `Task` worker, ordered by the LLM's call sequence, with up to N
  running at once. OTel parent context is propagated softly (via
  `Code.ensure_loaded?(OpenTelemetry.Ctx)` — Normandy does not add OTel as a
  hard dep) so consumer-side telemetry handlers continue to nest tool spans
  under the parent `agent.run` span.
- **DSL macro `max_tool_concurrency/1`**: sets the compile-time default
  inside `Normandy.DSL.Agent.agent do ... end`. Runtime overrides on
  `MyAgent.new/1` (top-level keyword, or via `:override`) take precedence as
  for any other agent setting.
- **Input validation for `:max_tool_concurrency`**: non-integer values
  (`"4"`, `4.0`, etc.) now raise `ArgumentError` rather than silently
  coercing to a default — a config bug should surface, not hide. Integers
  `< 1` are clamped to `1` to match the runtime tool-loop floor. Validation
  runs at both layers: at compile time inside the DSL `__before_compile__`
  (so `MyAgent.config().max_tool_concurrency` doesn't lie about the value
  the agent will actually use), and at runtime inside `BaseAgent.init/1`
  for `new/1` and `:override` callers. The shared
  `BaseAgent.normalize_max_tool_concurrency/1` helper drives both paths.
- **`BaseAgent.unwrap_tool_task_result!/1`** (`@doc false`, public for
  testability): translates a `Task.async_stream` element into the underlying
  tool result. The linked `Task.async_stream/3` propagates worker raises to
  the caller via process-link before yielding, so `{:exit, {exception,
  stacktrace}}` is unreachable for raises in the current configuration; the
  helper still handles it (re-raising with the original stacktrace) along
  with `{:exit, reason}` — most importantly `{:exit, :timeout}` from
  `on_timeout: :kill_task` and any deliberate `exit/1` from tool wrapper
  code — so those fail loudly instead of hitting `FunctionClauseError`
  against a `{:ok, _}`-only pattern.

### Changed

- **Streaming callback process semantics (`stream_with_tools/3`)**: the
  callback now executes in the `Task.async_stream` worker process, not the
  caller — including at `max_tool_concurrency: 1`, because `Task.async_stream`
  always spawns one worker per closure. Callbacks that referenced `self()`
  inside (e.g. `fn :tool_result, r -> send(self(), {:tool_result, r}) end`)
  will now target the worker PID. To send messages back to the owner, capture
  the PID outside the callback first: `parent = self(); fn :tool_result, r ->
  send(parent, ...) end`. This is the canonical Elixir pattern for any
  callback that may run in a worker process.
- **Streaming `:tool_result` callback ordering at concurrency > 1**:
  `stream_with_tools/3` invokes `callback.(:tool_result, result)` from inside
  each worker as soon as that tool finishes, so at `max_tool_concurrency > 1`
  callers observe `:tool_result` events in **completion order**, not
  LLM-call order. The final list of tool results sent back to the LLM stays
  in LLM-call order (`Task.async_stream` is invoked with `ordered: true`).
  Callers that need call-order callback delivery should keep
  `max_tool_concurrency: 1` or buffer + reorder client-side.
- **Tool loop refactor (`BaseAgent`)**: extracted the per-tool-call body of
  `execute_tool_loop/2` and `execute_streaming_tool_loop/3` into the private
  helpers `execute_one_tool_call/2` and `execute_one_streaming_tool_call/2`.
  Pure refactor — behaviour, ordering, and process semantics are identical to
  the previous inline `Enum.map` closures. Sets up a follow-up change to swap
  `Enum.map` for an opt-in bounded parallel runner (per-agent
  `max_tool_concurrency`) without churning the closure body again.

### Security

- **Atom-table hardening (`BaseAgent`)**: replaced `String.to_atom/1` over
  LLM-supplied tool input keys with `normalize_tool_field_key/2`, which only
  returns atoms that already exist as fields on the tool struct. LLM tool
  input is influenced by attacker-controllable prompt content (chat
  messages, webhooks); the previous code registered every unknown key in
  the global atom table on the way to `struct/2` discarding it, and BEAM
  never garbage-collects atoms — sustained crafted input could exhaust the
  table and crash the VM. Unknown keys are now silently dropped, preserving
  the existing user-visible behaviour of `struct/2`.

### Fixed

- **Streaming tool input normalisation (`BaseAgent`)**:
  `execute_one_streaming_tool_call/2` now routes `tool_call["input"]` through
  `normalize_tool_input/1` instead of an ad-hoc `case` that only accepted
  `nil`, maps, and binaries. Streaming tool input is raw LLM JSON, so a
  list/number/boolean previously raised `CaseClauseError` and aborted the
  whole streaming tool loop; unexpected shapes now degrade to `%{}`. The
  redundant `parse_json_input/1` private helper (functionally identical to
  the binary clause of `normalize_tool_input/1`) is removed.

## [0.4.0] - 2026-04-25

### Added

- **Multimodal Content Blocks**: Image and document support for agent messages
  - `Normandy.Components.ContentBlock.{Text, Image, Document}` framework-neutral
    block types with per-module `to_claudio/1` emitting Anthropic wire shapes
  - `ClaudioAdapter.add_single_message/3` opportunistically dispatches to
    Claudio's named helpers for the three wrapped shapes (base64 image+text,
    URL image+text, document+text); other shapes (multi-block, reversed,
    image-alone, pre-shaped maps with `cache_control`) fall through to a
    raw-list `add_message/3`
  - `Normandy.Components.Message.content` widened from `:struct` to `:any` with
    extended `@type t` covering `String.t() | struct() | [struct()]`
  - Token accounting in `WindowManager`, `TokenCounter`, and `Summarizer` now
    handles list content (image blocks ~1600 tokens, documents ~3000) instead
    of silently zero-counting them

- **Guardrails**: First-class content-level constraint layer for agent I/O,
  composable across input and output stages
  - `Normandy.Guardrails` runner with short-circuit semantics
  - `Normandy.Guardrails.Guard` behaviour for custom guards
  - `Normandy.Guardrails.ViolationError` raised on input violations
  - Built-in guards: `MaxLength`, `ForbiddenSubstrings`, `RegexGuard`
    (`:deny`/`:require` modes), `RequiredFields`
  - `BaseAgent` integration via new `:input_guardrails` / `:output_guardrails`
    config keys (input violations halt, output violations log and continue,
    mirroring `ValidationMiddleware`)
  - DSL macro `guardrails(:input | :output, [specs])` in `Normandy.DSL.Agent`
  - Telemetry event `[:normandy, :agent, :guardrail, :violation]` with
    `:stage`, `:agent_name`, `:guards`, and `:violations` metadata
  - Works on both non-streaming (`run/2`) and streaming paths — see the
    streaming output guardrails entry below for streaming specifics

- **Streaming Output Guardrails**: Output guardrails now run on streaming paths
  - `:accumulate` mode (default) — guards run on the final assistant text
    after the stream ends; log-and-continue on violation, matching
    non-streaming `run/2` posture
  - `:incremental` mode (opt-in) — guards run every
    `:output_guardrails_chunk_size` bytes of accumulated text plus a tail
    pass when the stream ends with unchecked bytes; on violation halts
    mid-stream, strips any in-flight `tool_use` content block, and returns
    with `:guardrail_violations` populated
  - Three signal channels on both modes: `:guardrail_violation` stream
    callback event, `:guardrail_violations` field on the returned response,
    and the existing telemetry event (metadata gains `streaming: true` and
    `mode: :accumulate | :incremental`)
  - New DSL macros inside `agent do … end`: `streaming_mode/1`,
    `streaming_chunk_size/1`
  - New `BaseAgentConfig` fields: `:output_guardrails_streaming_mode`,
    `:output_guardrails_chunk_size`

### Fixed

- **Streaming Cold-Start**: `BaseAgent.stream_response/3` and
  `stream_with_tools/3` no longer fail with `"Client does not support streaming"`
  when invoked as the first call through the `Normandy.Agents.Model` protocol.
  With protocol consolidation enabled (default in `:dev`/`:prod`), the
  consolidated impl module was not auto-loaded, so the `function_exported?/3`
  capability probe returned false. Now wraps the probe with `Code.ensure_loaded/1`
  (#9).

### Changed

- **Claudio dependency** bumped to `~> 0.4.0`. Required for streaming SSE
  events to decode with string-keyed data maps (matches the raw Anthropic
  JSON convention); earlier `keys: :atoms` decoding silently dropped
  callback dispatches in Normandy's adapter.

## [0.3.0] - 2026-04-18

### Added

- **MCP and A2A Protocol Support**: New protocols for interoperability
  - `Normandy.MCP.ToolWrapper` for wrapping Model Context Protocol (MCP) tools
  - `Normandy.MCP.Registry` for managing MCP tool collections
  - `Normandy.A2A.Server` for agent-to-agent communication
  - Support for cross-agent tool execution and discovery

- **Structured Agent Lifecycle Logging & Telemetry**: Enhanced observability
  - `Logger` calls for agent, LLM, and tool lifecycle events
  - Telemetry events for:
    - `[:normandy, :agent, :run, :start | :stop | :exception]`
    - `[:normandy, :llm, :call, :start | :stop | :exception]`
    - `[:normandy, :tool, :execute, :start | :stop | :exception]`
  - Automatic duration tracking for all operations
  - Metadata enrichment with agent names, models, and tool names
  - OpenTelemetry-friendly logging with span context correlation

- **Telemetry Metadata & Robustness**:
  - Agent names included in all telemetry metadata
  - Improved error handling in LLM adapter calls
  - Support for `Finch` connection pool in `ClaudioAdapter`

- **DSL Enhancements**:
  - Exposed `run/3` in DSL for direct streaming support
  - Improved agent definition ergonomics

#### Schema Enhancements
- **Schema-Based Tool Definition**: New `SchemaBaseTool` mixin for streamlined tool creation
  - `tool_schema` macro providing single source of truth for tool definitions
  - Automatic JSON schema generation and validation
  - ~60% reduction in boilerplate code compared to manual approach

- **Tool Registry Metadata Methods**: Enhanced introspection capabilities
  - `get_metadata/2`, `list_metadata/1`, `filter_by_required_params/2`, etc.
  - Find tools by constraints, parameter types, or required fields

- **Validation Middleware**: Automatic validation for agent inputs and outputs
  - Type-safe agent execution with path-based error messages
  - Fail-fast on invalid inputs, warn on invalid LLM outputs

### Changed

- **Calculator Tool Migration**: Migrated to schema-based approach with improved type safety
- **HTTP Client**: Added support for custom `Finch` pools in `ClaudioAdapter`
- **JSON Schema Type Format**: Schema types now use atoms (`:object`) instead of strings (`"object"`)
- **CI/CD**: Adjusted test coverage threshold to 60% and updated matrix testing

### Fixed

- **Streaming Stability**: Restored tool loop, message conversion, and event shape in streaming responses
- **Tool Loop**: Fixed unwrap of double-nested JSON in `chat_message` after tool loop completion
- **JSON Deserialization**: Return structured content blocks from tool `to_json` instead of raw strings
- **Dependency Issues**: Added default `Poison` adapter to prevent encoding errors in consuming apps
- **Logging**: Preserved DSL-defined agent names in lifecycle logs
- **Dialyzer**: Resolved various type errors and added ignore patterns for clean analysis
- **CI**: Fixed compilation warnings and intermittent test failures

### Test Coverage
- Total tests: 900+ (doctests + property tests + unit tests)
- 0 failures, 100% passing rate

## [0.2.0] - 2025-10-28

### Added

#### CI/CD Infrastructure
- **GitHub Actions Workflow**: Comprehensive CI pipeline for automated testing
  - Matrix testing across Elixir 1.15, 1.16, 1.17 and OTP 26, 27
  - Separate jobs for unit tests, integration tests, Dialyzer, and dependency audits
  - Smart caching for dependencies and PLT files
  - Conditional integration test execution with API key support
  - Documentation in `.github/workflows/README.md`

#### Examples and Documentation
- **Comprehensive Examples**: Three runnable examples demonstrating key features
  - Customer support agent with custom tools and conversational memory
  - Multi-agent research workflow with parallel execution
  - Structured data extraction with validated output schemas
  - Complete examples README with setup instructions and key concepts

- **Customer Support Example Application**: Production-ready multi-agent system
  - Four specialized agents (Greeter, Technical, Billing, Order Support)
  - Custom tools for knowledge base, order lookup, refunds, and ticket creation
  - Interactive CLI interface with session management
  - Data stores for orders, tickets, and knowledge base
  - Full application architecture documentation

#### Context Management Improvements
- **TokenCounter Test Coverage**: Comprehensive unit tests for token counting
  - 15 tests covering all TokenCounter functionality
  - Mock-based testing for unit tests
  - Integration tests for real API calls
  - Error handling and edge case coverage

- **Date/Time Context Provider**: Dynamic timestamp injection for prompts
  - `Normandy.Components.DateTimeProvider` for temporal context
  - Configurable timezone support
  - Test coverage for provider functionality

#### Development Tools
- **JSON Deserializer**: Improved JSON parsing with error handling
  - `Normandy.LLM.JsonDeserializer` for robust JSON parsing
  - Fallback mechanisms for malformed JSON
  - Integration tests for retry scenarios

### Fixed
- **TokenCounter Implementation**: Critical bug fixes for production use
  - Fixed Claudio client initialization (map format instead of keyword list)
  - Fixed agent structure access patterns (direct field access)
  - Fixed system prompt extraction (pattern matching instead of get_in/2)
  - Added comprehensive error handling for malformed agents

- **Access Protocol Issues**: Resolved struct field access errors
  - Replaced get_in/2 with pattern matching for BaseAgentConfig
  - Improved error messages for malformed agent structures

### Documentation
- Enhanced ExDoc configuration with organized module groups
- Examples directory with comprehensive usage documentation
- CI/CD workflow documentation with local testing commands
- Customer support application architecture guide

### Test Coverage
- 443 unit tests (29 doctests + 21 properties + 393 tests)
- 62 integration tests (56 API + 6 comprehensive DSL tests)
- 15 new TokenCounter unit tests
- Total: 505+ tests, all passing

## [0.1.0] - 2025-10-26

### Added

#### Declarative DSLs (Phase 8.6)
- **Agent DSL**: Define agents with declarative syntax
  - `Normandy.DSL.Agent` - `agent do ...end` blocks for agent configuration
  - Macro-based configuration for model, temperature, prompts, tools
  - Automatic initialization with `new/1` and agent execution
  - Background, steps, and output_instructions directives

- **Workflow DSL**: Compose multi-agent workflows
  - `Normandy.DSL.Workflow` - `workflow do ... end` blocks
  - Sequential execution: `step :name do ... end`
  - Parallel execution: `parallel :name do ... end`
  - Race patterns: `race :name do ... end`
  - Data flow: `input(from: :step_name)` or static values
  - Result transformation: `transform fn ... end`
  - Conditional execution: `when_result do ... end`
  - Automatic step orchestration and error handling

- **Pattern Matching Helpers**: Utilities for result tuples
  - `Normandy.Coordination.Pattern` - Ergonomic {:ok, value} | {:error, reason} handling
  - Type checking: `ok?/1`, `error?/1`
  - Value extraction: `ok!/2`, `error!/2`, `unwrap!/1`
  - Filtering lists: `filter_ok/1`, `filter_errors/1`
  - Transformations: `map_ok/2`, `map_error/2`
  - Composition: `then/2`, `find_ok/1`, `collect_ok/1`, `all_ok/1`, `all_ok_map/1`
  - Wrapping utilities: `wrap/1`, `try_wrap/1`

- **Reactive Coordination Patterns**
  - `Normandy.Coordination.Reactive` - Concurrent agent execution primitives
  - `race/3` - Return first successful result from multiple agents
  - `all/3` - Wait for all agents with optional fail-fast mode
  - `some/4` - Quorum pattern (wait for N successful results)
  - `map/3` - Transform agent results
  - `when_result/3` - Conditional execution based on results

- **Agent Pool Management**
  - `Normandy.Coordination.AgentPool` - Connection pool pattern for agents
  - Transaction-based API with automatic checkout/checkin
  - Manual checkout/checkin for advanced use cases
  - Configurable pool size with overflow support
  - LIFO/FIFO checkout strategies
  - Automatic agent replacement on failure
  - Pool statistics and monitoring
  - Non-blocking checkout with timeout support

#### Core Foundation (Phases 1-7)
- **Schema System**: Macro-based DSL for defining agent I/O schemas with JSON Schema generation
  - `Normandy.Schema` module with `schema` and `io_schema` macros
  - Type system with casting, dumping, and loading via `Normandy.Type`
  - Changeset-like validation with `Normandy.Validate`
  - Support for parameterized and custom types

- **Agent System**: Core agent implementation with LLM integration
  - `Normandy.Agents.BaseAgent` with init, run, and get_response methods
  - `Normandy.Agents.BaseAgentConfig` for agent state management
  - Context provider system for dynamic prompt injection
  - Tool/function calling support via `Normandy.Agents.ToolCallResponse`

- **Memory Management**: Conversational history tracking
  - `Normandy.Components.AgentMemory` with turn-based organization
  - Message serialization and deserialization
  - Configurable message history limits

- **Prompt System**: Structured prompt generation
  - `Normandy.Components.SystemPromptGenerator` with section-based prompts
  - `Normandy.Components.PromptSpecification` for prompt structure
  - `Normandy.Components.ContextProvider` protocol for dynamic context

- **Streaming Responses**: Real-time LLM response streaming
  - Streaming support in `Normandy.Agents.BaseAgent`
  - Callback-based streaming with arity-2 callback support

- **Resilience Patterns**: Fault tolerance and reliability
  - `Normandy.Resilience.Retry` with exponential backoff
  - `Normandy.Resilience.CircuitBreaker` for preventing cascade failures
  - Integration with BaseAgent for automatic retry on failures

- **Context Window Management**: Intelligent conversation management
  - `Normandy.Context.WindowManager` for automatic context management
  - `Normandy.Context.TokenCounter` for accurate token counting
  - `Normandy.Context.Summarizer` for conversation summarization
  - Support for Claude's prompt caching (up to 90% cost reduction)

#### Multi-Agent Coordination (Phase 8)
- **Agent Communication**: Message-based agent-to-agent communication
  - `Normandy.Coordination.AgentMessage` for structured messaging
  - `Normandy.Coordination.SharedContext` for stateless context sharing
  - `Normandy.Coordination.StatefulContext` (GenServer + ETS) for stateful sharing

- **Orchestration Patterns**: Multiple coordination strategies
  - `Normandy.Coordination.SequentialOrchestrator` for pipeline execution
  - `Normandy.Coordination.ParallelOrchestrator` for concurrent execution
  - `Normandy.Coordination.HierarchicalCoordinator` for manager-worker patterns
  - Simple and advanced APIs for flexible usage

- **Agent Processes**: OTP-based agent supervision
  - `Normandy.Coordination.AgentProcess` (GenServer wrapper)
  - `Normandy.Coordination.AgentSupervisor` (DynamicSupervisor)
  - Fault tolerance with Elixir/OTP patterns

#### Batch Processing
- **Concurrent Processing**: Efficient batch agent execution
  - `Normandy.Batch.Processor` for concurrent batch processing
  - Configurable concurrency limits
  - Result aggregation and error handling

#### Integration & Testing (Phase 8.5)
- **Integration Tests**: Comprehensive real-world testing
  - 56 integration tests with real Anthropic API calls
  - Test helpers: `IntegrationHelper` and `NormandyIntegrationHelper`
  - Tag-based test exclusion (`@moduletag :api`, `@moduletag :integration`)
  - Coverage for multi-agent workflows, resilience, caching, and batch processing

- **LLM Client Integration**: Claudio HTTP client migration
  - Updated to Claudio v0.1.1 from hex.pm
  - Migrated from Tesla to Req HTTP client
  - Streaming error handling for `Req.Response.Async`

### Fixed
- Orchestrator APIs: Fixed `extract_result` to return full response maps instead of just chat_message strings
- Function clause matching: Improved pattern matching for simple vs advanced orchestrator APIs
- Streaming callbacks: Fixed arity-2 callback support for streaming responses

### Documentation
- Comprehensive README with usage examples
- Project roadmap (ROADMAP.md) tracking implementation phases
- MIT License
- Hex.pm package metadata and documentation configuration

### Dependencies
- `elixir_uuid` ~> 1.2 - UUID generation for conversation turns
- `poison` ~> 6.0 - JSON encoding/decoding
- `claudio` ~> 0.1.1 - Anthropic Claude API client
- `dialyxir` ~> 1.4 (dev/test) - Static analysis
- `stream_data` ~> 1.1 (dev/test) - Property-based testing
- `ex_doc` ~> 0.34 (dev) - Documentation generation

### Test Coverage
- 443 unit tests (29 doctests + 21 properties + 393 tests)
- 62 integration tests (56 API + 6 comprehensive DSL tests, excluded by default)
- Total: 505 tests, all passing
- New test files:
  - `test/coordination/pattern_test.exs` (13 tests)
  - `test/coordination/reactive_test.exs` (33 tests)
  - `test/coordination/agent_pool_test.exs` (30 tests)
  - `test/dsl/agent_test.exs` (8 tests)
  - `test/dsl/workflow_test.exs` (14 tests)
  - `test/dsl/workflow_transform_integration_test.exs` (4 tests)
  - `test/normandy_integration/dsl_comprehensive_test.exs` (6 comprehensive integration tests)

[1.0.0]: https://github.com/thetonymaster/normandy/releases/tag/v1.0.0
[0.9.0]: https://github.com/thetonymaster/normandy/releases/tag/v0.9.0
[0.8.0]: https://github.com/thetonymaster/normandy/releases/tag/v0.8.0
[0.7.0]: https://github.com/thetonymaster/normandy/releases/tag/v0.7.0
[0.6.3]: https://github.com/thetonymaster/normandy/releases/tag/v0.6.3
[0.6.2]: https://github.com/thetonymaster/normandy/releases/tag/v0.6.2
[0.6.1]: https://github.com/thetonymaster/normandy/releases/tag/v0.6.1
[0.6.0]: https://github.com/thetonymaster/normandy/releases/tag/v0.6.0
[0.5.1]: https://github.com/thetonymaster/normandy/releases/tag/v0.5.1
[0.5.0]: https://github.com/thetonymaster/normandy/releases/tag/v0.5.0
[0.4.0]: https://github.com/thetonymaster/normandy/releases/tag/v0.4.0
[0.3.0]: https://github.com/thetonymaster/normandy/releases/tag/v0.3.0
[0.2.0]: https://github.com/thetonymaster/normandy/releases/tag/v0.2.0
[0.1.0]: https://github.com/thetonymaster/normandy/releases/tag/v0.1.0