# Changelog

All notable changes to CouncilEx are documented here.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.1.0] - 2026-05-31

First public release. CouncilEx is a framework for building multi-model,
multi-agent LLM council workflows in Elixir: define a council of members,
run structured rounds, aggregate or judge their outputs, and synthesize a
final answer. Provider-agnostic, OTP-supervised, single-BEAM by default.

### Added

#### Councils & runs

- **Static & dynamic councils** — declare councils with the `use CouncilEx`
  DSL, or build them as data via `%CouncilEx.DynamicCouncil{}` (pipeable
  builder, JSON serialization/deserialization, registry lookup by string name).
- **Polymorphic dispatch** — `CouncilEx.run/3`, `start/3`, and `start_link/3`
  accept either a module-form council or a `%DynamicCouncil{}`; one execution
  path, identical semantics.
- **Sync + async runs** — blocking `run/3` for short workflows; `start/3`
  (`GenServer.start/3` semantics: unsupervised, unlinked) and `start_link/3`
  (linked) for async, both returning `{:ok, pid}`. Communicate with the
  runner via message passing.
- **Pre-run validation** — `CouncilEx.validate/1` returns structured
  `[%{path, code, message}]` errors for module-form *and* `%DynamicCouncil{}`
  councils. `start/3` gates on it, so config errors return
  `{:error, {:invalid_council, errs}}` before any process spawns or token is
  spent.
- **Optional run grouping** — `CouncilEx.Supervisor`, a thin
  `DynamicSupervisor` wrapper for tenant isolation, bulk-terminate, and
  in-flight visibility. Runs are unsupervised by default (caller-owned pids).
- **Failure handling** — per-round `failure_mode: :continue | :fail_fast`,
  retry policies, member timeouts, run-level `cancel/1`, and structured
  `%CouncilEx.Error{}`.
- **Sub-councils** — nest a council as a member (registered name, module
  atom, or nested `%DynamicCouncil{}`) with optional input mappers.
- **Routers** — dynamic next-step selection between members or rounds,
  declared inline or registered by name.
- **Registry** — config + runtime registration of profiles, tools, schemas,
  routers, rounds, sub-councils, and input mappers, all resolvable by string
  name.
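
The difference between the unlinked and linked async entry points can be illustrated with plain `GenServer` calls. This is a minimal sketch of the `GenServer.start/3` versus `GenServer.start_link/3` semantics only: `RunnerSketch` and its message shapes are hypothetical stand-ins, not CouncilEx's runner or its API.

```elixir
defmodule RunnerSketch do
  # Hypothetical stand-in for a council runner; not part of CouncilEx.
  use GenServer

  @impl true
  def init(caller), do: {:ok, caller}

  @impl true
  def handle_cast({:run, input}, caller) do
    # Report the result back to the caller by message passing.
    send(caller, {:run_finished, self(), String.upcase(input)})
    {:noreply, caller}
  end
end

# True when `pid` appears in the calling process's link set.
linked? = fn pid ->
  {:links, links} = Process.info(self(), :links)
  pid in links
end

# GenServer.start/3: the runner is neither linked to nor supervised by the caller.
{:ok, unlinked} = GenServer.start(RunnerSketch, self(), [])
# GenServer.start_link/3: the runner is linked, so a crash propagates to the caller.
{:ok, linked} = GenServer.start_link(RunnerSketch, self(), [])

IO.inspect(linked?.(unlinked), label: "start/3 linked to caller")
IO.inspect(linked?.(linked), label: "start_link/3 linked to caller")

# Talk to the runner via messages and wait for its reply.
GenServer.cast(unlinked, {:run, "hello"})

result =
  receive do
    {:run_finished, ^unlinked, output} -> output
  after
    1_000 -> :timeout
  end

IO.inspect(result, label: "result")
```

With an unlinked runner the caller owns the pid and decides what to do if the runner dies; a caller that wants crash notifications without a link can monitor the pid with `Process.monitor/1`.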

#### Rounds, councils & aggregators

- **Round library** — `IndependentAnalysis`, `Critique`, `Vote`, `Synthesis`,
  `WeightedSynthesis`, `Iterate`, `Ranking`, `PairwiseElimination`,
  `PeerReview`, `AnonymizedPeerReview`, plus a custom-round behaviour.
- **Built-in councils** — `ParallelPanel`, `PeerReview`, `Voting`,
  `Specialist`, `Consensus`, `Tournament`, `Chairman`, `WeightedConsensus`,
  `JuryWithRetry`.
- **Confidence-triggered retry** — `Councils.JuryWithRetry` runs K judges in
  parallel and re-samples on low average confidence (default threshold `0.7`,
  max `2` iterations). Judges do not see each other across retries —
  independent re-sample, not debate (respects Wu et al. *Can LLM Agents Really
  Debate?*, arXiv:2511.07784).
- **Reliability-weighted consensus** — `Councils.WeightedConsensus` weights
  members by static `:weight`, per-member `:confidence`, or historical
  `Reliability` lookups. Inspired by Wu et al. *Council Mode*
  (arXiv:2604.02923); mapping in [`docs/COUNCIL_MODE_PAPER.md`](docs/COUNCIL_MODE_PAPER.md).
- **Aggregators** — `Plurality`, `Borda`, `Condorcet`, `WeightedMean`,
  `Median`, `PeerRanking`.
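
The confidence-triggered retry idea can be sketched in a few lines of plain Elixir. The module below is illustrative, not the `Councils.JuryWithRetry` implementation: the judge shape (a zero-arity function returning `{verdict, confidence}`), the reading of "max 2 iterations" as at most two samples in total, and the plurality tally at the end are all assumptions made for the example.

```elixir
defmodule JurySketch do
  # Illustrative only: names and shapes here are assumptions, not CouncilEx's API.
  @threshold 0.7
  @max_iterations 2

  # `judges` is a list of zero-arity functions, each returning {verdict, confidence}.
  def run(judges, iteration \\ 1) do
    # Each judge runs in its own task and never sees another judge's output.
    results =
      judges
      |> Enum.map(&Task.async/1)
      |> Task.await_many(5_000)

    avg = Enum.sum(Enum.map(results, &elem(&1, 1))) / length(results)

    if avg < @threshold and iteration < @max_iterations do
      # Low average confidence: draw a fresh, independent sample.
      run(judges, iteration + 1)
    else
      %{verdict: plurality(results), avg_confidence: avg, iterations: iteration}
    end
  end

  # Plurality: the verdict with the most votes wins (ties are broken arbitrarily).
  defp plurality(results) do
    results
    |> Enum.frequencies_by(&elem(&1, 0))
    |> Enum.max_by(fn {_verdict, count} -> count end)
    |> elem(0)
  end
end

confident = [fn -> {:approve, 0.9} end, fn -> {:approve, 0.8} end, fn -> {:reject, 0.85} end]
unsure = [fn -> {:approve, 0.4} end, fn -> {:reject, 0.5} end, fn -> {:reject, 0.3} end]

IO.inspect(JurySketch.run(confident), label: "confident jury")
IO.inspect(JurySketch.run(unsure), label: "unsure jury")
```

The confident jury finishes after one sample. The unsure jury averages 0.4, so it is re-sampled once and then stops at the iteration cap; no judge output is fed back into the next sample.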

#### AutoCouncil

- **AutoCouncil** — opt-in routing layer: a council that picks itself.
  Pluggable strategies (`:rules`, `:cascade`, plus stub `:embedding` /
  `:llm_classify` / `:llm_build`) select an existing council per prompt or
  synthesize a fresh `%DynamicCouncil{}`. Same `CouncilEx.run/3` entry;
  routing decision surfaced in `result.metadata.auto`.

#### Per-member capabilities

- **Profiles** — reusable per-member capability bundles (provider, model,
  temperature, tools, retry); 9 prebaked profiles plus user-defined
  `use CouncilEx.Profile` modules.
- **Structured output** — Ecto-schema or inline JSON Schema per member, with
  native `responseSchema` (Gemini) and tool-shaped fallback (OpenAI/Anthropic).
- **Streaming** — token-level streaming with sink callbacks, integrated with
  the tool-loop so tool-spanning turns read as one continuous response.
- **Tool calling** — parallel tool execution with concurrency + timeout knobs,
  multi-iteration tool-loops in both `complete/2` and `stream/3`, and
  `:tool_choice` (`:auto | :required | :none | "name"`).
- **RAG via tools** — council-level `add_council_tool/2` shares a toolset
  across members; per-member `:tools` keeps specialist corpora private.
  `CouncilEx.Tools.InMemoryDocs` is a zero-dep BM25 retrieval tool. See
  [`docs/RAG.md`](docs/RAG.md).
- **Per-member confidence** — opt-in `:confidence` strategies (`:self_report`,
  `:logprob`) populate `%MemberResult{}.confidence` for downstream weighting.
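
For readers unfamiliar with BM25, the ranking behind a zero-dependency retrieval tool can be sketched as follows. This is a generic Okapi BM25 scorer written for illustration, not the code in `CouncilEx.Tools.InMemoryDocs`; the module name, the tokenizer, and the parameter values `k1 = 1.2` and `b = 0.75` are assumptions. It expects a non-empty list of non-empty documents.

```elixir
defmodule BM25Sketch do
  # Illustrative Okapi BM25 scorer; not the CouncilEx.Tools.InMemoryDocs implementation.
  @k1 1.2
  @b 0.75

  # Lowercase the text and split it on runs of non-alphanumeric characters.
  def tokenize(text) do
    text |> String.downcase() |> String.split(~r/[^a-z0-9]+/, trim: true)
  end

  # Returns {doc, score} pairs sorted by descending score.
  def rank(docs, query) do
    tokenized = Enum.map(docs, &tokenize/1)
    n = length(docs)
    avg_len = Enum.sum(Enum.map(tokenized, &length/1)) / n
    terms = query |> tokenize() |> Enum.uniq()

    scores =
      Enum.map(tokenized, fn tokens ->
        freqs = Enum.frequencies(tokens)
        len = length(tokens)

        terms
        |> Enum.map(fn term ->
          tf = Map.get(freqs, term, 0)
          # Number of documents that contain the term at least once.
          df = Enum.count(tokenized, &(term in &1))
          idf = :math.log((n - df + 0.5) / (df + 0.5) + 1)
          idf * tf * (@k1 + 1) / (tf + @k1 * (1 - @b + @b * len / avg_len))
        end)
        |> Enum.sum()
      end)

    docs |> Enum.zip(scores) |> Enum.sort_by(&elem(&1, 1), :desc)
  end
end

docs = [
  "Elixir processes are lightweight",
  "Supervisors restart crashed processes",
  "BM25 ranks documents by term relevance"
]

ranked = BM25Sketch.rank(docs, "crashed processes")
[{top, _score} | _] = ranked
IO.inspect(top, label: "best match")
```

A retrieval tool built this way would return the top few documents to the calling member, which is how a council-wide toolset and a private per-member corpus can share the same mechanism.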

#### Providers

- **Built-in adapters** — OpenAI, Anthropic, Gemini, and OpenRouter implement
  the `CouncilEx.Provider.Adapter` behaviour; Ollama ships as a config preset
  over the OpenAI adapter.
- **Pluggable providers** — implement `CouncilEx.Provider.Adapter` for any
  backend.
- **Mock provider** — scriptable in-memory provider
  (`CouncilEx.Providers.Mock.script/2`) for tests and example fixtures; not
  for production.

#### Reliability & bias

- **Reliability store** — `CouncilEx.Reliability` (ETS default, pluggable)
  tracks per-member historical accuracy by query features; feeds
  `WeightedConsensus`.
- **BiasDetector** — diagnostic-only `CouncilEx.BiasDetector.analyze/2` flags
  when member disagreement correlates with demographic axes. Lexicon backend
  in core.
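
As a rough picture of what tracking per-member accuracy by query feature in ETS might involve, the sketch below keeps a correct/total counter pair per `{member, feature}` key. It is a hypothetical illustration, not the `CouncilEx.Reliability` API: the module, function names, key layout, and the `0.5` default for unseen members are all assumptions.

```elixir
defmodule ReliabilitySketch do
  # Hypothetical ETS-backed accuracy tracker; not CouncilEx.Reliability itself.
  def new, do: :ets.new(:reliability_sketch, [:set, :public])

  # Record whether `member` was correct on a query with the given feature.
  def record(table, member, feature, correct?) do
    key = {member, feature}
    hit = if correct?, do: 1, else: 0
    # Rows are stored as {key, correct_count, total_count}; a missing row starts at zero.
    :ets.update_counter(table, key, [{2, hit}, {3, 1}], {key, 0, 0})
    :ok
  end

  # Historical accuracy in 0.0..1.0, or `default` when nothing was recorded.
  def accuracy(table, member, feature, default \\ 0.5) do
    case :ets.lookup(table, {member, feature}) do
      [{_key, correct, total}] when total > 0 -> correct / total
      _ -> default
    end
  end
end

table = ReliabilitySketch.new()
ReliabilitySketch.record(table, "analyst", :math, true)
ReliabilitySketch.record(table, "analyst", :math, true)
ReliabilitySketch.record(table, "analyst", :math, false)

acc = ReliabilitySketch.accuracy(table, "analyst", :math)
IO.inspect(acc, label: "analyst accuracy on :math")
```

A weighted consensus step could then look up each member's accuracy for the current query's feature and use it as that member's vote weight.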

#### Persistence (optional)

- **Optional backends** — `*.Ecto` backends for `Reliability`, `Registry`, and
  `Recorder`, plus `*.Redis` for `Reliability` and `Registry`.
- **Oban worker** — `CouncilEx.Workers.Oban` for running councils as
  background jobs. See [`docs/RUNNING_WITH_OBAN.md`](docs/RUNNING_WITH_OBAN.md).
- **Config `:mode` knob** — `:single_node` / `:multi_node` flips all backends
  in one place; migration + recovery helpers under `CouncilEx.Persistence`.

#### Observability

- **PubSub events** — 10 frozen events on `"council_ex:run:#{run_id}"`
  (`CouncilEx.Events`); idempotent subscribe across `:pg` and Phoenix.PubSub
  adapters.
- **Telemetry** — `[:council_ex, :run | :round | :member | :tool, :*]` events
  with full parity on the async path (~3µs/event overhead), enriched with
  per-member token, model, provider, round, and confidence metadata.
- **Verbose tracer** — `verbose: true | :debug` prints a human-readable
  per-run timeline; pure event consumer, zero production cost when off.
- **Diagram tooling** — `CouncilEx.Diagram.{to_ir, topology, sequence}` for
  both council shapes (ASCII / Mermaid / sequence), plus the
  `mix council.diagram` task. IR is React-Flow-friendly JSON.
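
To show why an idempotent subscribe matters on a `:pg`-based adapter, here is a small sketch using only OTP's `:pg` module (OTP 23+). It is not the CouncilEx adapter: the `subscribe` helper, the run id, and the event and message shapes are hypothetical. Only the topic format `"council_ex:run:#{run_id}"` comes from the changelog entry above.

```elixir
# Start the default :pg scope.
{:ok, _pg} = :pg.start_link()

run_id = "run-123"                      # hypothetical run id
topic = "council_ex:run:#{run_id}"

# :pg lets a process join the same group more than once, so an idempotent
# subscribe has to check membership first (hypothetical helper).
subscribe = fn topic ->
  if self() in :pg.get_local_members(topic), do: :ok, else: :pg.join(topic, self())
end

:ok = subscribe.(topic)
:ok = subscribe.(topic)                 # second call is a no-op

# Publish: send the event to every member of the group.
event = {:member_finished, %{member: "critic"}}   # hypothetical event shape
for pid <- :pg.get_members(topic), do: send(pid, {:council_event, topic, event})

received =
  receive do
    {:council_event, ^topic, payload} -> payload
  after
    1_000 -> :timeout
  end

IO.inspect(received, label: "received")
```

Without the membership check, a process that subscribed twice would appear twice in the group and receive every event twice.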

#### Testing

- **`CouncilEx.Test`** — helpers for asserting on runs and capturing
  emitted events. See [`docs/UNIT_TESTING.md`](docs/UNIT_TESTING.md) and
  [`docs/TESTING.md`](docs/TESTING.md).

[Unreleased]: https://github.com/brewingelixir/council_ex/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/brewingelixir/council_ex/releases/tag/v0.1.0