# Changelog

## Unreleased

Nothing yet.

## 1.3.3 - 2026-05-29

Calibration release for the v1.3.2 Elixir cutover.

**New:**

- Added a multi-audience README path map covering the operator-local
  Familiar, ACP editor mounting, Phoenix embeds, eval/research work,
  persistent characters, hosted service shapes, and multi-agent coordination.
  Evidence: PR #125.
- Added `docs/acp-editor.md`, a worked guide for mounting the Familiar as an
  ACP agent in editors, including Zed configuration, standalone JSON-RPC
  smoke testing, diagnostics, and an explicit statement of its read-only scope.
  Evidence: PR #125.
- Added `evals/familiar/v1.3.3.exs`, a curated starter suite for Familiar eval
  work covering gate use, composition, synthesis quality, forbidden-pattern
  checks, and loom recall. Evidence: PR #125.
- Added a real-LLM Mnesia rehydration smoke test for the production Familiar
  path: summon against a workspace root, record a turn, stop the process,
  summon fresh against the same root-derived Mnesia table, and assert the
  entity sees prior turns through `loom.turns`. Evidence: PR #124, issue #120.

**Changed:**

- The Familiar now defaults to the host-BEAM unrestricted evaluator for its
  operator-local audience, while `sandbox: :port` remains available for
  child-BEAM isolation. Explicit `sandbox: nil` with a `port_runner` still
  selects the port path. Evidence: PRs #121 and #123, issue #115.
- Bash medium capability text now distinguishes shell state from filesystem
  side effects instead of overstating persistence. Evidence: PR #123,
  issue #117.
- Code-medium inhabitant guidance now describes the exact top-level binding
  contract for `defmodule`: gate functions, `loom`, `folded_summary`, and
  prior-turn variables are top-level bindings that module bodies cannot see.
  Evidence: PR #125, issue #116.
- `Cantrip.cast_batch` guidance now says children start concurrently, bounded
  by `max_concurrent_children`, and results are returned in request order
  instead of making an unconditional "parallel" claim. Evidence: PR #125,
  issue #118.
- The Spellbook loom ritual now verifies JSONL persistence, production
  Familiar Mnesia rehydration, and folding as prompt projection over an
  append-only loom. Evidence: PRs #124 and #125, issues #119 and #120.
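
The `cast_batch` ordering contract can be illustrated with Elixir's own `Task.async_stream/3`, which has the same shape. In that function, `max_concurrency` bounds how many children run at once, and `ordered: true` returns results in request order. This is a rough sketch of the contract, not Cantrip's implementation, and `BatchSketch` and its arguments are hypothetical.

```elixir
defmodule BatchSketch do
  # Hypothetical helper illustrating the cast_batch contract: start work
  # concurrently, never more than `max_concurrent` at once, and return
  # results in the order the requests were given.
  def run(requests, worker, max_concurrent) do
    requests
    |> Task.async_stream(worker, max_concurrency: max_concurrent, ordered: true)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# The first request sleeps longest, but its result is still returned first.
results = BatchSketch.run([30, 10, 20], fn ms -> Process.sleep(ms); ms * 2 end, 2)
```

Completion order and result order are decoupled. The 10 ms request finishes first, but the caller still receives results aligned with its inputs.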

**Verification:**

- The v1.3.2 inhabitant-affordance audit spawned fix issues #115-#120; all are
  closed with code, docs, tests, or narrowed public contracts. The issues,
  PRs, and changelog now carry the durable record.
- `mix verify`, `mix docs`, and PR CI passed on the final v1.3.3 batch.
- After the calibration queue, the only open GitHub issues are the explicitly
  deferred future-work issues #108-#112.

## 1.3.2 - 2026-05-28

Package-coherence release for the Elixir cutover.

**New:**

- Added `docs/spellbook.md`, a vocabulary guide for cantrips, identities,
  mediums, gates, wards, circles, looms, entities, and the Familiar. The
  Spellbook is linked from the README, included in ExDoc, and shipped in the
  Hex package. Evidence: PR #105, issue #103.
- Added inhabitant-voice opening paragraphs to the documented public modules
  so the README, Spellbook, generated docs, and Familiar prompt describe the
  same runtime concepts. Evidence: PR #105, issue #102.
- Conversation mediums now expose capability text that teaches the same
  medium/gate/ward grammar used by code and Familiar flows, including the
  conditional `done` ending. Evidence: PR #104, issue #96.
- The Familiar prompt now names the BEAM/codebase environment more directly:
  `Code.fetch_docs/1`, `loom.turns`, workspace boundaries, and the Cantrip
  bibliography are all part of the orientation. Evidence: PR #104, issue #97.

**Changed:**

- Removed stale migration/audit docs and dead compatibility code from the
  pre-cutover era. The old material remains available through git history,
  while the source tree now presents the Elixir package as canonical. Evidence:
  PR #101, issues #98 and #99.
- Moved the long historical Zed trace replay behind
  `RUN_REAL_LLM_TESTS=1 RUN_REAL_TRACE_REPLAY=1`. The ordinary real-LLM release
  gate now covers the stable live integration contracts; trace replay remains
  available as an explicit stress/provenance check.
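
A test suite can express this two-flag gating by turning the environment variables into ExUnit exclusion tags. This sketch assumes hypothetical `:real_llm` and `:trace_replay` tags. The project's actual tag names and helper (`Cantrip.Test.RealLLMEnv`, per the 1.0.0 notes) may work differently.

```elixir
defmodule LiveGateSketch do
  # Maps the two env flags onto ExUnit tags to exclude. Trace replay only
  # runs when both RUN_REAL_LLM_TESTS=1 and RUN_REAL_TRACE_REPLAY=1 are set.
  # The tag names :real_llm and :trace_replay are hypothetical.
  def excluded_tags(env \\ System.get_env()) do
    real? = Map.get(env, "RUN_REAL_LLM_TESTS") == "1"
    replay? = real? and Map.get(env, "RUN_REAL_TRACE_REPLAY") == "1"

    [real_llm: real?, trace_replay: replay?]
    |> Enum.reject(fn {_tag, enabled?} -> enabled? end)
    |> Keyword.keys()
  end
end

# In test/test_helper.exs (assumed layout):
# ExUnit.start(exclude: LiveGateSketch.excluded_tags())
```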

**Verification:**

- Fresh-install dogfood from the built Hex tar succeeded outside the repo:
  package contents included `.env.example`, `README.md`, and
  `docs/spellbook.md`; `mix deps.get`,
  `mix cantrip.cast "explain what a cantrip is"`, and
  `mix cantrip.familiar "summarize the loom storage modules"` all ran from the
  extracted package using local live LLM configuration.
- `RUN_REAL_LLM_TESTS=1` over the explicit stable live/real integration suite
  passed: 20 tests, 0 failures, including a focused real-LLM JSONL loom
  rehydration smoke. The trace replay suite is no longer part of that default
  live gate.
- `mix verify`, `mix docs`, and `mix hex.build` pass with the package docs and
  file list current.

## 1.3.1 - 2026-05-28

Patch release for runtime/safety findings surfaced immediately after the
`1.3.0` tag.

**Fixes:**

- Unknown code-medium sandbox ward values now fail closed with a structured
  `code` error observation instead of falling through to host-BEAM
  unrestricted eval. Regression coverage proves the submitted code does not
  execute under an unsupported sandbox value. Evidence: issue #93.
- Observation arguments are now recursively redacted before they can be stored
  on loom observations. Conversation tool-call args, malformed `args_raw`, and
  port code-medium gate args are covered so secret-shaped values do not persist
  through observation metadata while non-secret argument shape remains useful.
  Evidence: issue #92.
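
Here is a minimal sketch of recursive argument redaction. It assumes observation args are plain maps and lists with atom or string keys. `RedactSketch`, its key markers, and the `"[REDACTED]"` placeholder are hypothetical, and Cantrip's actual redaction rules are not shown here.

```elixir
defmodule RedactSketch do
  # Hypothetical substrings that mark a key as secret-shaped.
  @secret_markers ["api_key", "apikey", "token", "secret", "password", "authorization"]

  # Structs pass through unchanged in this sketch; a real redactor would
  # need an explicit policy for them.
  def redact(%_{} = struct), do: struct

  # Maps: mask values under secret-shaped keys and recurse into everything
  # else, so the argument shape survives while secret values do not.
  def redact(%{} = map) do
    Map.new(map, fn {key, value} ->
      if secret_key?(key), do: {key, "[REDACTED]"}, else: {key, redact(value)}
    end)
  end

  def redact(list) when is_list(list), do: Enum.map(list, &redact/1)
  def redact(other), do: other

  defp secret_key?(key) when is_atom(key) or is_binary(key) do
    key |> to_string() |> String.downcase() |> String.contains?(@secret_markers)
  end

  defp secret_key?(_key), do: false
end
```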

## 1.3.0 - 2026-05-28

Post-v1.2 stabilization release. It ships the hardening work that landed
after `1.2.0` as a versioned source/package release, including the Bash sandbox
boundary change, runtime and persistence fixes, API surface cleanup, package
metadata fixes, and Familiar composition guidance.

**Breaking:**

- Bash-medium cantrips now require an OS sandbox and fail closed when neither
  `bubblewrap` nor `sandbox-exec` is available. Declared gates are projected
  into the shell as PATH commands and dispatch back through the parent BEAM;
  raw shell remains the medium, but gate authority now comes from the circle
  rather than ambient process access. The `done` gate is exposed as
  `cantrip_done` because `done` is a shell keyword. Tests may opt into
  `medium_opts: %{sandbox: :passthrough}`; production cannot.
- Bash sandbox verification now includes representative shell workloads
  (`git`, `make`, `jq`, `/dev/null` redirects, and common
  `find`/`sed`/`grep` pipelines). The workload suite is the support contract:
  when a real shell workload should be supported, add it there so adapter
  gaps fail in CI instead of surfacing in user sessions. Workload tests opt
  into `%{bash_network: :on}` so GitHub-hosted Linux runners can exercise
  bubblewrap shell behavior even when they cannot create bubblewrap's default
  network-deny namespace; separate tests pin the default network-deny command
  shape.
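
A fail-closed backend probe can be sketched as follows. The executable names `bwrap` (bubblewrap's CLI) and `sandbox-exec` (macOS) are real. The module, its return shapes, and the injectable `finder` argument are assumptions for illustration.

```elixir
defmodule SandboxProbeSketch do
  # Hypothetical probe: prefer bubblewrap, then macOS sandbox-exec, and
  # refuse to run (fail closed) when neither binary is on PATH.
  # `finder` defaults to System.find_executable/1 and is injectable for tests.
  def select(finder \\ &System.find_executable/1) do
    cond do
      finder.("bwrap") -> {:ok, :bubblewrap}
      finder.("sandbox-exec") -> {:ok, :sandbox_exec}
      true -> {:error, :no_os_sandbox}
    end
  end
end
```

No branch falls back to unsandboxed execution. The only non-error outcomes name a real OS sandbox.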

**New:**

- Familiar prompt/runtime evaluation now has a composition metric:
  `child_medium_used` scores whether a child turn used the expected medium.
  Turn metadata records `medium_type`, JSONL rehydration preserves it, and
  the eval suite applies the metric to synthesis-shaped tasks. This is rubric
  coverage; behavioral validation still requires real-LLM runs.
  Evidence: PR #90, issue #83.
- Default Familiar guidance now explicitly teaches answer-shape selection:
  gather and compose in code, then delegate speech-shaped synthesis,
  explanation, review, naming, judgment, decision, or voice to a
  conversation child. Explicit user requests for a child, medium, or batch
  shape are treated as directives unless impossible. Evidence: PR #90,
  issue #83.
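
The `child_medium_used` criterion can be pictured as a check over recorded turn metadata. This sketch assumes each turn is a map carrying a hypothetical `:depth` key (0 for the root) alongside `medium_type`. The real rubric's data shape and scoring scale may differ.

```elixir
defmodule ChildMediumMetricSketch do
  # Hypothetical rubric check: 1.0 when at least one child turn ran in the
  # expected medium, 0.0 otherwise (including when no child was spawned).
  def score(turns, expected_medium) do
    used? =
      Enum.any?(turns, fn turn ->
        turn.depth > 0 and turn.medium_type == expected_medium
      end)

    if used?, do: 1.0, else: 0.0
  end
end
```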

**Fixes:**

- Bash sandbox support now has representative shell workload coverage for
  `git`, `make`, `jq`, `/dev/null`, and common `find`/`sed`/`grep` pipelines,
  including the GitHub Actions runner network-namespace constraint. Evidence:
  PR #84, issue #82.
- The Hex package now includes `.env.example`, matching the README quick
  start. Package metadata tests assert README `cp` sources exist and ship in
  the Hex file list. Evidence: PR #88, issue #85.
- The documented public API surface now matches generated docs: internal
  modules are hidden, `docs/public-api.md` names the supported surface, nested
  modules are checked from application metadata, and ExDoc warnings are errors.
  Evidence: PR #89, issue #87.
- Provider and gate boundaries are typed more explicitly: LLM provider
  responses flow through `%Cantrip.LLM.Response{}`, gate arguments are
  normalized through per-gate DTOs, ACP `_meta` overrides are constrained, and
  provider option/usage forwarding has regression coverage. Evidence: PRs
  #57, #66, #76, and #77.
- Durable loom and JSONL behavior is stricter: append semantics align between
  in-memory and durable paths, JSONL writes are serialized, persisted
  code-state bindings are compacted, event upcasting is versioned, and
  truncation/medium metadata rehydrate as atom keys. Evidence: PRs #66, #70,
  #71, #74, and #90.
- Streaming and observability paths preserve context while staying bounded:
  streaming emits real text deltas, ACP trace context is propagated, intent
  telemetry is redacted, streaming delivery has backpressure, bridge delivery
  uses bounded barriers, and early stream halt shuts down runner tasks.
  Evidence: PRs #50, #58, and #75.
- Child composition is more disciplined: pre-built child casts compose parent
  wards, declaration-time child-spawn wards are enforced, and the default
  Familiar can read files through its normal observation gates. Evidence: PRs
  #72, #73, and #78.
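
Serialized JSONL writes can be sketched with a single GenServer that owns the file, assuming one writer process per loom file. `JsonlWriterSketch` is hypothetical. The lines are pre-encoded strings so the example needs no JSON library.

```elixir
defmodule JsonlWriterSketch do
  # Hypothetical single-writer process: every append goes through one
  # GenServer, so concurrent callers can never interleave partial lines.
  use GenServer

  def start_link(path), do: GenServer.start_link(__MODULE__, path)

  def append(writer, line) when is_binary(line),
    do: GenServer.call(writer, {:append, line})

  @impl true
  def init(path), do: {:ok, path}

  @impl true
  def handle_call({:append, line}, _from, path) do
    {:reply, File.write(path, [line, "\n"], [:append]), path}
  end
end

path = Path.join(System.tmp_dir!(), "loom_sketch_#{System.unique_integer([:positive])}.jsonl")
{:ok, writer} = JsonlWriterSketch.start_link(path)

# Fifty concurrent appenders; the GenServer mailbox orders their writes.
1..50
|> Task.async_stream(fn i -> JsonlWriterSketch.append(writer, ~s({"turn":#{i}})) end)
|> Stream.run()

lines = path |> File.read!() |> String.split("\n", trim: true)
```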

**CI / packaging:**

- GitHub Actions checkout was updated for the Node 24 runner environment.
  Evidence: PR #81.
- The cleanup status ledger records the post-v1.2 hardening pass and the CI
  gates that made it durable. Evidence: PR #80.

## 1.2.0

Post-v1 feature completion pass. The two feature-roadmap items left after
the `1.1.0` hardening release are now shipped and closed with proof.

**New:**

- Added a Familiar eval harness for prompt/runtime regression work:
  multi-scenario and multi-seed runs, fixture workspaces, persisted JSONL
  transcripts, JSON reports, rubric criteria, optional judge scoring, and
  `mix cantrip.eval` CI thresholds. Evidence: `test/familiar_eval_test.exs`,
  `test/mix_cantrip_eval_test.exs`, `docs/eval-harness.md`, PR #38.
- Added distributed Familiar support: root and child cantrips can target
  named BEAM nodes through `:node`, remote casts preserve their node handle,
  remote child observations are grafted into the parent loom, and
  `Cantrip.Cluster` provides Mnesia extra-node/table-copy helpers for
  replicated loom storage. Evidence: `test/distributed_cantrip_test.exs`,
  `test/cluster_test.exs`, `docs/distributed-familiar.md`, PR #39.

**Fixes before tag:**

- Remote distributed calls now use bounded `:rpc.call/5` timeouts instead of
  the distributed Erlang default of `:infinity`; unknown string node names fail
  closed instead of silently falling back to local execution.
- `Cantrip.Cluster.connect_mnesia/2` now preserves Mnesia schema timeout
  details so operators can see which table failed to synchronize.
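
Both fixes can be sketched with the real `:rpc.call/5`, which takes an explicit timeout. The sketch matches string node names against nodes the VM already knows. `RemoteCallSketch` and the 5-second timeout are assumptions, not Cantrip's values.

```elixir
defmodule RemoteCallSketch do
  # Hypothetical bounded timeout; :rpc.call/4 would wait :infinity.
  @timeout 5_000

  # Resolves a string node name against nodes this VM already knows about,
  # so unknown names fail closed and no new atom is ever created.
  def call(node_name, mod, fun, args, known_nodes \\ [node() | Node.list()]) do
    case Enum.find(known_nodes, &(Atom.to_string(&1) == node_name)) do
      nil -> {:error, {:unknown_node, node_name}}
      target -> :rpc.call(target, mod, fun, args, @timeout)
    end
  end
end
```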

## 1.1.0

Post-v1 hardening and cleanup pass. All cleanup issues from the v1 backlog
are closed with proof, including issues filed during the cleanup pass
(#32, #34, #35, #36, #37). See `docs/cleanup-status.md` for the full ledger.

**Behavior change** worth flagging for downstream callers:

- `compile_and_load` now requires an explicit `allow_compile_modules`
  allowlist; previously an empty allowlist was permissive. Deprecated
  `allow_compile_namespaces` wards fail loudly instead of being silently
  ignored. `Elixir.Cantrip.*` module names are rejected from hot-load
  allowlists (except the explicit `Elixir.Cantrip.Hot.*` namespace).
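
Here is a sketch of the allowlist rules above, assuming module names arrive as atoms. `HotLoadPolicySketch` and the example module names are hypothetical. Only the `Elixir.Cantrip.*` / `Elixir.Cantrip.Hot.*` split comes from the entry.

```elixir
defmodule HotLoadPolicySketch do
  # Hypothetical policy check: an empty allowlist permits nothing, and
  # Cantrip's own namespace is off-limits except Elixir.Cantrip.Hot.*.
  def allowed?(module, allowlist) when is_atom(module) and is_list(allowlist) do
    name = Atom.to_string(module)
    module in allowlist and not reserved?(name)
  end

  defp reserved?("Elixir.Cantrip.Hot." <> _), do: false
  defp reserved?("Elixir.Cantrip." <> _), do: true
  defp reserved?("Elixir.Cantrip"), do: true
  defp reserved?(_name), do: false
end
```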

**Fixes:**

- `EntityServer` no longer runs entity episodes inside the GenServer
  mailbox. Episodes execute in a supervised per-entity runner task and
  reply via `GenServer.reply/2`. Concurrent `send/2` while an episode is
  running returns busy immediately. Code-medium port ownership survives
  across persistent sends. Crash-restore preserves stream context.
- Malformed JSON in provider tool-call arguments now produces a structured
  `is_error: true` observation rather than silently substituting `args: %{}`
  and proceeding to (potentially) the wrong gate execution. Decode failure
  carries `args_raw` + `args_decode_error` from adapter through the executor.
- Mnesia `ensure_schema/0` now propagates non-`already_exists` errors as
  root-cause `init/1` failures; previously the catch-all `:ok` clause
  hid filesystem and permission errors.
- Unknown medium types now fail validation with an explicit error and a
  list of valid options rather than silently normalizing to `:conversation`.
- All `String.to_atom/1` paths from external strings are now bounded:
  parent-context normalization uses a bounded allowlist; code-medium gate
  bindings use `String.to_existing_atom/1`; loom JSONL restoration uses
  existing atoms; Familiar table/node atoms use SHA-256 fingerprints.
- All three filesystem gates (`read_file`, `list_dir`, `search`) now route
  through shared path validation consistently: missing root fails closed,
  path traversal fails closed.
- Code-medium bare gate-call rewriting now parses with
  `Code.string_to_quoted/1` and rewrites local gate-call AST nodes rather
  than doing text-level rewrites. Strings, remote calls, already-dotted
  calls, and definition heads are no longer subject to surprising rewrites.
- Safe boundary formatting wraps provider errors, JSONL persistence fallbacks,
  port code-medium error surfaces, gate observations, ACP wire
  stringification, and CLI output. Credential-shaped substrings are redacted
  before crossing entity, disk, or protocol boundaries.
- `req_llm` 1.12 preserves multiple system messages through both Anthropic
  and Gemini encoders; previously the v1.9 path could drop secondary
  system messages.
- The Familiar workspace cookie now fails loudly on an invalid existing cookie
  instead of silently regenerating it, so a restart with a malformed cookie no
  longer risks breaking existing distributed connections.
- The live real-LLM echo/done integration prompt now gives a stricter
  two-step tool contract and descriptions so current Anthropic models
  terminate with `done` instead of looping on `echo`.
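
The atom-safety item combines two stdlib techniques. This sketch uses hypothetical names and a made-up `familiar_loom_` prefix. It shows `String.to_existing_atom/1` for gate names and a SHA-256 fingerprint (via `:crypto.hash/2`) for table names, and it is illustrative rather than the Familiar's code.

```elixir
defmodule AtomSafetySketch do
  # Gate names arriving as strings map only onto atoms that already exist;
  # an unknown name returns an error instead of growing the atom table.
  def gate_binding(name) when is_binary(name) do
    {:ok, String.to_existing_atom(name)}
  rescue
    ArgumentError -> {:error, {:unknown_gate, name}}
  end

  # Table names derive from a SHA-256 fingerprint of the expanded workspace
  # root, so they are deterministic and fixed-length. Prefix is hypothetical.
  def table_name(workspace_root) do
    digest =
      :crypto.hash(:sha256, Path.expand(workspace_root))
      |> Base.encode16(case: :lower)

    String.to_atom("familiar_loom_" <> binary_part(digest, 0, 16))
  end
end
```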

**New:**

- Added a first-class `mix` gate for Familiars attached to Elixir workspaces.
  It runs allowlisted Mix tasks under the configured root with argv as data,
  bounded output, timeout handling, and structured observations. The Familiar
  default allows `compile` and `format`; `test` is opt-in with `run_tests: true`
  or an explicit `allow_mix_tasks` override.
- Documented the Dune-variant divergence of `Cantrip.Familiar.new/1` in
  `docs/port-isolated-runtime.md`. `sandbox: :dune` is now explicitly a
  smaller-surface, in-process variant of the code medium with different
  bindings, so entity prompts need to match the variant in use.
- `test/readme_examples_test.exs` pins the README/public-api quickstart
  shapes; future drift between documented examples and the runtime
  constructor signature fails CI.
- `docs/observability.md` is the canonical telemetry event registry
  (subscription patterns, alert recommendations, trace correlation model);
  implementation of the 9-item event checklist is tracked in #11.
- `docs/cleanup-status.md` is the living tracker for the cleanup pass.
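
Here is a minimal sketch of an allowlisted Mix runner. It assumes argv is passed as a list to `System.cmd/3`, a `Task` enforces the timeout, and output is truncated to a fixed bound. The `:allow`, `:runner`, `:root`, and `:timeout` options and the 4,000-character limit are hypothetical. The shipped gate's option is `allow_mix_tasks`, and its limits may differ.

```elixir
defmodule MixGateSketch do
  # Hypothetical defaults echoing the changelog: compile and format only.
  @default_allow ~w(compile format)
  @max_output 4_000

  # Runs `mix <task> <args>` under `root` with argv passed as a list (never
  # interpolated into a shell string), a timeout, and truncated output.
  # `:runner` defaults to System.cmd/3 and is injectable for tests.
  def run(task, args, opts \\ []) when is_binary(task) and is_list(args) do
    allow = Keyword.get(opts, :allow, @default_allow)

    if task in allow do
      runner = Keyword.get(opts, :runner, &System.cmd/3)
      root = Keyword.get(opts, :root, File.cwd!())
      timeout = Keyword.get(opts, :timeout, 60_000)
      job = Task.async(fn -> runner.("mix", [task | args], cd: root, stderr_to_stdout: true) end)

      # Task.shutdown/2 unlinks and kills the task if it has not replied.
      case Task.yield(job, timeout) || Task.shutdown(job, :brutal_kill) do
        {:ok, {output, status}} ->
          {:ok, %{status: status, output: String.slice(output, 0, @max_output)}}

        nil ->
          {:error, :timeout}

        {:exit, reason} ->
          {:error, {:crashed, reason}}
      end
    else
      {:error, {:task_not_allowed, task}}
    end
  end
end
```

Because argv is a list, a value such as `"--foo; rm -rf ."` reaches Mix as a single argument rather than being interpreted by a shell. In this sketch, opting into tests would mean passing `allow: ~w(compile format test)`.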

## 1.0.0

The first stable release. The Elixir implementation is the canonical
package surface; the runtime is documented and live-verified across
the Anthropic model tier (haiku, sonnet, opus).

Bug fixes surfaced during pre-tag live verification against the real
Anthropic API. All four passed a green `mix verify` and surfaced only under
live driving. This release also adds a v1 audit document and a
live-integration test module.

- Fixed: streaming responses dropped every tool call. The adapter consumed
  the chunk stream via `tokens/1` + `Enum.reduce` for the realtime text
  delta, then called `tool_calls/1` on the now-depleted stream and got
  nothing. Switched to `ReqLLM.StreamResponse.process_stream/2`, the
  documented public API for streaming tool-using agents.
- Fixed: persistent entities (`Cantrip.summon` + `Cantrip.send`) lost
  every assistant turn across sends. The terminating branch of entity turn
  execution never folded the final assistant message into `state.messages`.
  The next send appended a user message to a history that still ended at the
  prior user message; the model saw a stack of users with no record of its
  own answers and anchored on the first prompt.
- Fixed: folding only preserved one leading `:system` message even though
  initial message construction can emit two (identity + capability text).
  On fold, the capability text dropped into the foldable body — over long
  sessions the entity would silently lose its medium physics instructions.
- Upgraded `req_llm` from `~> 1.9` to `~> 1.12`. v1.12's
  `agentjido/req_llm@9d790fd` removes the offending `intersperse` between
  Anthropic system content blocks. With the upstream encoder fixed, the
  local workaround introduced in c994878 was deleted.
- Added `test/live_anthropic_test.exs` covering code-medium sync,
  code-medium streaming, and conversation-medium tool-calling. Gated on
  `RUN_REAL_LLM_TESTS=1` via existing `Cantrip.Test.RealLLMEnv`.
- Added `docs/v1-audit.md` recording verified paths, uncertain paths,
  and bugs found and fixed during the pre-tag audit.
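
The folding fix amounts to splitting off every leading `:system` message before choosing what to fold. This sketch assumes messages are maps with `:role` and `:content`. It also assumes the summary is reinserted as a `:user` message, which is a hypothetical choice, and `summarize` is caller-supplied.

```elixir
defmodule FoldSketch do
  # Keeps every leading :system message (identity and capability text),
  # summarizes older body messages, and keeps the most recent ones verbatim.
  def fold(messages, keep_recent, summarize) do
    {system, body} = Enum.split_while(messages, &(&1.role == :system))
    {older, recent} = Enum.split(body, max(length(body) - keep_recent, 0))

    summary =
      case older do
        [] -> []
        _ -> [%{role: :user, content: summarize.(older)}]
      end

    system ++ summary ++ recent
  end
end
```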

## 1.0.0-rc.1

- Made the Elixir implementation the only canonical package surface.
- Removed the old spec/conformance scaffold and replaced unique coverage with
  native ExUnit tests.
- Removed the compiled examples module and example Mix task; the notebook and
  tests are the teaching surface.
- Removed hand-written OpenAI-compatible, Anthropic, and Gemini adapters.
  Provider configuration now routes through ReqLLM via `Cantrip.LLM.from_env/1`.
- Removed DETS and Auto loom storage. Supported storage is memory, JSONL, and
  Mnesia.
- Removed `call_entity` and `call_entity_batch` gates. Composition now uses
  `Cantrip.new/1`, `Cantrip.cast/3`, and `Cantrip.cast_batch/2`.
- Removed the bare `read` gate. Use `read_file`, which validates paths against
  the configured root.
- Reduced Mix task surface to `mix cantrip.cast` and `mix cantrip.familiar`.
- Made Familiar ACP the default ACP runtime.
- Made Familiar hot-loading opt-in with `evolve: true`.
- Replaced process/cutover docs with package docs: README, CONTRIBUTING,
  DEPLOYMENT, architecture, signer-key runbook, and changelog.
- Added public API and v1 migration guides to the packaged ExDoc extras.
- Added the safe port code medium. `sandbox: :port` evaluates LLM-written
  Elixir through Dune in a child BEAM process while gates, child cantrip API
  calls, stdio, loom grafting, telemetry, provider access, and hot-load policy
  stay in the parent.
- Added `port_runner` for launching that child through a deployment-provided
  OS/container sandbox.
- Made the Familiar default to the safe port code medium. Raw child-BEAM
  evaluation remains available as `sandbox: :port_unrestricted`; the old
  host-BEAM evaluator remains available as `sandbox: :unrestricted` for
  trusted local development.
- Added `docs/port-isolated-runtime.md` to document the implemented isolation
  boundary and remaining deployment responsibilities.
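
Root-relative path validation, as used by `read_file`, can be sketched with `Path.expand/2`. `RootPathSketch` is hypothetical. `Path.expand/2` normalizes `..` lexically without resolving symlinks, so this is a starting point rather than a complete traversal defense.

```elixir
defmodule RootPathSketch do
  # Resolves `requested` against `root` and rejects anything that escapes it.
  # A missing root fails closed. Path.expand/2 does NOT resolve symlinks.
  def validate(root, requested) do
    root = Path.expand(root)

    if File.dir?(root) do
      full = Path.expand(requested, root)

      if full == root or String.starts_with?(full, root <> "/") do
        {:ok, full}
      else
        {:error, :outside_root}
      end
    else
      {:error, :missing_root}
    end
  end
end
```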