# Changelog
## Unreleased
Nothing yet.
## 1.3.3 - 2026-05-29
Calibration release for the v1.3.2 Elixir cutover.
**New:**
- Added a multi-audience README path map covering the operator-local
Familiar, ACP editor mounting, Phoenix embeds, eval/research work,
persistent characters, hosted service shapes, and multi-agent coordination.
Evidence: PR #125.
- Added `docs/acp-editor.md`, a worked guide for mounting the Familiar as an
ACP agent in editors, including Zed configuration, standalone JSON-RPC
smoke testing, diagnostics, and honest read-only scope. Evidence: PR #125.
- Added `evals/familiar/v1.3.3.exs`, a curated starter suite for Familiar eval
work covering gate use, composition, synthesis quality, forbidden-pattern
checks, and loom recall. Evidence: PR #125.
- Added a real-LLM Mnesia rehydration smoke test for the production Familiar
path: summon against a workspace root, record a turn, stop the process,
summon fresh against the same root-derived Mnesia table, and assert the
entity sees prior turns through `loom.turns`. Evidence: PR #124, issue #120.
**Changed:**
- The Familiar now defaults to the host-BEAM unrestricted evaluator for its
operator-local audience, while `sandbox: :port` remains available for
child-BEAM isolation. Explicit `sandbox: nil` with a `port_runner` still
selects the port path. Evidence: PRs #121 and #123, issue #115.
- Bash medium capability text now distinguishes shell state from filesystem
side effects instead of overstating persistence. Evidence: PR #123,
issue #117.
- Code-medium inhabitant guidance now describes the exact top-level binding
contract for `defmodule`: gate functions, `loom`, `folded_summary`, and
prior-turn variables are top-level bindings that module bodies cannot see.
Evidence: PR #125, issue #116.
- `Cantrip.cast_batch` guidance now says children start concurrently, bounded
by `max_concurrent_children`, and results are returned in request order
instead of making an unconditional "parallel" claim. Evidence: PR #125,
issue #118.
- The Spellbook loom ritual now verifies JSONL persistence, production
Familiar Mnesia rehydration, and folding as prompt projection over an
append-only loom. Evidence: PRs #124 and #125, issues #119 and #120.
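
The `Cantrip.cast_batch` contract described above (concurrent start, a concurrency bound, results in request order) has the same shape as the standard library's `Task.async_stream/3`. The sketch below is only an analogy built on that standard-library call, not Cantrip's implementation; the request tuples and delays are made up for illustration.

```elixir
# Hypothetical stand-ins for child casts: earlier requests sleep longer,
# so completion order differs from request order.
requests = [{"first", 30}, {"second", 20}, {"third", 10}]

results =
  requests
  |> Task.async_stream(
    fn {name, delay_ms} ->
      Process.sleep(delay_ms)
      name
    end,
    # At most two tasks run at once (the analogue of a child bound).
    max_concurrency: 2,
    # Results are emitted in input order, not completion order.
    ordered: true
  )
  |> Enum.map(fn {:ok, name} -> name end)

IO.inspect(results)
```

With `ordered: true`, the list comes back in request order even though the third task finishes before the first.
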
**Verification:**
- The v1.3.2 inhabitant-affordance audit spawned fix issues #115-#120; all are
closed with code, docs, tests, or narrowed public contracts. The issues,
PRs, and changelog now carry the durable record.
- `mix verify`, `mix docs`, and PR CI passed on the final v1.3.3 batch.
- The only GitHub issues still open after the calibration queue are the
explicitly deferred future-work issues #108-#112.
## 1.3.2 - 2026-05-28
Package-coherence release for the Elixir cutover.
**New:**
- Added `docs/spellbook.md`, a vocabulary guide for cantrips, identities,
mediums, gates, wards, circles, looms, entities, and the Familiar. The
Spellbook is linked from the README, included in ExDoc, and shipped in the
Hex package. Evidence: PR #105, issue #103.
- Added inhabitant-voice opening paragraphs to the documented public modules
so the README, Spellbook, generated docs, and Familiar prompt describe the
same runtime concepts. Evidence: PR #105, issue #102.
- Conversation mediums now expose capability text that teaches the same
medium/gate/ward grammar used by code and Familiar flows, including the
conditional `done` ending. Evidence: PR #104, issue #96.
- The Familiar prompt now names the BEAM/codebase environment more directly:
`Code.fetch_docs/1`, `loom.turns`, workspace boundaries, and the Cantrip
bibliography are all part of the orientation. Evidence: PR #104, issue #97.
**Changed:**
- Removed stale migration/audit docs and dead compatibility code from the
pre-cutover era. The old material remains available through git history,
while the source tree now presents the Elixir package as canonical. Evidence:
PR #101, issues #98 and #99.
- Split long historical Zed trace replay behind
`RUN_REAL_LLM_TESTS=1 RUN_REAL_TRACE_REPLAY=1`. The ordinary real-LLM release
gate now covers stable live integration contracts; trace replay remains
available as an explicit stress/provenance check.
**Verification:**
- Fresh-install dogfood from the built Hex tar succeeded outside the repo:
package contents included `.env.example`, `README.md`, and
`docs/spellbook.md`; `mix deps.get`, `mix cantrip.cast "explain what a
cantrip is"`, and `mix cantrip.familiar "summarize the loom storage modules"`
all ran from the extracted package using local live LLM configuration.
- `RUN_REAL_LLM_TESTS=1` over the explicit stable live/real integration suite
passed: 20 tests, 0 failures, including a focused real-LLM JSONL loom
rehydration smoke. The trace replay suite is no longer part of that default
live gate.
- `mix verify`, `mix docs`, and `mix hex.build` pass with the package docs and
file list current.
## 1.3.1 - 2026-05-28
Patch release for runtime/safety findings surfaced immediately after the
`1.3.0` tag.
**Fixes:**
- Unknown code-medium sandbox ward values now fail closed with a structured
`code` error observation instead of falling through to host-BEAM
unrestricted eval. Regression coverage proves the submitted code does not
execute under an unsupported sandbox value. Evidence: issue #93.
- Observation arguments are now recursively redacted before they can be stored
on loom observations. Conversation tool-call args, malformed `args_raw`, and
port code-medium gate args are covered so secret-shaped values do not persist
through observation metadata while non-secret argument shape remains useful.
Evidence: issue #92.
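
As a rough illustration of the recursive redaction described above, the sketch below walks nested maps and lists and masks secret-shaped entries while leaving the argument shape intact. The module name, the key pattern, and the `sk-` value pattern are all assumptions made for this example; the package's actual rules may differ.

```elixir
defmodule RedactSketch do
  @mask "[REDACTED]"

  # Plain maps: mask values under secret-looking keys, recurse into the rest.
  def redact(map) when is_map(map) and not is_struct(map) do
    Map.new(map, fn {key, value} ->
      if secret_key?(key), do: {key, @mask}, else: {key, redact(value)}
    end)
  end

  # Lists: recurse into every element.
  def redact(list) when is_list(list), do: Enum.map(list, &redact/1)

  # Strings: mask values that look like credentials on their own.
  def redact(value) when is_binary(value) do
    if Regex.match?(secret_value_pattern(), value), do: @mask, else: value
  end

  # Everything else (numbers, atoms, structs) passes through unchanged.
  def redact(other), do: other

  defp secret_key?(key) when is_binary(key) or is_atom(key) do
    Regex.match?(secret_key_pattern(), to_string(key))
  end

  defp secret_key?(_key), do: false

  # Assumed patterns, chosen only for illustration.
  defp secret_key_pattern, do: ~r/(api[_-]?key|token|secret|password|authorization)/i
  defp secret_value_pattern, do: ~r/\Ask-[A-Za-z0-9_-]{8,}\z/
end
```

For example, `RedactSketch.redact(%{"path" => "a.txt", "api_key" => "abc"})` keeps `"path"` readable and masks only the `"api_key"` value.
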
## 1.3.0 - 2026-05-28
Post-v1.2 stabilization release. This drains the hardening work that landed
after `1.2.0` into a real source/package version, including the Bash sandbox
boundary change, runtime and persistence fixes, API surface cleanup, package
metadata fixes, and Familiar composition guidance.
**Breaking:**
- Bash-medium cantrips now require an OS sandbox and fail closed when neither
`bubblewrap` nor `sandbox-exec` is available. Declared gates are projected
into the shell as PATH commands and dispatch back through the parent BEAM;
raw shell remains the medium, but gate authority now comes from the circle
rather than ambient process access. The `done` gate is exposed as
`cantrip_done` because `done` is a shell keyword. Tests may opt into
`medium_opts: %{sandbox: :passthrough}`; production cannot.
- Bash sandbox verification now includes representative shell workloads
(`git`, `make`, `jq`, `/dev/null` redirects, and common
`find`/`sed`/`grep` pipelines). The workload suite is the support contract:
when a real shell workload should be supported, add it there so adapter
gaps fail in CI instead of surfacing in user sessions. Workload tests opt
into `%{bash_network: :on}` so GitHub-hosted Linux runners can exercise
bubblewrap shell behavior even when they cannot create bubblewrap's default
network-deny namespace; separate tests pin the default network-deny command
shape.
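
The fail-closed sandbox requirement above can be pictured as a lookup that returns an error instead of falling back to an unsandboxed shell. This is a minimal sketch, assuming bubblewrap is installed under its usual binary name `bwrap`; the module, function, and return shapes are hypothetical and not taken from the package.

```elixir
defmodule BashSandboxSketch do
  # `bwrap` is bubblewrap's binary name on Linux; `sandbox-exec` ships with macOS.
  @candidates ["bwrap", "sandbox-exec"]

  # Returns {:ok, path} for the first sandbox executable found.
  # When none is available it returns an error rather than running unsandboxed.
  # The finder is injectable so the lookup can be exercised without either tool.
  def detect(finder \\ &System.find_executable/1) do
    case Enum.find_value(@candidates, finder) do
      nil -> {:error, :no_os_sandbox}
      path -> {:ok, path}
    end
  end
end
```

Calling `BashSandboxSketch.detect()` on a host with neither tool yields `{:error, :no_os_sandbox}`, which a caller would surface instead of starting the shell.
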
**New:**
- Familiar prompt/runtime evaluation now has a composition metric,
`child_medium_used`. Turn metadata records `medium_type`, JSONL rehydration
preserves it, and the eval suite scores whether a Familiar child turn used
the expected medium for synthesis-shaped tasks. This is rubric coverage;
behavioral validation still requires real-LLM runs. Evidence: PR #90,
issue #83.
- Default Familiar guidance now explicitly teaches answer-shape selection:
gather and compose in code, then delegate speech-shaped synthesis,
explanation, review, naming, judgment, decision, or voice to a
conversation child. Explicit user requests for a child, medium, or batch
shape are treated as directives unless impossible. Evidence: PR #90,
issue #83.
**Fixes:**
- Bash sandbox support now has representative shell workload coverage for
`git`, `make`, `jq`, `/dev/null`, and common `find`/`sed`/`grep` pipelines,
including the GitHub Actions runner network-namespace constraint. Evidence:
PR #84, issue #82.
- The Hex package now includes `.env.example`, matching the README quick
start. Package metadata tests assert README `cp` sources exist and ship in
the Hex file list. Evidence: PR #88, issue #85.
- The documented public API surface now matches generated docs: internal
modules are hidden, `docs/public-api.md` names the supported surface, nested
modules are checked from application metadata, and ExDoc warnings are errors.
Evidence: PR #89, issue #87.
- Provider and gate boundaries are typed more explicitly: LLM provider
responses flow through `%Cantrip.LLM.Response{}`, gate arguments are
normalized through per-gate DTOs, ACP `_meta` overrides are constrained, and
provider option/usage forwarding has regression coverage. Evidence: PRs
#57, #66, #76, and #77.
- Durable loom and JSONL behavior is stricter: append semantics align between
in-memory and durable paths, JSONL writes are serialized, persisted
code-state bindings are compacted, event upcasting is versioned, and
truncation/medium metadata rehydrate as atom keys. Evidence: PRs #66, #70,
#71, #74, and #90.
- Streaming and observability paths preserve context while staying bounded:
streaming emits real text deltas, ACP trace context is propagated, intent
telemetry is redacted, streaming delivery has backpressure, bridge delivery
uses bounded barriers, and early stream halt shuts down runner tasks.
Evidence: PRs #50, #58, and #75.
- Child composition is more disciplined: pre-built child casts compose parent
wards, declaration-time child-spawn wards are enforced, and the default
Familiar can read files through its normal observation gates. Evidence: PRs
#72, #73, and #78.
**CI / packaging:**
- GitHub Actions checkout was updated for the Node 24 runner environment.
Evidence: PR #81.
- The cleanup status ledger records the post-v1.2 hardening pass and the CI
gates that made it durable. Evidence: PR #80.
## 1.2.0
Post-v1 feature completion pass. The two feature-roadmap items left after
the `1.1.0` hardening release are now shipped and closed with proof.
**New:**
- Added a Familiar eval harness for prompt/runtime regression work:
multi-scenario and multi-seed runs, fixture workspaces, persisted JSONL
transcripts, JSON reports, rubric criteria, optional judge scoring, and
`mix cantrip.eval` CI thresholds. Evidence: `test/familiar_eval_test.exs`,
`test/mix_cantrip_eval_test.exs`, `docs/eval-harness.md`, PR #38.
- Added distributed Familiar support: root and child cantrips can target
named BEAM nodes through `:node`, remote casts preserve their node handle,
remote child observations are grafted into the parent loom, and
`Cantrip.Cluster` provides Mnesia extra-node/table-copy helpers for
replicated loom storage. Evidence: `test/distributed_cantrip_test.exs`,
`test/cluster_test.exs`, `docs/distributed-familiar.md`, PR #39.
**Fixes before tag:**
- Remote distributed calls now use bounded `:rpc.call/5` timeouts instead of
the distributed Erlang default of `:infinity`; unknown string node names fail
closed instead of silently falling back to local execution.
- `Cantrip.Cluster.connect_mnesia/2` now preserves Mnesia schema timeout
details so operators can see which table failed to synchronize.
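
The two remote-call fixes above combine into a simple pattern: resolve a string node name only against nodes the VM already knows, then call with an explicit timeout. The sketch below uses the standard `:rpc.call/5`; the module name, the 5-second timeout, and the error tuples are assumptions for illustration, not the package's code.

```elixir
defmodule RemoteCallSketch do
  # Assumed bound; :rpc.call/4 would wait forever by default.
  @timeout_ms 5_000

  # Match the string against known nodes instead of creating a new atom.
  # An unknown name is an error, never a silent fallback to local execution.
  def resolve_node(name) when is_binary(name) do
    case Enum.find([node() | Node.list()], &(Atom.to_string(&1) == name)) do
      nil -> {:error, {:unknown_node, name}}
      known -> {:ok, known}
    end
  end

  # Run mod.fun(args) on the named node with a bounded timeout.
  def call(name, mod, fun, args) do
    with {:ok, target} <- resolve_node(name) do
      case :rpc.call(target, mod, fun, args, @timeout_ms) do
        {:badrpc, reason} -> {:error, {:badrpc, reason}}
        result -> {:ok, result}
      end
    end
  end
end
```

A timed-out call surfaces as `{:error, {:badrpc, :timeout}}`, and a name that matches no connected node never reaches `:rpc.call/5` at all.
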
## 1.1.0
Post-v1 hardening and cleanup pass. All cleanup issues from the v1 backlog
are closed with proof, including issues filed during the cleanup pass
(#32, #34, #35, #36, #37). See the cleanup-status tracker for the full ledger.
**Behavior change** worth flagging for downstream callers:
- `compile_and_load` now requires an explicit `allow_compile_modules`
allowlist; previously an empty allowlist was permissive. Deprecated
`allow_compile_namespaces` wards fail loudly instead of being silently
ignored. `Elixir.Cantrip.*` module names are rejected from hot-load
allowlists (except the explicit `Elixir.Cantrip.Hot.*` namespace).
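
The allowlist rules above can be read as two checks: an allowlist may not name Cantrip's own modules outside the `Elixir.Cantrip.Hot.*` namespace, and a module loads only if it is explicitly listed. The sketch below is a hypothetical rendering of those rules; it assumes allowlist entries are exact module-name strings, which the changelog does not state.

```elixir
defmodule HotLoadAllowlistSketch do
  # Reject allowlists that name Cantrip modules outside the Hot namespace.
  def validate_allowlist(names) when is_list(names) do
    case Enum.reject(names, &allowlistable?/1) do
      [] -> :ok
      rejected -> {:error, {:reserved_module_names, rejected}}
    end
  end

  # An empty allowlist permits nothing; membership must be explicit.
  def permitted?(module_name, allowlist), do: module_name in allowlist

  # Clause order matters: the Hot namespace is checked before the general prefix.
  defp allowlistable?("Elixir.Cantrip.Hot." <> _rest), do: true
  defp allowlistable?("Elixir.Cantrip." <> _rest), do: false
  defp allowlistable?(name) when is_binary(name), do: true
end
```

Under these assumptions, `permitted?("Elixir.MyApp.Plugin", [])` is `false`, matching the move away from the earlier permissive-when-empty behavior.
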
**Fixes:**
- `EntityServer` no longer runs entity episodes inside the GenServer
mailbox. Episodes execute in a supervised per-entity runner task and
reply via `GenServer.reply/2`. Concurrent `send/2` while an episode is
running returns busy immediately. Code-medium port ownership survives
across persistent sends. Crash-restore preserves stream context.
- Malformed JSON in provider tool-call arguments now produces a structured
`is_error: true` observation rather than silently substituting `args: %{}`
and potentially executing the wrong gate call. A decode failure carries
`args_raw` and `args_decode_error` from the adapter through the executor.
- Mnesia `ensure_schema/0` now propagates non-`already_exists` errors as
root-cause `init/1` failures; previously the catch-all `:ok` clause
hid filesystem and permission errors.
- Unknown medium types now fail validation with an explicit error and a
list of valid options rather than silently normalizing to `:conversation`.
- All `String.to_atom/1` paths from external strings are now bounded:
parent-context normalization uses a bounded allowlist; code-medium gate
bindings use `String.to_existing_atom/1`; loom JSONL restoration uses
existing atoms; Familiar table/node atoms use SHA-256 fingerprints.
- All three filesystem gates (`read_file`, `list_dir`, `search`) now route
through shared path validation consistently: missing root fails closed,
path traversal fails closed.
- Code-medium bare gate-call rewriting now parses with
`Code.string_to_quoted/1` and rewrites local gate-call AST nodes rather
than doing text-level rewrites. Strings, remote calls, already-dotted
calls, and definition heads are no longer subject to surprising rewrites.
- Safe boundary formatting wraps provider errors, JSONL persistence fallbacks,
port code-medium error surfaces, gate observations, ACP wire
stringification, and CLI output. Credential-shaped substrings are redacted
before crossing entity, disk, or protocol boundaries.
- `req_llm` 1.12 preserves multiple system messages through both Anthropic
and Gemini encoders; previously the v1.9 path could drop secondary
system messages.
- Familiar workspace cookie now fails loudly on invalid existing cookies
rather than silently regenerating; existing distributed connections are
no longer at risk of being broken on a malformed-cookie restart.
- The live real-LLM echo/done integration prompt now gives a stricter
two-step tool contract and descriptions so current Anthropic models
terminate with `done` instead of looping on `echo`.
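
To show why the AST-based gate-call rewriting noted in the fixes above is safer than text substitution, the sketch below parses source with `Code.string_to_quoted/1` and rewrites only local call nodes. The gate names and the target form (`gate.(args)`, a call on a bound function) are assumptions for illustration; the sketch also does not special-case definition heads, which the real rewriter is described as skipping.

```elixir
defmodule GateRewriteSketch do
  # Hypothetical gate names; a real set would come from the declared gates.
  @gates [:read_file, :done]

  # Parse, rewrite local gate calls, and render back to source text.
  def rewrite(source) when is_binary(source) do
    with {:ok, ast} <- Code.string_to_quoted(source) do
      {:ok, ast |> Macro.prewalk(&rewrite_node/1) |> Macro.to_string()}
    end
  end

  # A local call `gate(args)` parses to {name, meta, args} with a list of args.
  # It becomes a call on the variable `gate`, that is `gate.(args)`.
  defp rewrite_node({name, meta, args}) when name in @gates and is_list(args) do
    {{:., meta, [{name, meta, nil}]}, meta, args}
  end

  # String literals, remote calls, and already-dotted calls have different
  # AST shapes, so they fall through untouched.
  defp rewrite_node(other), do: other
end
```

A string literal such as `"read_file(1)"` is a plain binary in the AST, so it survives unchanged, whereas a text-level rewrite would have altered it.
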
**New:**
- Added a first-class `mix` gate for Familiars attached to Elixir workspaces.
It runs allowlisted Mix tasks under the configured root with argv as data,
bounded output, timeout handling, and structured observations. The Familiar
default allows `compile` and `format`; `test` is opt-in with `run_tests: true`
or an explicit `allow_mix_tasks` override.
- Documented the Dune-variant divergence for `Cantrip.Familiar.new/1` in
`docs/port-isolated-runtime.md`. `sandbox: :dune` is now explicitly a
smaller-surface in-process variant of the code medium with different
bindings, so entity prompts need to match the variant in use.
- `test/readme_examples_test.exs` pins the README/public-api quickstart
shapes; future drift between documented examples and the runtime
constructor signature fails CI.
- `docs/observability.md` is the canonical telemetry event registry
(subscription patterns, alert recommendations, trace correlation model);
implementation of the 9-item event checklist is tracked on #11.
- `docs/cleanup-status.md` is the living tracker for the cleanup pass.
## 1.0.0
The first stable release. The Elixir implementation is the canonical
package surface; the runtime is documented and live-verified across
the Anthropic model tier (haiku, sonnet, opus).
Pre-tag live verification against real Anthropic models surfaced four bugs.
All four shipped past a green `mix verify`; all four needed live driving to
surface. This release also adds a v1 audit document and a live-integration
test module.
- Fixed: streaming responses dropped every tool call. The adapter consumed
the chunk stream via `tokens/1` + `Enum.reduce` for the realtime text
delta, then called `tool_calls/1` on the now-depleted stream and got
nothing. Switched to `ReqLLM.StreamResponse.process_stream/2`, the
documented public API for streaming tool-using agents.
- Fixed: persistent entities (`Cantrip.summon` + `Cantrip.send`) lost
every assistant turn across sends. The terminating branch of entity turn
execution never folded the final assistant message into `state.messages`.
The next send appended a user message to a history that still ended at the
prior user message; the model saw a stack of users with no record of its
own answers and anchored on the first prompt.
- Fixed: folding only preserved one leading `:system` message even though
initial message construction can emit two (identity + capability text).
On fold, the capability text dropped into the foldable body, so over long
sessions the entity would silently lose its medium physics instructions.
- Upgraded `req_llm` from `~> 1.9` to `~> 1.12`. v1.12's
`agentjido/req_llm@9d790fd` removes the offending `intersperse` between
Anthropic system content blocks. With the upstream encoder fixed, the
local workaround introduced in c994878 was deleted.
- Added `test/live_anthropic_test.exs` covering code-medium sync,
code-medium streaming, and conversation-medium tool-calling. Gated on
`RUN_REAL_LLM_TESTS=1` via existing `Cantrip.Test.RealLLMEnv`.
- Added `docs/v1-audit.md` recording verified paths, uncertain paths,
and bugs found and fixed during the pre-tag audit.
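
The folding fix above comes down to keeping every leading `:system` message rather than only the first. A minimal sketch of that idea follows, assuming messages are maps with `:role` and `:content` keys and that folded history is replaced by a single placeholder message; both are illustrative assumptions, not the package's data model.

```elixir
defmodule FoldSketch do
  # Keep all leading :system messages, fold the middle, keep the recent tail.
  def fold(messages, keep_recent) when is_list(messages) and keep_recent >= 0 do
    # split_while keeps the whole leading run, however many system messages there are.
    {system, body} = Enum.split_while(messages, &(&1.role == :system))
    {foldable, recent} = Enum.split(body, max(length(body) - keep_recent, 0))

    system ++ summarize(foldable) ++ recent
  end

  defp summarize([]), do: []

  # Hypothetical placeholder standing in for a real folded summary.
  defp summarize(foldable) do
    [%{role: :user, content: "[folded #{length(foldable)} earlier messages]"}]
  end
end
```

With an identity message and a capability message at the head, both stay in place after folding, so the medium instructions cannot drift into the foldable body.
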
## 1.0.0-rc.1
- Made the Elixir implementation the only canonical package surface.
- Removed the old spec/conformance scaffold and replaced unique coverage with
native ExUnit tests.
- Removed the compiled examples module and example Mix task; the notebook and
tests are the teaching surface.
- Removed hand-written OpenAI-compatible, Anthropic, and Gemini adapters.
Provider configuration now routes through ReqLLM via `Cantrip.LLM.from_env/1`.
- Removed DETS and Auto loom storage. Supported storage is memory, JSONL, and
Mnesia.
- Removed `call_entity` and `call_entity_batch` gates. Composition now uses
`Cantrip.new/1`, `Cantrip.cast/3`, and `Cantrip.cast_batch/2`.
- Removed the bare `read` gate. Use `read_file`, which validates paths against
the configured root.
- Reduced Mix task surface to `mix cantrip.cast` and `mix cantrip.familiar`.
- Made Familiar ACP the default ACP runtime.
- Made Familiar hot-loading opt-in with `evolve: true`.
- Replaced process/cutover docs with package docs: README, CONTRIBUTING,
DEPLOYMENT, architecture, signer-key runbook, and changelog.
- Added public API and v1 migration guides to the packaged ExDoc extras.
- Added the safe port code medium. `sandbox: :port` evaluates LLM-written
Elixir through Dune in a child BEAM process while gates, child cantrip API
calls, stdio, loom grafting, telemetry, provider access, and hot-load policy
stay in the parent.
- Added `port_runner` for launching that child through a deployment-provided
OS/container sandbox.
- Made the Familiar default to the safe port code medium. Raw child-BEAM
evaluation remains available as `sandbox: :port_unrestricted`; the old
host-BEAM evaluator remains available as `sandbox: :unrestricted` for
trusted local development.
- Added `docs/port-isolated-runtime.md` to document the implemented isolation
boundary and remaining deployment responsibilities.