Skip to main content

CHANGELOG.md

# Changelog

All notable changes to AtpClient are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to
[Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.4.0] - 2026-06-29

### Changed

- **`AtpClient.ResultNormalization.per_lemma_results/2` becomes `/3`** and
  now takes `(payload, lemma_specs, opts)` instead of `(payload, opts)`. The
  caller supplies one `%{name: String.t(), range: Range.t()}` spec per
  lemma (`AtpClient.Isabelle.lemma_specs/1` builds these from the theory
  body) and the classifier buckets messages by body-line range rather than
  re-deriving structure from `pos.line` alone. Returned `lemma_result()`
  maps lose the `:line` field — line numbers referred to the generated
  theory file, not the caller's source, and were ambiguous when a bucket
  contained multiple messages.
- **Per-lemma classifier no longer scans concatenated text across
  messages.** Each message is classified individually before the bucket is
  reconciled. This fixes two regressions visible with multi-lemma Isabelle
  jobs: (a) `Nitpick found a model` + `Nitpick found no counterexample`
  surfaced as `:csat` because the `"Nitpick found a" + "counterexample"`
  substring test fired across the message join, and (b) `by <tactic>`
  failing on a False goal was reported as `:thm` because Isabelle echoes
  `theorem name: <goal>` at the `by` position alongside the
  `Failed to finish proof` error. Verdict precedence is documented on
  `per_lemma_results/3`.
- **`per_lemma_results/3` accepts `:file`** to drop messages whose
  `pos.file` does not end with a given suffix. `prove_lemmas/4` uses this
  to filter out noise from the bundled `TPTP.thy` and other transitively
  imported theories that previously produced phantom lemma rows.
- **Lemma names always come through** on the per-lemma path — the name
  comes from the body (via `lemma_specs/1`), not from parsing `theorem
  name:` out of Isabelle messages. Sledgehammer / Nitpick verdicts no
  longer surface as `name: nil`.

### Added

- **`AtpClient.Isabelle.lemma_specs/1`** — extracts `[%{name, range}]` from
  a theory body. Used internally by `prove_lemmas/4` and `prove_tptp/3`;
  exposed for callers that want to pre-compute specs.

### Migration notes

- If you call `ResultNormalization.per_lemma_results(payload)` or
  `per_lemma_results(payload, line_offset: n)`, switch to
  `per_lemma_results(payload, specs, opts)` where `specs` come from
  `AtpClient.Isabelle.lemma_specs(body)`. The high-level
  `prove_lemmas/4` / `prove_tptp/3` / `query_tptp/2` entry points already
  do this for you.
- Drop reads of the `:line` field on `lemma_result` entries. Attribute by
  `:name` instead — line numbers were a generated-theory leak and have
  been removed from the returned shape.

## [0.3.0] - 2026-06-26

### Added

- **`AtpClient.LocalExec` backend.** Invokes a locally installed,
  TPTP-compliant prover binary (E, Vampire, …) via `System.cmd/3` and
  normalizes its stdout through the existing SZS classifier. Two-layered
  timeout: the prover-side CPU limit (passed via `:args`) lets provers
  emit a clean `SZS status Timeout`, and an independent BEAM-side
  wall-clock timeout (`:wall_timeout_ms`) kills wedged processes. Both
  paths fold into the same `{:ok, :timeout}` result so callers do not have
  to branch on the failure mode. Binary resolution goes through
  `System.find_executable/1`; missing binaries surface
  `{:error, {:prover_not_found, name}}` rather than raising.
- **`AtpClient.Backend` behaviour** so a UI (Smart Cell, Livebook, …) can
  enumerate and drive backends without hard-coding per-backend knowledge.
  Every backend (`SystemOnTptp`, `StarExec`, `Isabelle`, `LocalExec`) now
  implements `config_key/0`, `label/0`, `config_schema/0`, `verify/1`, and
  `query/2`. The new `query/2` is the cross-backend entry point: it takes
  a TPTP-format problem string plus a keyword list and returns a single
  `atp_result()`, hiding session login/logout, prover selection, theory
  bookkeeping, etc. Per-backend low-level entry points
  (`query_system/3`, `prove_theory/4`, `create_job/3`, `query_tptp/2`, …)
  remain available for callers that need sessions, multi-system fan-out,
  or per-lemma detail.
- **`AtpClient.Config.Field` struct** carrying the UI metadata each
  backend exposes via `config_schema/0`: logical `:type` (`:string |
  :integer | :boolean | :string_list`), layout `:group` (`:connection |
  :defaults | :advanced`), `:required?`, `:secret?`, `:default`, `:label`,
  and `:doc`. The library performs no coercion; values flow into
  `Application.put_env/3` and the per-call opts as-is.
- **`AtpClient.backends/0`** returns the list of behaviour-implementing
  backend modules so a UI can discover them without hard-coding.
- **TPTP-shaped entry points on `AtpClient.Isabelle`.**
  `query_tptp/2` and `prove_tptp/3` accept a TPTP/THF problem string,
  route it through `IsabelleClient.TPTP.isabellize_theory/1`, append the
  configured `:proof_method` (default `"by auto"`) to each generated
  `lemma`, and submit the resulting theory to Isabelle with
  `imports "TPTP"` and `unbundle from_TPTP` active. The bundled `TPTP.thy`
  support theory is copied into `:local_dir` on first use so Isabelle's
  loader can resolve the import. Lemma names carry over from TPTP formula
  names. Pass `proof_method: "by metis"`, `"sledgehammer"`, etc. for
  stronger or probing tactics. Smart-cell consumers can now drive Isabelle
  from the same TPTP editor they use for the SystemOnTPTP / StarExec /
  LocalExec backends; theory-text entry points (`prove_theory/4`,
  `prove_lemmas/4`, `query/3`, `query_lemmas/3`) remain unchanged.
- **`AtpClient.Isabelle.SessionOwner`** — a private GenServer that owns
  the link to `IsabelleClient.Shared`. `open_session/1` now returns
  `{:error, reason}` on a failed connection without killing
  non-trapping callers, a later Shared crash surfaces through a monitor
  rather than an `:EXIT`, and dropping the caller monitors the owner to a
  clean shutdown so server-side Isabelle sessions are no longer orphaned.
  The `Session` struct gains an opaque `:owner` field; treat it as
  internal.
- **`scripts/build_eprover.sh`** — builds the E theorem prover from source
  and installs it to `priv/bin/eprover` for use as the `:local_exec`
  backend's binary.
- **`AtpClient.ResultNormalization.failure_t/0` gains
  `{:prover_not_found, String.t()}`** for the `LocalExec` binary-resolution
  failure mode.

### Changed

- **Library defaults are merged per-key inside `AtpClient.Config.get/2`**
  instead of being seeded via `mix.exs`'s `:env` block. The previous
  arrangement let any `config :atp_client, :<backend>, …` in user
  `config.exs` silently replace the whole keyword list (the OTP
  `Application.put_env/3` semantics), dropping defaults the user had not
  explicitly re-set — most visibly causing a fresh install with partial
  SystemOnTPTP config to fail with `{:unrecognized_output, ""}` because
  `:url` had vanished. Defaults now live in `AtpClient.Config`'s
  `@defaults` (exposed via `Config.defaults/0`) and are merged underneath
  Application env on every read, so a partial user config only overrides
  the keys it actually names. No call-site changes required.
- **`AtpClient.SystemOnTptp.list_provers/0` blocks on the first call**
  until the startup refresh completes or `:sotptp, :refresh_timeout_ms`
  (default 15 s) elapses; subsequent calls return the cached list
  immediately. On timeout it returns `[]` rather than raising. Previously
  it could return `[]` immediately after application start, which the
  caller had no way to distinguish from "the SystemOnTPTP deployment is
  empty."
- **`per_lemma_results/2` now applies `check_tool_signals` per source line.**
  Sledgehammer's `"found a proof"` and Nitpick's `"found a counterexample"`/
  `"found a model"` verdicts surface as `{:ok, :thm}` / `{:ok, :csat}` /
  `{:ok, :sat}` on the per-lemma path, matching `interpret_isabelle_result/1`.
  Previously only Isabelle's `"theorem name:"` completion notification was
  recognised, so probe-style proofs (`sledgehammer nitpick oops`) collapsed
  to `:gave_up`. `{:ok, :timeout}` and `{:ok, :out_of_resources}` are now
  surfaced per lemma too. Lemma names attach to results derived from
  `theorem name:` completions; sledgehammer/nitpick verdicts carry
  `name: nil` because the underlying messages do not name the lemma.
- **`:isabelle_elixir` is now pinned to its GitHub `main` branch**
  (previously `~> 0.3` from Hex). The new TPTP entry points rely on the
  `IsabelleClient.TPTP` module, which is not in the 0.3.0 Hex release.
  Re-pin to a Hex constraint once a release containing `IsabelleClient.TPTP`
  ships.

### Breaking

- **`AtpClient.Isabelle.query/3` no longer defaults its `opts`
  argument.** Pass `[]` explicitly if you have no options to set. The
  default was removed so the three-arity `query/3` (theory text + name)
  does not collide with the `Backend` behaviour's `query/2` (TPTP problem
  string + opts), both of which now live on the same module.
- **`AtpClient.Isabelle.Session` gained an enforced `:owner` field.**
  Anyone constructing `%Session{client: …, config: …}` outside the
  library must now also pass `:owner` (the pid of an
  `AtpClient.Isabelle.SessionOwner`). Treat `Session` as opaque and
  construct it only via `open_session/1`.

### Migration notes

- If you call `Isabelle.query(theory, name)`, change it to
  `Isabelle.query(theory, name, [])`.
- If you read the `:env` block of `AtpClient.MixProject` to discover
  library defaults, call `AtpClient.Config.defaults/0` instead. The
  `:env` block has been removed since the defaults are no longer
  load-bearing there.
- If you construct `%AtpClient.Isabelle.Session{}` directly, stop —
  use `open_session/1` so the `:owner` pid is set up correctly.

## [0.2.0]

### Changed

- **Bumped `:isabelle_elixir` to `~> 0.2`.** The upstream library has moved to a
  task-based wire model; `AtpClient.Isabelle` has been rewritten on top of the
  new `IsabelleClient` (the stateful struct client) instead of
  `IsabelleClientMini`'s polling API. End users of the high-level functions
  (`open_session/1`, `prove_theory/4`, `query/3`, `close_session/1`) see the
  same surface — only the internals and a few error / raw-result shapes have
  changed.
- `AtpClient.Isabelle.Session` now wraps an `%IsabelleClient{}` rather than
  carrying `:socket` and `:session_id` directly. Code that pattern-matched
  `%Session{socket: socket, session_id: id}` must instead use
  `%Session{client: %IsabelleClient{socket: socket, session_id: id}}`.
- `:raw` mode of `prove_theory/4` and `query/3` now returns the `use_theories`
  result map directly (with top-level `"ok"`, `"errors"`, `"nodes"` keys)
  instead of the previous keyword list of poll messages.
  `AtpClient.ResultNormalization.extract_isabelle_text/1` already accepted this
  map shape, so call sites that used it continue to work unchanged.
- `AtpClient.ResultNormalization.interpret_isabelle_status/1` has been
  **renamed** to `interpret_isabelle_result/1` and now takes the `use_theories`
  result map directly. The keyword-list input form is no longer supported — the
  previous name and shape were leaks of the old polling abstraction.
- `AtpClient.ResultNormalization.extract_isabelle_text/1` no longer accepts a
  keyword list; only the result map is supported.
- `{:error, {:isabelle_failed, payload, status}}` errors now carry `payload`
  (the FAILED body) and `notes` (intermediate `NOTE` payloads accumulated by
  `IsabelleClient.Task`) instead of the old full poll-message keyword list. The
  "Cannot load theory file" annotation path continues to emit a 3-tuple
  `{:isabelle_failed, payload, [hint:, local_dir:, isabelle_dir:]}` as before.

### Removed

- The `:poll_interval_ms` Isabelle config setting is no longer used — the new
  client uses blocking `await_task` semantics rather than polling. Existing
  `config :atp_client, :isabelle, poll_interval_ms: ...` entries are removed.

### Migration notes

For most callers the upgrade is a no-op beyond updating `mix.exs`. Action is
only needed if you:

- Pattern-match on `AtpClient.Isabelle.Session` fields directly. Use
  `session.client.socket` / `session.client.session_id` instead, or treat
  `Session` as opaque.
- Call `AtpClient.ResultNormalization.interpret_isabelle_status/1`. Rename to
  `interpret_isabelle_result/1` and pass the `use_theories` payload map (or
  `task.result` of a finished `IsabelleClient.Task`) instead of the previous
  poll keyword list.
- Use `prove_theory(..., raw: true)` or `query(..., raw: true)` and consume the
  result with anything other than `extract_isabelle_text/1`. The shape is now a
  plain map; expect `payload["nodes"]`, `payload["ok"]`, etc.
- Match on the third element of `{:error, {:isabelle_failed, _, _}}` errors. It
  is now either a list of NOTE payload maps (plain failures) or a
  `[hint:, local_dir:, isabelle_dir:]` keyword list (annotated "Cannot load
  theory file" failures).

## [0.1.3]

Prior versions are not retroactively documented here. See git history for
changes before 0.2.0.