CHANGELOG.md

Select File
# Changelog

All notable changes to AtpClient are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to
[Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.5.0] - 2026-06-30

### Changed

- **`AtpClient.ResultNormalization.atp_result()` now surfaces the full SZS
  Ontology.** `{:ok, status}` carries the SZS verdict as a downcased atom
  rather than one of six collapsed atoms that lost information. The
  recognised vocabulary covers every Success and NoSuccess atom a
  sledgehammer-targeted prover (or a contemporary TPTP-compliant
  deployment) is likely to emit:

      Success:    :theorem, :unsatisfiable, :satisfiable, :counter_satisfiable,
                  :contradictory_axioms, :equivalent, :counter_equivalent,
                  :counter_theorem, :equivalent_counter_theorem, :equi_satisfiable,
                  :tautology, :tautologous_conclusion, :weaker_conclusion,
                  :no_consequence
      NoSuccess:  :gave_up, :unknown, :incomplete, :timeout, :resource_out,
                  :memory_out, :forced, :user, :inappropriate, :error, :input_error

  The most consequential distinctions the previous shape collapsed are
  `:theorem` ("conjecture follows from the axioms", e.g. Isabelle and
  model-checker endorsement) vs `:unsatisfiable` ("the negated-conjecture
  clause set has no model", e.g. TPTP refutation provers), and
  `:satisfiable` ("a model of the premises was found") vs
  `:counter_satisfiable` ("a model of the negated conjecture was found").
  See <https://tptp.org/Seminars/SZSOntologies/Summary.html> for the full
  ontology.
- **Prover-specific patterns are reclassified to honest SZS atoms** rather
  than sledgehammer's negated-conjecture reading. SPASS "Completion found"
  becomes `:satisfiable` (was `:gave_up`); iProver "CNFRefutation" becomes
  `:unsatisfiable` (was `:theorem`); Vampire "Satisfiability detected" and
  "Termination reason: Satisfiable" become `:satisfiable` (were `:csat`);
  Alt-Ergo "Unknown" becomes `:unknown` (was `:gave_up`); Vampire SIGINT
  becomes `:forced` (was `:interrupted`). Callers running a
  negated-conjecture refutation pipeline should treat `:satisfiable` from
  these provers as a counter-model to the original goal.
- **SPASS / Waldmeister input-rejection patterns move from
  `{:error, :malformed_input}` to `{:ok, :input_error}`.** SPASS "Undefined
  symbol", "Free Variable", "No formulae and clauses found in input file",
  and Waldmeister "Unexpected end of file" are the prover's own SZS
  InputError verdict — the prover ran, read the input, and rejected it.
  SPASS "Please report this error" and Waldmeister "Unrecoverable
  Segmentation Fault" remain `{:error, :internal_error}` because those are
  prover crashes, not verdicts. The SZS `Inappropriate` status likewise
  moved from `{:error, :malformed_input}` to `{:ok, :inappropriate}`.
- **`req` bumped to `~> 0.6`.** Resolves `GHSA-655f-mp8p-96gv`
  (decompression-bomb DoS via auto-decoded archive / compressed response
  bodies, HIGH) and `GHSA-px9f-whj3-246m` (multipart form-data header
  injection, LOW). No API changes at the AtpClient surface.

### Added

- **`t:szs_status/0`, `t:szs_success/0`, `t:szs_no_success/0`** typespecs
  on `AtpClient.ResultNormalization` enumerate the SZS atoms recognised
  explicitly. `szs_status` widens to `atom()` to admit the permissive
  fallback below.
- **Permissive SZS fallback.** Any unrecognised
  `% SZS status <CamelCase>` (or `… says <CamelCase>`) line passes through
  as its snake_case atom — so SZS additions like `EquivalentTheorem`
  become `{:ok, :equivalent_theorem}` without a code change. Bounded
  length and strict CamelCase keep the atom table safe against pathological
  input.
- **Word-boundary SZS line extraction.** Replaces substring matching, so
  longer names no longer collapse onto shorter prefixes:
  `EquivalentTheorem` ≠ `Equivalent`, `CounterSatisfiable` ≠ `Satisfiable`,
  `InputError` ≠ `Error`.

### Breaking

- The collapsed `t:success_t/0` atoms (`:thm`, `:csat`, `:sat`,
  `:out_of_resources`, `:interrupted`) are gone from
  `AtpClient.ResultNormalization.atp_result()` and from
  `AtpClient.Isabelle` (which produced `:thm` / `:csat` / `:sat` /
  `:timeout` / `:out_of_resources` / `:gave_up` from the Isabelle message
  scanner). Pattern matches must be rewritten — see the migration notes.
- The `{:error, :malformed_input}` failure path is gone. SPASS /
  Waldmeister input-rejection patterns and the SZS `Inappropriate` status
  surface in the `{:ok, _}` channel as `:input_error` / `:inappropriate`
  respectively. `failure_t()` no longer lists `:malformed_input`.

### Migration notes

A given old atom can map to one of several SZS atoms depending on which
prover family produced the result. Match the disjunction to preserve
coverage:

- `:thm` → `:theorem` (Isabelle / Sledgehammer / Alt-Ergo Valid) **or**
  `:unsatisfiable` (TPTP refutation provers, iProver CNFRefutation).
- `:csat` → `:counter_satisfiable` (SZS CounterSatisfiable, Isabelle /
  Nitpick counter-example) **or** `:satisfiable` (SZS Satisfiable,
  Isabelle / Nitpick model, Vampire "Satisfiability detected", SPASS
  "Completion found").
- `:sat` → `:satisfiable` (only emitted by the Isabelle classifier for
  Nitpick / Quickcheck models).
- `:out_of_resources` → `:resource_out` (SZS ResourceOut, SPASS / Waldmeister
  resource patterns) **or** `:memory_out` (SZS MemoryOut, Isabelle "Out of
  memory").
- `:interrupted` → `:forced` (SZS Forced, Vampire SIGINT) **or** `:user`
  (SZS User).
- `:gave_up` against an Alt-Ergo verdict → `:unknown`; against SZS
  `Incomplete` → `:incomplete`; otherwise still `:gave_up`.
- `{:error, :malformed_input}` → match `{:ok, :input_error}` instead.

So a previously written TPTP triage like

```elixir
case AtpClient.SystemOnTptp.query(problem, default_system: "cvc5---1.3.0") do
  {:ok, :thm}                       -> :proved
  {:ok, :csat}                      -> :disproved
  {:ok, atom} when atom in [:timeout, :out_of_resources, :gave_up, :interrupted]
                                    -> :inconclusive
  {:error, :malformed_input}        -> :bad_input
  {:error, _}                       -> :error
end
```

becomes

```elixir
case AtpClient.SystemOnTptp.query(problem, default_system: "cvc5---1.3.0") do
  {:ok, status} when status in [:theorem, :unsatisfiable]
                                    -> :proved
  {:ok, status} when status in [:satisfiable, :counter_satisfiable]
                                    -> :disproved
  {:ok, status} when status in [:timeout, :resource_out, :memory_out,
                                :gave_up, :unknown, :incomplete,
                                :forced, :user, :inappropriate]
                                    -> :inconclusive
  {:ok, :input_error}               -> :bad_input
  {:error, _}                       -> :error
end
```

## [0.4.0] - 2026-06-29

### Changed

- **`AtpClient.ResultNormalization.per_lemma_results/2` becomes `/3`** and
  now takes `(payload, lemma_specs, opts)` instead of `(payload, opts)`. The
  caller supplies one `%{name: String.t(), range: Range.t()}` spec per
  lemma (`AtpClient.Isabelle.lemma_specs/1` builds these from the theory
  body) and the classifier buckets messages by body-line range rather than
  re-deriving structure from `pos.line` alone. Returned `lemma_result()`
  maps lose the `:line` field — line numbers referred to the generated
  theory file, not the caller's source, and were ambiguous when a bucket
  contained multiple messages.
- **Per-lemma classifier no longer scans concatenated text across
  messages.** Each message is classified individually before the bucket is
  reconciled. This fixes two regressions visible with multi-lemma Isabelle
  jobs: (a) `Nitpick found a model` + `Nitpick found no counterexample`
  surfaced as `:csat` because the `"Nitpick found a" + "counterexample"`
  substring test fired across the message join, and (b) `by <tactic>`
  failing on a False goal was reported as `:thm` because Isabelle echoes
  `theorem name: <goal>` at the `by` position alongside the
  `Failed to finish proof` error. Verdict precedence is documented on
  `per_lemma_results/3`.
- **`per_lemma_results/3` accepts `:file`** to drop messages whose
  `pos.file` does not end with a given suffix. `prove_lemmas/4` uses this
  to filter out noise from the bundled `TPTP.thy` and other transitively
  imported theories that previously produced phantom lemma rows.
- **Lemma names always come through** on the per-lemma path — the name
  comes from the body (via `lemma_specs/1`), not from parsing `theorem
  name:` out of Isabelle messages. Sledgehammer / Nitpick verdicts no
  longer surface as `name: nil`.

### Added

- **`AtpClient.Isabelle.lemma_specs/1`** — extracts `[%{name, range}]` from
  a theory body. Used internally by `prove_lemmas/4` and `prove_tptp/3`;
  exposed for callers that want to pre-compute specs.

### Migration notes

- If you call `ResultNormalization.per_lemma_results(payload)` or
  `per_lemma_results(payload, line_offset: n)`, switch to
  `per_lemma_results(payload, specs, opts)` where `specs` come from
  `AtpClient.Isabelle.lemma_specs(body)`. The high-level
  `prove_lemmas/4` / `prove_tptp/3` / `query_tptp/2` entry points already
  do this for you.
- Drop reads of the `:line` field on `lemma_result` entries. Attribute by
  `:name` instead — line numbers were a generated-theory leak and have
  been removed from the returned shape.

## [0.3.0] - 2026-06-26

### Added

- **`AtpClient.LocalExec` backend.** Invokes a locally installed,
  TPTP-compliant prover binary (E, Vampire, …) via `System.cmd/3` and
  normalizes its stdout through the existing SZS classifier. Two-layered
  timeout: the prover-side CPU limit (passed via `:args`) lets provers
  emit a clean `SZS status Timeout`, and an independent BEAM-side
  wall-clock timeout (`:wall_timeout_ms`) kills wedged processes. Both
  paths fold into the same `{:ok, :timeout}` result so callers do not have
  to branch on the failure mode. Binary resolution goes through
  `System.find_executable/1`; missing binaries surface
  `{:error, {:prover_not_found, name}}` rather than raising.
- **`AtpClient.Backend` behaviour** so a UI (Smart Cell, Livebook, …) can
  enumerate and drive backends without hard-coding per-backend knowledge.
  Every backend (`SystemOnTptp`, `StarExec`, `Isabelle`, `LocalExec`) now
  implements `config_key/0`, `label/0`, `config_schema/0`, `verify/1`, and
  `query/2`. The new `query/2` is the cross-backend entry point: it takes
  a TPTP-format problem string plus a keyword list and returns a single
  `atp_result()`, hiding session login/logout, prover selection, theory
  bookkeeping, etc. Per-backend low-level entry points
  (`query_system/3`, `prove_theory/4`, `create_job/3`, `query_tptp/2`, …)
  remain available for callers that need sessions, multi-system fan-out,
  or per-lemma detail.
- **`AtpClient.Config.Field` struct** carrying the UI metadata each
  backend exposes via `config_schema/0`: logical `:type` (`:string |
  :integer | :boolean | :string_list`), layout `:group` (`:connection |
  :defaults | :advanced`), `:required?`, `:secret?`, `:default`, `:label`,
  and `:doc`. The library performs no coercion; values flow into
  `Application.put_env/3` and the per-call opts as-is.
- **`AtpClient.backends/0`** returns the list of behaviour-implementing
  backend modules so a UI can discover them without hard-coding.
- **TPTP-shaped entry points on `AtpClient.Isabelle`.**
  `query_tptp/2` and `prove_tptp/3` accept a TPTP/THF problem string,
  route it through `IsabelleClient.TPTP.isabellize_theory/1`, append the
  configured `:proof_method` (default `"by auto"`) to each generated
  `lemma`, and submit the resulting theory to Isabelle with
  `imports "TPTP"` and `unbundle from_TPTP` active. The bundled `TPTP.thy`
  support theory is copied into `:local_dir` on first use so Isabelle's
  loader can resolve the import. Lemma names carry over from TPTP formula
  names. Pass `proof_method: "by metis"`, `"sledgehammer"`, etc. for
  stronger or probing tactics. Smart-cell consumers can now drive Isabelle
  from the same TPTP editor they use for the SystemOnTPTP / StarExec /
  LocalExec backends; theory-text entry points (`prove_theory/4`,
  `prove_lemmas/4`, `query/3`, `query_lemmas/3`) remain unchanged.
- **`AtpClient.Isabelle.SessionOwner`** — a private GenServer that owns
  the link to `IsabelleClient.Shared`. `open_session/1` now returns
  `{:error, reason}` on a failed connection without killing
  non-trapping callers, a later Shared crash surfaces through a monitor
  rather than an `:EXIT`, and dropping the caller monitors the owner to a
  clean shutdown so server-side Isabelle sessions are no longer orphaned.
  The `Session` struct gains an opaque `:owner` field; treat it as
  internal.
- **`scripts/build_eprover.sh`** — builds the E theorem prover from source
  and installs it to `priv/bin/eprover` for use as the `:local_exec`
  backend's binary.
- **`AtpClient.ResultNormalization.failure_t/0` gains
  `{:prover_not_found, String.t()}`** for the `LocalExec` binary-resolution
  failure mode.

### Changed

- **Library defaults are merged per-key inside `AtpClient.Config.get/2`**
  instead of being seeded via `mix.exs`'s `:env` block. The previous
  arrangement let any `config :atp_client, :<backend>, …` in user
  `config.exs` silently replace the whole keyword list (the OTP
  `Application.put_env/3` semantics), dropping defaults the user had not
  explicitly re-set — most visibly causing a fresh install with partial
  SystemOnTPTP config to fail with `{:unrecognized_output, ""}` because
  `:url` had vanished. Defaults now live in `AtpClient.Config`'s
  `@defaults` (exposed via `Config.defaults/0`) and are merged underneath
  Application env on every read, so a partial user config only overrides
  the keys it actually names. No call-site changes required.
- **`AtpClient.SystemOnTptp.list_provers/0` blocks on the first call**
  until the startup refresh completes or `:sotptp, :refresh_timeout_ms`
  (default 15 s) elapses; subsequent calls return the cached list
  immediately. On timeout it returns `[]` rather than raising. Previously
  it could return `[]` immediately after application start, which the
  caller had no way to distinguish from "the SystemOnTPTP deployment is
  empty."
- **`per_lemma_results/2` now applies `check_tool_signals` per source line.**
  Sledgehammer's `"found a proof"` and Nitpick's `"found a counterexample"`/
  `"found a model"` verdicts surface as `{:ok, :thm}` / `{:ok, :csat}` /
  `{:ok, :sat}` on the per-lemma path, matching `interpret_isabelle_result/1`.
  Previously only Isabelle's `"theorem name:"` completion notification was
  recognised, so probe-style proofs (`sledgehammer nitpick oops`) collapsed
  to `:gave_up`. `{:ok, :timeout}` and `{:ok, :out_of_resources}` are now
  surfaced per lemma too. Lemma names attach to results derived from
  `theorem name:` completions; sledgehammer/nitpick verdicts carry
  `name: nil` because the underlying messages do not name the lemma.
- **`:isabelle_elixir` is now pinned to its GitHub `main` branch**
  (previously `~> 0.3` from Hex). The new TPTP entry points rely on the
  `IsabelleClient.TPTP` module, which is not in the 0.3.0 Hex release.
  Re-pin to a Hex constraint once a release containing `IsabelleClient.TPTP`
  ships.

### Breaking

- **`AtpClient.Isabelle.query/3` no longer defaults its `opts`
  argument.** Pass `[]` explicitly if you have no options to set. The
  default was removed so the three-arity `query/3` (theory text + name)
  does not collide with the `Backend` behaviour's `query/2` (TPTP problem
  string + opts), both of which now live on the same module.
- **`AtpClient.Isabelle.Session` gained an enforced `:owner` field.**
  Anyone constructing `%Session{client: …, config: …}` outside the
  library must now also pass `:owner` (the pid of an
  `AtpClient.Isabelle.SessionOwner`). Treat `Session` as opaque and
  construct it only via `open_session/1`.

### Migration notes

- If you call `Isabelle.query(theory, name)`, change it to
  `Isabelle.query(theory, name, [])`.
- If you read the `:env` block of `AtpClient.MixProject` to discover
  library defaults, call `AtpClient.Config.defaults/0` instead. The
  `:env` block has been removed since the defaults are no longer
  load-bearing there.
- If you construct `%AtpClient.Isabelle.Session{}` directly, stop —
  use `open_session/1` so the `:owner` pid is set up correctly.

## [0.2.0]

### Changed

- **Bumped `:isabelle_elixir` to `~> 0.2`.** The upstream library has moved to a
  task-based wire model; `AtpClient.Isabelle` has been rewritten on top of the
  new `IsabelleClient` (the stateful struct client) instead of
  `IsabelleClientMini`'s polling API. End users of the high-level functions
  (`open_session/1`, `prove_theory/4`, `query/3`, `close_session/1`) see the
  same surface — only the internals and a few error / raw-result shapes have
  changed.
- `AtpClient.Isabelle.Session` now wraps an `%IsabelleClient{}` rather than
  carrying `:socket` and `:session_id` directly. Code that pattern-matched
  `%Session{socket: socket, session_id: id}` must instead use
  `%Session{client: %IsabelleClient{socket: socket, session_id: id}}`.
- `:raw` mode of `prove_theory/4` and `query/3` now returns the `use_theories`
  result map directly (with top-level `"ok"`, `"errors"`, `"nodes"` keys)
  instead of the previous keyword list of poll messages.
  `AtpClient.ResultNormalization.extract_isabelle_text/1` already accepted this
  map shape, so call sites that used it continue to work unchanged.
- `AtpClient.ResultNormalization.interpret_isabelle_status/1` has been
  **renamed** to `interpret_isabelle_result/1` and now takes the `use_theories`
  result map directly. The keyword-list input form is no longer supported — the
  previous name and shape were leaks of the old polling abstraction.
- `AtpClient.ResultNormalization.extract_isabelle_text/1` no longer accepts a
  keyword list; only the result map is supported.
- `{:error, {:isabelle_failed, payload, status}}` errors now carry `payload`
  (the FAILED body) and `notes` (intermediate `NOTE` payloads accumulated by
  `IsabelleClient.Task`) instead of the old full poll-message keyword list. The
  "Cannot load theory file" annotation path continues to emit a 3-tuple
  `{:isabelle_failed, payload, [hint:, local_dir:, isabelle_dir:]}` as before.

### Removed

- The `:poll_interval_ms` Isabelle config setting is no longer used — the new
  client uses blocking `await_task` semantics rather than polling. Existing
  `config :atp_client, :isabelle, poll_interval_ms: ...` entries are removed.

### Migration notes

For most callers the upgrade is a no-op beyond updating `mix.exs`. Action is
only needed if you:

- Pattern-match on `AtpClient.Isabelle.Session` fields directly. Use
  `session.client.socket` / `session.client.session_id` instead, or treat
  `Session` as opaque.
- Call `AtpClient.ResultNormalization.interpret_isabelle_status/1`. Rename to
  `interpret_isabelle_result/1` and pass the `use_theories` payload map (or
  `task.result` of a finished `IsabelleClient.Task`) instead of the previous
  poll keyword list.
- Use `prove_theory(..., raw: true)` or `query(..., raw: true)` and consume the
  result with anything other than `extract_isabelle_text/1`. The shape is now a
  plain map; expect `payload["nodes"]`, `payload["ok"]`, etc.
- Match on the third element of `{:error, {:isabelle_failed, _, _}}` errors. It
  is now either a list of NOTE payload maps (plain failures) or a
  `[hint:, local_dir:, isabelle_dir:]` keyword list (annotated "Cannot load
  theory file" failures).

## [0.1.3]

Prior versions are not retroactively documented here. See git history for
changes before 0.2.0.