<!-- livebook:{"persist_outputs":true} -->
# Isabelle TPTP smoke test
## Section
A by-hand check of `AtpClient.Isabelle.query_tptp/2` and `prove_tptp/3` against
a real Isabelle server. Offline tests pin the dep contract and the wiring; this
notebook exercises the end-to-end path that CI cannot.
The server is spun up locally on demand — no external setup required beyond
having `isabelle` on `PATH` (or `ISABELLE_TOOL` pointing at the executable).
```elixir
Mix.install([
{:atp_client, "~> 0.5"}
])
```
<!-- livebook:{"output":true} -->
```
:ok
```
## Spin up an Isabelle server
```elixir
# Optional: System.put_env("ISABELLE_TOOL", "/path/to/isabelle")
IsabelleClient.Server.executable()
```
<!-- livebook:{"output":true} -->
```
{:ok, "/home/johannes/Isabelle2025/bin/isabelle"}
```
```elixir
{:ok, server} =
IsabelleClient.start_server(
server_name: "atp_client_tptp_#{System.unique_integer([:positive])}"
)
server_opts = [
host: server.host,
port: server.port,
password: server.password,
local_dir: Path.join(System.tmp_dir!(), "atp_client_tptp_#{System.unique_integer([:positive])}")
]
```
<!-- livebook:{"output":true} -->
```
[
host: "127.0.0.1",
port: 33215,
password: "1c031eff-f677-4c70-b820-3b79d39ed7fd",
local_dir: "/tmp/atp_client_tptp_3078"
]
```
## 1. Single-conjecture problem
`query_tptp/2` opens a session, copies the bundled `TPTP.thy` into
`:local_dir`, isabellizes the problem, appends the configured
`:proof_method` (default `"by auto"`) to each generated lemma, and routes
through `prove_lemmas/4` with `imports "TPTP"` and `unbundle from_TPTP`
active.
The isabellizer emits `lemma name: "…"` without a proof, which Isabelle
treats as an open goal — without a tactic each lemma comes back as
`{:ok, :gave_up}` with `name: nil`. `:proof_method` is what closes the
goal. Pass `proof_method: "by metis"`, `"sledgehammer"`, `"try0 oops"`,
etc. for stronger or probing behaviour.
```elixir
problem = ~S"""
thf(p_type, type, p: $i > $o).
thf(g, conjecture, ! [X: $i]: (p @ X | ~ (p @ X))).
"""
{:ok, results} = AtpClient.Isabelle.query_tptp(problem, server_opts)
results
```
<!-- livebook:{"output":true} -->
```
[%{name: "g", result: {:ok, :theorem}}]
```
Expected: a single-element list with `name: "g"` and `result: {:ok, :theorem}`.
Line numbers are not exposed — entries are attributed by `:name`, carried
over from the TPTP formula name.
## 2. Stable theory name
Pass `:theory_name` to keep filenames stable across re-runs (useful when
inspecting the on-disk `.thy` file after the fact).
```elixir
{:ok, _} =
AtpClient.Isabelle.query_tptp(
problem,
server_opts ++ [theory_name: "ExcludedMiddle"]
)
server_opts[:local_dir] |> File.ls!() |> Enum.sort()
```
<!-- livebook:{"output":true} -->
```
["AtpClient_bf18a2b59d89.thy", "ExcludedMiddle.thy", "TPTP.thy"]
```
Expected to include `"ExcludedMiddle.thy"` and `"TPTP.thy"`.
## 3. Multi-lemma problem with a persistent session
Open one session, run several problems, close it once. This is the shape
`KinoAtpClient` will use — start-up cost (which is dominated by
`session_start`) amortises across UI events.
```elixir
{:ok, session} = AtpClient.Isabelle.open_session(server_opts)
try do
multi = ~S"""
thf(p_type, type, p: $i > $o).
thf(q_type, type, q: $i > $o).
thf(g1, conjecture, ! [X: $i]: (p @ X | ~ (p @ X))).
thf(g2, conjecture, ! [X: $i]: ((p @ X & q @ X) => p @ X)).
"""
AtpClient.Isabelle.prove_tptp(session, multi)
after
AtpClient.Isabelle.close_session(session)
end
```
<!-- livebook:{"output":true} -->
```
{:ok, [%{name: "g1", result: {:ok, :theorem}}, %{name: "g2", result: {:ok, :theorem}}]}
```
Expected: two entries, names `"g1"` and `"g2"`, both `{:ok, :theorem}`.
## 4. Sledgehammer / Nitpick as proof method
`per_lemma_results/3` applies the same tool-signal classifier as the
single-result mode, so `proof_method: "sledgehammer nitpick oops"` produces
`{:ok, :theorem}` when sledgehammer finds a proof and
`{:ok, :counter_satisfiable}` when nitpick finds a counterexample — even
though `oops` withdraws the goal and
no `theorem name:` completion message is ever emitted.
```elixir
probe = ~S"""
thf(g_taut, conjecture, ! [X: $o]: (X | ~ X)).
thf(g_false, conjecture, ! [X: $o]: (X & ~ X)).
"""
AtpClient.Isabelle.query_tptp(
probe,
server_opts ++ [proof_method: "sledgehammer nitpick oops", use_theories_timeout_ms: 600_000]
)
```
<!-- livebook:{"output":true} -->
```
{:ok, [%{name: "g_taut", result: {:ok, :theorem}}, %{name: "g_false", result: {:ok, :counter_satisfiable}}]}
```
Expected: `g_taut` → `{:ok, :theorem}`, `g_false` → `{:ok, :counter_satisfiable}`. Lemma names
come through regardless of proof method — they are read from the body, not
parsed out of Isabelle's per-message text (which omits the name for
`oops`-based diagnostic methods).
## 5. Shut the server down
```elixir
IsabelleClient.Server.kill(server.name)
```
<!-- livebook:{"output":true} -->
```
{"", 0}
```
## 6. What to look for if something fails
- `{:error, :isabelle_not_found}` from `IsabelleClient.Server.executable/0` —
`isabelle` is not on `PATH`. Set `ISABELLE_TOOL` to the executable path.
- `{:error, {:tptp_thy_copy_failed, _}}` — `:local_dir` is not writable, or the
dep is not loaded correctly. Check that `IsabelleClient.TPTP.source_path()`
in the `isabelle_elixir` dep resolves to a real file.
- `{:error, {:isabelle_failed, %{"message" => "Cannot load theory file"}, [hint: _, …]}}` —
Isabelle could not read `TPTP.thy` or the generated theory. Inspect the
hint's `:local_dir`/`:isabelle_dir` to check the cross-mount.
- `{:error, {:isabelle_failed, %{"message" => msg}, _}}` where `msg` mentions
`unbundle from_TPTP` or unknown syntax — the bundled `TPTP.thy` was found
but `unbundle from_TPTP` did not activate; check the dep version.
- Every lemma comes back `{:ok, :gave_up}` — the `:proof_method` did not
close any goal. Either the tactic is wrong for the problem class (try
`"by metis"` for first-order problems) or the default `"by auto"` is
too weak. The same shape with the proof method omitted entirely is a
regression of the proof-injection step.