Skip to main content

guides/tools.md

# Tools

## One tool, two orthogonal axes

A tool is declared once. What varies is captured on two independent axes, not one,
because "needs human input" and "server handles it" are two different questions
tangled together.

1. **Executor** — who produces the result.
2. **Approval gate** — whether the call is gated by human confirmation before it runs.

These are independent. A server tool can be gated (`send_email` runs server-side but
needs a confirm click first) or ungated. Confirmation is not a kind of tool; it is a
gate in front of an otherwise-server tool. "Answering," by contrast, is a tool whose
*result is* the human's input — a different control flow entirely.

## Executor axis

- `:server` — your code runs it and returns a result. The default.
- `:human` (elicitation) — the human *is* the executor; their answer is the result.
  The agent suspends. One-phase: nothing runs after the human responds.
- `:client` — runs in the browser/LiveView: geolocation, client/device state, a file
  picker, a client-side computation. Not a human decision and not the server. The
  execution boundary is the socket; only works while one is live.
- `:provider` — the *provider* executes it (hosted web search, code interpreter) and
  the result returns in the stream. Pass it through, **do not** dispatch locally.
  Already in the stack via ReqLLM's provider-hosted tools.

`:sub_agent` is deferred — it is `:server` with a spawn inside; modelling it as a
distinct executor adds complication for no gain.

## Approval gate (orthogonal modifier)

A policy on side-effecting tools: `:auto` or `:requires_approval`. Two-phase:
suspend → human approves → the tool then executes → result. Distinct from `:human`
elicitation, which is one-phase.

**Legal matrix:** the gate applies to **`:server` and `:client` only**.
`:provider` cannot be gated — the provider executes mid-stream; there is no
pre-execution suspend point. Gating `:human` is circular (approve asking the
human?) — reject it at definition time. A gated `:client` call suspends **twice**:
once for approval, once for client execution — two pending entries, two
resolutions.

## Progressive tools (a property, not a type)

A tool emitting intermediate progress before a final result (a long server job
streaming logs) is marked `:streaming?`. Orthogonal to the executor — it affects the
renderer (show progress) and whether the call resolves as one event or many.

## Tool definition (sketch)

```
name:             string
description:      string
parameter_schema: NimbleOptions keyword list, passed through to ReqLLM
executor:         :server | :human | :client | :provider
approval:         :auto | :requires_approval
streaming?:       boolean
retention:        see 05-compaction (per-tool age/count + never_evict)
callback:         required for :server (and the post-approval phase of gated tools)
```

`:server` callbacks receive parsed args plus the `%Agentix.Turn{}` scope (see `02`)
for ambient context (current user, db handle).

**Schema pass-through, not a new layer.** ReqLLM's `Tool.new/1` already natively
accepts NimbleOptions keyword lists (and raw JSON Schema) and compiles them to JSON
Schema itself (verified v1.16.0). Agentix hands the schema through verbatim — it
must not re-compile or re-validate, or schemas get double-compiled and the two
interpretations drift.

## The control-flow collapse

From the state machine's point of view every executor reduces to one of two shapes:

- **Resolve-in-process**: `:server`, and `:provider` (resolves in-stream).
- **Suspend-and-await-external-resolution**: `:human`, `:client`, anything gated,
  and (later) `:sub_agent`.

So you build *one* suspension primitive; the executor only parameterizes who may
resolve and how the UI prompts. The distinction stays explicit where it matters (tool
declaration, loop dispatch, renderer) without multiplying machinery.

## The suspension/resolution primitive

A suspended call is a pending tool call with a correlation id (`tool_call_id`) and a
resolver. A single turn can carry a **mix** — three tool calls, two `:server`
(self-resolve in ms) and one `:human` (resolves whenever). `awaiting_input` means
"awaiting resolution of N pending calls, some already done." The agent holds:

```
pending: %{tool_call_id => status}   # :running | :awaiting | :resolved | :errored
```

The resume to the next LLM turn fires only when the pending set empties.

Two related-but-distinct shapes share the name "pending" — keep them straight:

- This **in-memory tracking map** holds *every* call in the turn with its status,
  including `:running` `:server` calls. It is the agent's working state for deciding
  when to resume; it is never persisted as-is.
- The **persisted/rendered `pending`** (`fsm_state.pending` in `01`/`04`, the
  renderer assign in `06`) is only the *awaiting-external* subset, shaped
  `%{tool_call_id => %{executor, kind, prompt}}` where `kind` is
  `:approval | :elicitation | :client_exec` — the field the renderer actually
  switches on. Running `:server` calls are not in it —
  if the agent is killed mid-turn, those are recovered by re-running from the log
  (the `tool_call` with no `tool_result`), not from the snapshot. The renderer shows
  running calls via `in_flight_tools`, awaiting ones via `pending`.

### Resolver interface

```
:gen_statem.call(via(conversation_id), {:resolve, tool_call_id, result})
```

`call`, not `cast`, for the synchronous ack — a confirm click that silently vanishes
is a terrible HITL failure mode. The agent:

1. Validates `tool_call_id` against the pending set; if stale, unknown, or already
   resolved, replies `{:error, :stale}` (covers double-clicks, resubmits, expiries).
2. Records the result, **replies `:ok` immediately**.
3. *Only then*, via an internal `:next_event`, decides whether to start the next turn.

Replying before resuming is essential: resume-first blocks the caller for the whole
next turn and trips the 5s call timeout.

### Addressing and revival

The caller never holds the agent pid. It resolves through
`ensure_started(conversation_id)`, which returns the live process or starts/rehydrates
one, then calls. This is what lets a suspended-on-human conversation survive the agent
being killed: the answer arriving revives it.

Resolution is a **public API**, not a socket affordance: anything holding a
`conversation_id` and a `tool_call_id` — a LiveView, a webhook controller, a job, an
external system — calls the same `resolve`. This is what generalizes the suspension
primitive from HITL chat into durable workflows (see `08`).

### Timeout and idempotency

Every suspending call needs a timeout. Default: resolve an unanswered call to a
tool-error result the model can recover from ("user did not respond"). Idempotency
keyed on `tool_call_id` covers the kill → revive → late-answer race — and the same
id is the idempotency key side-effecting tools should honor, because a kill
mid-`:server`-tool is recovered by re-dispatching that exact call (see `01`).
Timeout machinery is owned by the persistence adapter (`schedule_expiry` /
`cancel_expiry` — see `04`) rather than a per-agent timer, since a per-agent timer
dies with a killed agent.

## Resolved: `:client` is `:human` with JS as the "user"

The agent emits `{:suspended, id, :client, args}`; a registered JS hook maps tool name
→ client function, executes, and `pushEvent`s back to the LiveView, which calls the
same `resolve`. Mechanically symmetric to elicitation, just no visible prompt.

**Security rule (write it down):** client results are user-controllable. The server
validates them and **never trusts a `:client` result for a privileged decision.**

Two edge rules: with **no live socket** (headless/API callers) a `:client` call
fails fast to a tool-error after a short grace period — it must not park the
conversation in `awaiting_input` forever. With **multiple sockets** (two tabs)
both execute the JS; the second `resolve` gets `{:error, :stale}` server-side, but
client-side double side effects (two file pickers) are the app's to guard.

## Resolved: approval vs elicitation — one mechanism, two components

The resolution path is identical (`{:resolve, tool_call_id, result}`), so the headless
layer has one `pending` concept and one resolver. But ship **two** default components
(see `06`): `<.approval>` (a boolean gate) and `<.elicitation>` (an arbitrary form).
Don't force a form abstraction over a yes/no.

## Who consumes the executor field

- **The loop** — dispatch: run, suspend, or pass through.
- **The renderer** — `:human` → elicitation form; gated → confirm card; `:provider`
  → "searched the web" affordance; `:client` → execute (often invisible); `:server`
  → tool-call card with result/progress.
- **Persistence / replay** — the `tool_calls` table tracks executor and status so
  suspensions survive a kill.

## Result convention

Structured results carry model-visible semantics in the content body as JSON —
`%{ok: true, result: ...}` / `%{ok: false, error: ...}` — following ReqLLM, so
follow-up turns don't depend on adapter-only metadata.