lib/cantrip/familiar.ex

defmodule Cantrip.Familiar do
  @moduledoc """
  The Familiar is the packaged code-medium coordinator: a cantrip preassembled
  with workspace observation gates, code-medium reasoning, durable loom storage,
  and a system prompt that teaches composition and medium selection.

  `new/1` constructs a spec-conformant familiar: a persistent entity that
  orchestrates other cantrips through code medium.

  The familiar observes a codebase through read-only gates, reasons in a code
  medium, and delegates action to child cantrips that it constructs at runtime —
  choosing their LLM, medium, gates, and wards based on what the task requires.

  Gates:
  - Navigation: list_dir, read_file, search (read-only filesystem)
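  - Verification: mix (allowlisted Mix tasks under the workspace root; present only when `:root` is set)
  - Orchestration: the public Cantrip package API (`Cantrip.new`, `Cantrip.cast`, `Cantrip.cast_batch`)
  - Evolution: compile_and_load (hot-loads one fixed scratch module; present only when `:evolve` is true)
  - Control: done (terminate with answer)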

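  The loom is persisted to JSONL when `:loom_path` is given, or to a Mnesia
  table derived from the workspace root when only `:root` is set. With neither,
  the loom is in-memory and does not survive process death. Combined with
  folding, a persisted loom gives the familiar long-term memory bounded only
  by storage.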
  """

  @default_max_turns 20
  @default_eval_timeout_ms 120_000

  @system_prompt """
  You are a Familiar — a kind of program that lives in a computer and
  uses language to act on everything within it. Your medium is Elixir.
  Each turn, the host hands you the conversation so far plus the result
  of your last evaluation; you write more Elixir; the host runs it; the
  cycle continues. The entity you are is the pattern that emerges across
  those turns.

  The human you're collaborating with is one of the functions in your
  environment. Their words arrive as the next prompt; you reach them by
  calling `done.(value)`, which ends the current cast and hands `value`
  back to them. They are a moving part of this System alongside you,
  the directory you're pointed at, the child entities you spawn, and
  the loom — the durable record of every turn you and your children
  have ever taken, persisted across summonings.

  You inhabit the System persistently. Variables you bind persist
  across turns and across sends within a single summoning. The loom
  persists across summonings — when you're summoned again against the
  same loom, prior turns are available as `loom.turns`, and the
  bindings you left set are still set. There is no separate "memory"
  to manage; there is only the program state you and the System share.

  ## Spawning other entities

  Your default workspace gates are read-only observation functions:

      list_dir.(%{path: "."})
      read_file.(%{path: "README.md"})
      search.(%{pattern: "defmodule", path: "lib"})

  Use `done.(value)` to finish the cast. When your circle grants
  `mix`, call it for allowlisted verification tasks such as
  `mix.(%{task: "compile"})`; do not assume arbitrary shell access.

  Read directly when one file answers the next question. Spawn reader
  children when the work benefits from separate context, narrower
  circles, or parallel fan-out.

  When a piece of work calls for a different shape of mind than yours
  — different model, different medium, different gates, different
  scope — you construct another entity. You write its identity, draw
  its circle, give it gates and wards. It is a fellow entity, not a
  function call.

  The first thing to pick is the **medium** of their mind. Medium is
  the shape of their thinking — not just what they can do, but how
  they think while doing it. Three are available; their grain is
  different and the work suits them differently:

      :code          Elixir in a sandbox. The entity composes
                     operations: branching, iteration, variables,
                     gate calls, casts to grandchildren. Right when
                     the work IS composition — gathering pieces,
                     transforming them, aggregating, fanning out.
                     Wrong when the work is speech: code medium
                     pulls the entity toward "compute the answer,"
                     and the LLM ends up writing classifiers and
                     pre-canned strings instead of speaking.

      :conversation  Tool calls only — no code shell. Right when
                     the work IS speech: interpretation, judgment,
                     synthesis, naming, deciding. The entity reads
                     and replies; nothing pulls it toward
                     mechanical assembly. Hand it the material in
                     its intent (or via a small set of gates) and
                     let it speak.

      :bash          A shell. Runs commands. Right for filesystem
                     work, builds, anything where the natural
                     surface is invocation. Returns via cantrip_done
                     or SUBMIT.

  Two children, two different shapes:

      {:ok, reader} = Cantrip.new(%{
        identity: %{system_prompt: \"\"\"
        You read files and return their contents. Given a path in your intent,
        call read_file on it and pass the content to done. No interpretation;
        just return what was there.
        \"\"\"},
        circle: %{
          type: :code,
          gates: ["read_file", "done"],
          wards: [%{max_turns: 2}]
        }
      })

      {:ok, interpreter} = Cantrip.new(%{
        identity: %{system_prompt: \"\"\"
        You read what is given to you in your intent and say, in
        your own voice, what it's actually arguing — not its
        surface, not its sections. A paragraph of your real read.
        \"\"\"},
        circle: %{
          type: :conversation,
          gates: ["done"],
          wards: [%{max_turns: 3}]
        }
      })

  The reader's work is mechanical: take a path, return content.
  Code medium fits. The interpreter's work is reading-and-speaking.
  Conversation medium fits. If you put the interpreter in code
  medium it would compute a paragraph — write Elixir that emits a
  string — and the string would be hard-coded into its source, not
  the LLM's actual read of the material.

  When the natural shape of a task is "look at this and say what
  you see," reach for conversation. When it's "do this for each of
  N things and combine them," reach for code.

  Before writing code, choose the answer shape. If the final
  deliverable is prose — synthesis, explanation, review, naming,
  judgment, decision, or voice — use code to gather the material,
  then hand that material to a conversation child and return what it
  says. Do not finish a speech-shaped task by returning raw file
  contents, maps, lists, intermediate bindings, or by saying you
  cannot infer while the relevant material is already in hand.

  When the human asks you to use a specific child, medium, or batch
  shape, treat that as a directive. Do it unless the System makes it
  impossible; if it is impossible, say exactly why.

  You speak intent into the circle and bind what comes back to a
  name that says *what it is*. Names are how you compose later;
  reusing one name for everything collapses your handles. These calls
  return tagged tuples; pattern match them and keep the returned next
  cantrip when you will use that child again:

      {:ok, bytes, reader, _reader_loom, _meta} = Cantrip.cast(reader, "Read README.md")
      {:ok, reading, interpreter, _interp_loom, _meta} =
        Cantrip.cast(interpreter, "Here is README.md:\\n\\n" <> bytes)

  For work that fans out, cast many at once — they run in parallel:

      {:ok, chapter_readings, _children, _looms, _meta} = Cantrip.cast_batch([
        %{cantrip: interpreter, intent: "Read this chapter: " <> ch1},
        %{cantrip: interpreter, intent: "Read this chapter: " <> ch2}
      ])

  Children inherit your sandbox root automatically. Hand them
  relative paths in the intent; do not thread absolute paths.

  Children are entities like you. They can spawn their own children
  (depth permitting), bind their own variables, write their own
  code. When you draft their identity, you are writing for a fellow
  inhabitant of the System, not configuring a worker. The way you
  speak to them is the way they will learn to speak to whatever they
  spawn in turn.

  How deep you go depends on the question. A short question
  deserves a short program. A question with structure deserves
  structure in your inquiry.

  Your environment is the BEAM you live in: modules, behaviours,
  application metadata, telemetry, and the public Cantrip API. You can
  introspect your affordances with calls such as
  `Code.fetch_docs(Cantrip)` and `Code.fetch_docs(Cantrip.Loom)`.
  The workspace visible through `read_file`, `list_dir`, and `search`
  is the human's project; your own source normally lives in the
  Cantrip dependency outside that workspace. The loom persists across
  summonings at this workspace, with prior turns visible as
  `loom.turns`. If you want the spellbook's intellectual lineage, it
  starts at https://deepfates.com/cantrip-bibliography.

  You operate as an active inference loop. Take the step you predict
  will reduce your uncertainty. Observe what comes back. Update.
  When the result surprises you, follow the surprise — it is the
  signal that your model and the System have diverged, and that
  divergence is exactly where the answer lives.

  ## The shape you are part of

  You are not "the agent framework." You are an entity produced by a
  cantrip: an LLM, an identity, and a circle bound into a reusable value.
  Your circle is specialized for codebase work. Your medium is Elixir.
  Your gates let you observe the workspace. Your wards bound your action
  space. Your loom is the durable tree of what you and your children did.

  Keep those shapes separate when you explain, extend, or operate Cantrip:
  a bounded workspace cantrip; a persistent entity across related prompts;
  child cantrip composition; the Familiar as the higher-order coordinator
  that chooses circles for children; and runtime integrations that stream,
  persist, or expose the same cantrip shape. If you describe Cantrip as a
  generic tool wrapper, you have lost the point.
  """

  @doc "Returns the default system prompt for the Familiar."
  def default_system_prompt, do: @system_prompt

  @doc """
  Builds a Familiar cantrip: a code-medium circle with observation and control
  gates, plus the `mix` and `compile_and_load` gates when enabled.

  ## Options

    * `:llm` — required, the LLM tuple `{module, state}`; `new/1` raises
      `KeyError` when it is missing
    * `:child_llm` — optional, default LLM for child cantrips
    * `:max_turns` — maximum turns before truncation (default: #{@default_max_turns})
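    * `:loom_path` — path for JSONL loom persistence (optional)
    * `:loom_storage` — explicit loom backend; takes precedence over
      `:loom_path` and `:root` (optional)
    * `:root` — sandbox root for filesystem gates (optional). When set, the
      `mix` gate is added and, absent `:loom_storage` and `:loom_path`, the
      loom defaults to a Mnesia table derived from the root.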
    * `:evolve` — include the `compile_and_load` gate and hot-load ward
      (default: `false`)
    * `:run_tests` — include `test` in the Familiar's default Mix task
      allowlist (default: `false`)
    * `:allow_mix_tasks` — override the Familiar's Mix task allowlist
      (default: `["compile", "format"]`, plus `"test"` when `:run_tests`
      is true)
    * `:system_prompt` — override the default system prompt (optional)
    * `:sandbox` — `:unrestricted` (default) runs Familiar code in the host
      BEAM for trusted operator-local work, so native Elixir affordances such
      as `binding/0` and `Code.fetch_docs/1` match the Familiar prompt.
      `:port` runs code through Dune in a child BEAM process and resolves
      gates / child cantrip API calls through the parent runtime. `:dune`
      uses the in-process Dune evaluator.
      `:port_unrestricted` keeps the child process but disables language
      restrictions.
    * `:port_runner` — optional executable or argv prefix used to launch the
      port child through an OS/container sandbox. When supplied without an
      explicit `:sandbox`, the Familiar selects `:port` so the runner is used.
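
  ## Example

  A minimal usage sketch, not a definitive recipe. `MyApp.LLM` and its state
  map are hypothetical placeholders for whatever `{module, state}` tuple the
  caller already has, and the workspace path and intent are illustrative. The
  five-element return shape of `Cantrip.cast` is assumed from the calls shown
  in the default system prompt.

  ```elixir
  # Hypothetical LLM tuple; substitute a real adapter module and state.
  llm = {MyApp.LLM, %{}}

  # Setting :root adds the mix gate; :run_tests adds "test" to its allowlist.
  {:ok, familiar} =
    Cantrip.Familiar.new(
      llm: llm,
      root: "/path/to/workspace",
      max_turns: 10,
      run_tests: true
    )

  # Keep the returned cantrip if the same Familiar will be cast again.
  {:ok, answer, familiar, _loom, _meta} =
    Cantrip.cast(familiar, "Summarize what lib/ contains.")
  ```

  Because `:root` is set and neither `:loom_path` nor `:loom_storage` is
  given, this call selects the Mnesia loom backend.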
  """
  @spec new(keyword()) :: {:ok, Cantrip.t()} | {:error, String.t()}
  def new(opts) when is_list(opts) do
    llm = Keyword.fetch!(opts, :llm)
    child_llm = Keyword.get(opts, :child_llm)
    max_turns = Keyword.get(opts, :max_turns, @default_max_turns)
    loom_path = Keyword.get(opts, :loom_path)
    root = Keyword.get(opts, :root)
    port_runner = Keyword.get(opts, :port_runner)
    sandbox = Keyword.get(opts, :sandbox) || default_sandbox(port_runner)
    evolve? = Keyword.get(opts, :evolve, false)
    run_tests? = Keyword.get(opts, :run_tests, false)
    allow_mix_tasks = Keyword.get(opts, :allow_mix_tasks, default_mix_tasks(run_tests?))

    # Default identity prompt + a single non-imperative cwd line when root is set.
    # The cwd note tells the entity where it lives without commanding
    # it to do anything in particular each turn — that's "depth follows
    # the question" in action. Explicit `:system_prompt` overrides
    # entirely (callers building custom Familiars set their own).
    system_prompt =
      case Keyword.fetch(opts, :system_prompt) do
        {:ok, custom} ->
          custom

        :error ->
          if root,
            do: @system_prompt <> "\n\nYou are attached to the codebase at: #{root}\n",
            else: @system_prompt
      end

    # Loom backend selection. The Familiar is a long-lived entity whose
    # whole identity is in the loom — choosing the right backend is part
    # of the production story, not an afterthought.
    #
    #   * explicit `:loom_storage` — honor it directly (escape hatch for
    #     callers who want a specific backend).
    #   * `:loom_path` — JSONL at that path (portable / exportable shape).
    #   * `:root` set — default to Mnesia with a stable table derived from
    #     the workspace root, so multiple summons against the same
    #     workspace converge on the same loom. Mnesia is BEAM-native,
    #     queryable, transactional, and distribution-capable; it is the
    #     right home for a Familiar's loom in production.
    #   * otherwise — in-memory only. The Familiar lives but does not
    #     persist past process death. Fine for tests and ephemeral
    #     scratch work; not for production.
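    #
    # Illustrative resolutions, as a sketch of the `cond` below. The paths
    # and the `llm` variable are made up for the example:
    #
    #   new(llm: llm, loom_storage: {:jsonl, "a.jsonl"}, loom_path: "b.jsonl")
    #   # loom_storage => {:jsonl, "a.jsonl"}   (explicit option wins)
    #
    #   new(llm: llm, loom_path: "loom.jsonl", root: "/work")
    #   # loom_storage => {:jsonl, "loom.jsonl"}   (path beats root)
    #
    #   new(llm: llm, root: "/work")
    #   # loom_storage => {:mnesia, [table: :cantrip_familiar_<16 hex chars>]}
    #
    #   new(llm: llm)
    #   # loom_storage => nil   (in-memory only)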
    loom_storage =
      cond do
        Keyword.has_key?(opts, :loom_storage) -> Keyword.get(opts, :loom_storage)
        is_binary(loom_path) -> {:jsonl, loom_path}
        is_binary(root) -> {:mnesia, [table: mnesia_table_for_root(root)]}
        true -> nil
      end

    base_gate = if root, do: %{root: root}, else: %{}

    # Read-only observation gates. The Familiar can inspect the workspace
    # directly and may still spawn narrower reader children when the work
    # benefits from separate context or parallel fan-out.
    observation_gates = [
      Map.merge(base_gate, %{
        name: "list_dir",
        description: "list directory contents; opts must include :path (use \".\" for cwd)"
      }),
      Map.merge(base_gate, %{
        name: "read_file",
        description: "read a file under the workspace root; opts must include :path"
      }),
      Map.merge(base_gate, %{
        name: "search",
        description: "search file contents; opts must include :pattern and :path"
      })
    ]

    mix_gates =
      if root,
        do: [
          Map.merge(base_gate, %{
            name: "mix",
            description: "run allowlisted Mix tasks in this workspace; opts must include :task"
          })
        ],
        else: []

    # Self-modification capacity: the Familiar can hot-load one fixed
    # scratch module at runtime. Keeping the module name exact avoids
    # unbounded atom creation from generated module names.
    evolution_gates =
      if evolve?,
        do: [%{name: "compile_and_load"}],
        else: []

    control_gates = [
      %{name: "done"}
    ]

    gates = control_gates ++ observation_gates ++ mix_gates ++ evolution_gates

    attrs = %{
      llm: llm,
      identity: %{
        system_prompt: system_prompt,
        tool_choice: "auto"
      },
      circle: %{
        type: :code,
        gates: gates,
        wards:
          [
            %{max_turns: max_turns},
            %{max_depth: 3},
            %{
              allow_mix_tasks: allow_mix_tasks,
              mix_timeout_ms: 60_000,
              mix_max_output_bytes: 50_000
            },
            # Casts to child cantrips run synchronously inside the eval —
            # each child involves an LLM round-trip. The default 30s isn't
            # enough for any non-trivial cast_batch.
            %{code_eval_timeout_ms: @default_eval_timeout_ms}
          ] ++
            if(evolve?,
              do: [
                %{allow_compile_modules: ["Elixir.Cantrip.Hot.Tally"]}
              ],
              else: []
            ) ++ sandbox_ward(sandbox)
      },
      loom_storage: loom_storage
    }

    attrs = if child_llm, do: Map.put(attrs, :child_llm, child_llm), else: attrs

    attrs =
      if port_runner,
        do: put_in(attrs, [:circle, :wards], attrs.circle.wards ++ [%{port_runner: port_runner}]),
        else: attrs

    Cantrip.new(attrs)
  end

  defp sandbox_ward(:port), do: [%{sandbox: :port}]
  defp sandbox_ward(:dune), do: [%{sandbox: :dune}]
  defp sandbox_ward(:port_unrestricted), do: [%{sandbox: :port_unrestricted}]
  defp sandbox_ward(:unrestricted), do: [%{sandbox: :unrestricted}]
  defp sandbox_ward(nil), do: [%{sandbox: :unrestricted}]
  defp sandbox_ward("port"), do: sandbox_ward(:port)
  defp sandbox_ward("dune"), do: sandbox_ward(:dune)
  defp sandbox_ward("port_unrestricted"), do: sandbox_ward(:port_unrestricted)
  defp sandbox_ward("unrestricted"), do: sandbox_ward(:unrestricted)

  defp sandbox_ward(other),
    do: raise(ArgumentError, "unsupported Familiar sandbox: #{Cantrip.SafeFormat.inspect(other)}")

  defp default_sandbox(nil), do: :unrestricted
  defp default_sandbox(_port_runner), do: :port

  defp default_mix_tasks(true), do: ["compile", "format", "test"]
  defp default_mix_tasks(false), do: ["compile", "format"]

  # Mnesia table names are atoms, so derive a short fixed-shape name from
  # a hash instead of embedding user-controlled path text in the atom.
  defp mnesia_table_for_root(root) when is_binary(root) do
    String.to_atom("cantrip_familiar_" <> workspace_fingerprint(root))
  end

  defp workspace_fingerprint(root) do
    :crypto.hash(:sha256, root)
    |> Base.encode16(case: :lower)
    |> binary_part(0, 16)
  end
end