guides/reference/full-reference.md

Select File
guides/reference/full-reference.md

# HostKit

Elixir-native host infrastructure declarations, planning, and runtime control.

HostKit is intended to be used from a normal Mix project with `.exs` infrastructure files. The DSL compiles to plain inspectable structs; Mix tasks are wrappers around the runtime API.

## Design

- Core owns systemd/systemdkit persistent units.
- Core owns unitctl transient runtime primitives.
- Integrations such as Caddy, Forgejo, object storage, and monitoring are providers.
- DSL evaluation never applies changes to a host.
- Planning and rendering are available as runtime APIs.

## Example

```elixir
use HostKit.DSL

project :toys do
  roots source: "/opt/toys/src",
        data: "/srv/toys",
        state: "/var/lib/toys",
        config: "/etc/toys"

  prefixes user: "toys-", unit: "toys-"

  host :elixir_toys do
    hostname "elixir.toys"
    user "dannote"
    sudo true
  end

  service :exograph do
    account "toys-exograph", system: true, home: "/var/lib/toys/exograph/home"
    directory "/srv/toys/exograph", owner: "toys-exograph", group: "toys-exograph", mode: 0o755

    daemon "toys-exograph.service" do
      description "Exograph search"
      after_target :network_online
      wants :network_online
      service_user "toys-exograph"
      working_directory "/opt/toys/src/exograph"
      exec_start ["/usr/local/bin/mix", "exograph.index.hex", "--web", "--port", "4200"]
      restart :on_failure
      restart_sec 10
      hardening :web_service
      read_write_paths ["/srv/toys/exograph", "/var/lib/toys/exograph"]
      wanted_by :multi_user
    end
  end
end
```

## Plans and down plans

Rollback is represented as another HostKit plan. A plan change already carries `before` and `after` state, so HostKit can derive a down plan from the exact plan that was applied:

```elixir
{:ok, plan} = HostKit.plan(project, target: prod)
{:ok, down_plan} = HostKit.down(plan)

HostKit.format_plan(down_plan)
HostKit.apply(down_plan, confirm: true)
```

Partial rollback uses the same plan model:

```elixir
{:ok, down_plan} =
  HostKit.down(plan, only: [{:file, "/etc/gatehouse/config.exs"}])
```

Command-like operations need semantic down steps because HostKit cannot infer the opposite of an arbitrary command:

```elixir
command :migrate,
  exec: {"bin/app", ["eval", "App.Release.migrate()"]},
  phase: :before_start,
  down: {"bin/app", ["eval", "App.Release.rollback()"]}

command :warm_cache,
  exec: {"bin/app", ["eval", "App.Cache.warm()"]},
  down: :noop
```

The down command is emitted as an ordinary command change in the down plan. `down: :irreversible` records an explicit warning and omits the command from the down plan.

Created resources use conservative rollback policies. File-like resources can be deleted by a down plan, but directories are kept unless explicitly opted in:

```elixir
directory "/tmp/demo", rollback: :delete_if_created
directory "/srv/app", rollback: :keep
account :app, system: true, rollback: :keep
package :caddy, rollback: :keep
```

CLI usage mirrors this:

```sh
mix host_kit.plan infra/config.exs --host prod --out up.plan.json
mix host_kit.down up.plan.json --out down.plan.json
mix host_kit.apply --plan down.plan.json --confirm
```

## Run tracking

Tracked applies write minimal run records under the project-configured HostKit runs root:

```sh
mix host_kit.apply --track --plan up.plan.json --confirm
mix host_kit.runs --host prod infra/config.exs
mix host_kit.runs --host prod --verbose infra/config.exs
mix host_kit.runs --host prod --latest --verbose infra/config.exs
mix host_kit.down --host prod --run 20260614-101148-demo-up --out down.plan.json infra/config.exs
```

Run records are intentionally compact: they identify the run, project, direction, timestamp, and applied change statuses. They do not replace plan artifacts; use plan artifacts for inspectable up/down plan contents. When a tracked apply is started from `--plan`, HostKit copies that up-plan artifact under the runs root and records the copied path so `mix host_kit.down --last` can work from the tracked run.

Tracked applies also write backup payloads for previous file-like state when that state was captured in the plan. Backup payloads live under `hostkit_backups/<run-id>/` or the `--backups-root` override. `mix host_kit.down --last` and `mix host_kit.down --run RUN_ID` rewrite supported previous file-like state to `%HostKit.BackupRef{}` entries so generated down plans restore from backup payloads instead of embedding prior content. Backup-backed restore currently covers ordinary files plus rendered file resources such as env files, Caddy sites, proxy config, firewall/egress files, and systemd unit files when their previous rendered content was captured. Use `mix host_kit.runs --verbose`, `--latest`, or `--id RUN_ID` to inspect copied plan artifacts and backup payload paths.

Source updates are intentionally not inferred as reversible by default: a previous Git remote/ref may no longer be reachable. Treat source rollback as an explicit lifecycle operation or pair it with a backup/source-bundle strategy.

Run retention is explicit. Use `mix host_kit.runs --prune --keep N` to remove older run records plus their copied plan artifact and backup payload directories.

## Elixir app lifecycle helpers

The Elixir app recipe can emit lifecycle commands for common BEAM deployment operations. Ecto migrations are represented as normal commands with explicit down commands:

```elixir
elixir_app :shop do
  source github: "acme/shop", path: ".", ref: "main"
  phoenix host: "shop.example.com", secret_key_base: secret_env("SECRET_KEY_BASE")

  ecto release: "Shop.Release"
end
```

This emits a `:before_start` migration command that runs through the built release and a matching down command that calls `Shop.Release.rollback()`.

For multiple repos, HostKit emits one ordered command per repo. Down plans reverse that order:

```elixir
elixir_app :shop do
  source github: "acme/shop", path: ".", ref: "main"
  phoenix host: "shop.example.com", secret_key_base: secret_env("SECRET_KEY_BASE")

  ecto release: "Shop.Release" do
    repo "Shop.Repo"
    repo "Shop.AnalyticsRepo"
  end
end
```

The default expressions are:

```elixir
Shop.Release.migrate(Shop.Repo)
Shop.Release.rollback(Shop.Repo)
```

Use `:migrate` and `:rollback` for custom release functions when the defaults do not fit.

## Providers

Providers can contribute DSL modules, resource types, renderers, validators, and read/plan/apply lifecycle operations. Systemd and Unitctl are core primitives, not providers; integrations such as Caddy should be providers.

```elixir
use HostKit.DSL, providers: [HostKit.Providers.Caddy]

project :demo do
  provider :caddy, HostKit.Providers.Caddy do
    set :sites_dir, "/etc/caddy/sites"
  end

  service :web do
    caddy_site :web, "example.com", path: "web.caddy" do
      encode [:zstd, :gzip]
      reverse_proxy "127.0.0.1:4000"
    end
  end
end
```

## Host bootstrap packages and mise-managed runtimes

HostKit can install OS packages through the target package manager. The DSL is distribution-neutral by default and can be pinned to a manager when needed.

```elixir
service :bootstrap do
  package :ca_certificates
  package :build_essential, as: "build-essential", update: true
end
```

HostKit can also bootstrap `mise` and install system-wide tool versions. This is intended for host bootstrap and workspace agents; application services should still prefer packaged release artifacts where possible.

```elixir
service :bootstrap do
  mise path: "/usr/local/bin/mise", system_data_dir: "/usr/local/share/mise" do
    tool :erlang, "29.0.2"
    tool :elixir, "1.20.1"
  end
end
```

This applies through the `mise` CLI contract: it installs the binary with `mise.run` when missing, then runs `mise install --system` with `MISE_SYSTEM_DATA_DIR` set.

Package planning resolves semantic package names through Repology and caches responses in `.host_kit/cache/repology` for 24 hours by default. Use locks for deterministic apply:

```sh
mix host_kit.plan --write-package-lock host_kit.package.lock infra/config.exs
mix host_kit.apply --package-lock host_kit.package.lock --confirm infra/config.exs
```

Plan/apply artifacts make remote changes inspectable before apply. Prefer declaring the remote host in normal `.exs` HostKit config and selecting it with `--host`:

```elixir
use HostKit.DSL

project :infra do
  host :prod do
    hostname "host.example"
    user "root"
    sudo true

    ssh identity_file: Path.expand("~/.ssh/id_ed25519"),
        password: secret_env("HOSTKIT_SSH_PASSWORD"),
        silently_accept_hosts: true
  end
end
```

```sh
mix host_kit.plan --host prod \
  --package-lock host_kit.package.lock \
  --out host_kit.plan.json infra/config.exs

mix host_kit.apply --host prod \
  --plan host_kit.plan.json --confirm infra/config.exs
```

Plan artifacts are JSON and intended to be inspectable. Secret references are stored as references, not values, for example:

```json
{
  "$type": "struct",
  "module": "Elixir.HostKit.Secret",
  "fields": {
    "source": {
      "$type": "tuple",
      "items": [
        {"$type": "atom", "value": "env"},
        "HOSTKIT_SSH_PASSWORD"
      ]
    }
  }
}
```

`secret_env/1` records an environment-backed secret reference and resolves it only at the control-plane boundary that needs the value. Use it for HostKit's own credentials, such as SSH passwords or future provider API tokens. Target application environment files use the env-file DSL, which is backed by the same secret reference type:

```elixir
env_file "/etc/app/app.env" do
  set :mix_env, :prod
  secret :database_url, env: "DATABASE_URL"
end
```

Raw SSH flags remain available as an escape hatch: `--remote`, `--user`, `--port`, `--identity-file`, `--password`, and `--password-env`.

For Linux integration testing, use Incus as the lightweight native container/VM backend:

```sh
HOSTKIT_INCUS_SUDO=true HOSTKIT_SSH_PUBLIC_KEY=$HOME/.ssh/id_ed25519.pub \
  scripts/incus_integration_vm.sh ensure
HOSTKIT_INCUS_SUDO=true scripts/incus_integration_vm.sh ip
```

Set `HOSTKIT_INCUS_TYPE=vm` to launch an Incus VM instead of the default container, and `HOSTKIT_INCUS_INSTANCE=name` to change the instance name. Run the remote CLI integration against Incus with `HOSTKIT_INTEGRATION_TOOL=incus`, or against a pre-existing host declared in `.exs` config with `HOSTKIT_INTEGRATION_TOOL=remote HOSTKIT_INTEGRATION_CONFIG=examples/integration_hosts.example.exs`.

A real remote validation can use the same host config and a shell-provided secret:

```sh
HOSTKIT_SSH_PASSWORD='...' \
HOSTKIT_INTEGRATION_TOOL=remote \
HOSTKIT_INTEGRATION_CONFIG=examples/integration_hosts.example.exs \
mix test test/integration/cli_remote_test.exs --include integration
```

## Project-local DSLs

Use `HostKit.ProjectDSL` in consuming projects to build local conventions without baking them into HostKit.
Load project-local DSL files explicitly through the runtime API or Mix task `--require` option:

```elixir
# infra/toys_infra.exs
defmodule ToysInfra do
  use HostKit.ProjectDSL

  root :source, "/opt/toys/src"
  root :data, "/srv/toys"
  root :state, "/var/lib/toys"
  root :config, "/etc/toys"

  prefix :user, "toys-"
  prefix :unit, "toys-"

  defservice :toy_service do
    let :service_user, do: prefixed(:user, service_name())
    let :unit_name, do: prefixed(:unit, service_name()) <> ".service"

    path :source_dir, root(:source), service_name()
    path :data_dir, root(:data), service_name()
    path :state_dir, root(:state), service_name()
    path :config_dir, root(:config), service_name()

    macro :standard_user do
      account service_user(), system: true, home: state_path("home")
    end
  end
end
```

```elixir
# infra/config.exs
use HostKit.DSL, providers: [HostKit.Providers.Caddy]
use ToysInfra

project :toys do
  toy_service :exograph do
    standard_user()

    systemd_service unit_name() do
      working_directory source_dir()
      read_write_paths [data_dir(), state_dir(), source_dir()]
    end
  end
end
```

## Runtime API

```elixir
{:ok, project} = HostKit.load("infra/config.exs", require: ["toys_infra.exs"])
{:ok, plan} = HostKit.plan(project)
#=> %HostKit.Plan{changes: [%HostKit.Change{action: :create, ...}]}

prod = HostKit.Target.ssh(:prod, host: "elixir.toys", user: "dannote", sudo: true)
{:ok, remote_plan} = HostKit.plan(project, target: prod, reader: HostKit.Remote)

HostKit.format_plan(plan)
{:ok, results} = HostKit.apply(plan, dry_run: true)

# Supported apply resources: accounts, directories, files, systemd services, and systemd timers.
{:ok, results} = HostKit.apply(plan, confirm: true, sudo: true)

# Command and filesystem operations are routed through a runner boundary.
{:ok, results} = HostKit.apply(plan, confirm: true, runner: HostKit.Runner.Local)

prod = HostKit.Target.ssh(:prod, host: "elixir.toys", user: "dannote", sudo: true)

{:ok, results} = HostKit.apply(plan, target: prod, confirm: true)

{:ok, conn} = HostKit.Runner.SSH.Connection.open(host: "elixir.toys", user: "dannote")
try do
  prod = HostKit.Target.ssh(:prod, runner: {HostKit.Runner.SSH.Connection, conn: conn}, sudo: true)
  {:ok, remote_plan} = HostKit.plan(project, target: prod, reader: HostKit.Remote)
after
  HostKit.Runner.SSH.Connection.close(conn)
end

{:ok, unit} = HostKit.Render.render(project, {:systemd_service, "toys-exograph.service"})
```

## Storage volumes

HostKit models storage as named metadata instead of repeated path strings:

```elixir
volume =
  HostKit.Storage.volume(:repositories,
    path: "/srv/toys/forgejo/repositories",
    owner: "toys-forgejo",
    group: "toys-forgejo",
    mode: 0o750,
    backup: true
  )

directory HostKit.Storage.directory(volume)
read_write_paths HostKit.Storage.read_write_paths([volume])
```

Service conventions can derive these paths without project-specific macros and later reuse the same volume metadata for systemd sandboxing, Unitctl transient runtimes, and backups.

```elixir
project :toys do
  roots data: "/srv/toys", config: "/etc/toys"
  prefixes user: "toys-", unit: "toys-"

  service :forgejo do
    storage :repositories, under: :data, path: "repositories", mode: 0o750, backup: true
    storage :config, under: :config, owner: "root", group: service_user(), writable: false, secret: true

    daemon unit_name() do
      run user: service_user(), read_write_paths: writable_storage_paths()
    end
  end
end
```

## HostKit agent

HostKit can run as a supervised OTP application. The supervision tree currently starts agent state and a monitor worker:

```elixir
HostKit.Agent.status()
HostKit.Agent.configure(project: project, target: HostKit.Target.local(:prod))
HostKit.Agent.run_plan()
HostKit.Agent.run_monitor()
```

HostKit can also declare its own outer systemd supervisor unit:

```elixir
HostKit.Agent.Systemd.service(
  exec_start: ["/opt/host_kit/bin/host_kit", "agent", "--config", "/etc/host_kit/config.exs"]
)
```

State snapshots can be written for audit/drift history:

```elixir
HostKit.State.write(plan, "/var/lib/host_kit/state/latest-plan.json")
HostKit.State.read("/var/lib/host_kit/state/latest-plan.json")
```

This gives a clean two-layer supervision model: OTP inside the BEAM and systemd outside it.

## Firewall policy

HostKit can declare project- or host-scoped firewall policy:

```elixir
firewall do
  allow tcp: 22, from: :any
  allow tcp: [80, 443], from: :any
  allow tcp: 9100, from: {10, 44, 0, 0, 24}
  deny :all
end
```

Host-scoped policy lives inside `host`:

```elixir
host :prod, hostname: "elixir.toys" do
  firewall do
    allow tcp: 22, from: :any
    deny :all
  end
end
```

Extract, render, plan, and apply policies with:

```elixir
HostKit.Firewall.policies(project)
HostKit.Firewall.Nftables.render(policy)
HostKit.plan(project, reader: HostKit.Local)
HostKit.apply(plan, confirm: true, nft_reload: true)
```

Firewall policy is written to `/etc/nftables.d/hostkit.nft` by default and validated with `nft -c -f` before optional reload.

## Workspace inside monitoring

Workspace services can declare checks that are intended to run inside the sandbox later via a workspace agent:

```elixir
workspace :blog, owner: :alice do
  service :preview do
    inside do
      monitor :mix, task: "test", every: "5m"
      monitor :port, port: 4000
      monitor :git, clean: true
    end
  end
end
```

Extract them with:

```elixir
HostKit.Workspace.inside_monitors(project)
```

## Workspace execution and tenants

Tenants can own workspaces:

```elixir
tenant :alice, quota: [memory: "4G"] do
  agent port: 4173
end
```

Workspace command specs can be built for transient execution:

```elixir
HostKit.Workspace.exec_spec(project, :alice, :blog, ["mix", "test"])
HostKit.Workspace.exec(project, :alice, :blog, ["mix", "test"])
```

Inside monitors currently return `:pending_workspace_agent`, reserving execution for the sandbox agent boundary.

## OpenTelemetry Collector config

Telemetry declarations can be converted to an OpenTelemetry Collector config map:

```elixir
HostKit.OtelCollector.config(project, endpoint: "otel.example:4317")
```

## Workspace sandbox profiles

Systemd-backed sandbox profiles can be applied inside daemons:

```elixir
workspace :blog, owner: :alice do
  service :preview do
    daemon unit_name() do
      run exec_start: ["mix", "phx.server"]
      sandbox :vibe_dev
    end
  end
end
```

Profiles include `:vibe_dev`, `:strict_app`, and `:untrusted`, and can be overridden:

```elixir
sandbox :untrusted,
  resources: [memory_max: "256M"],
  sandbox: [private_network: false]
```

## Workspace preview helper

Workspace services can expose a preview route with a named listener and Caddy site:

```elixir
workspace :blog, owner: :alice do
  service :preview do
    daemon unit_name() do
      run exec_start: ["mix", "phx.server"]
    end

    preview :http, port: 4000, domain: "alice-blog.dev.example.com"
  end
end
```

This expands to `listen :http`, a Caddy reverse proxy to that listener, an HTTP monitor, telemetry metadata, and Caddy access-log metadata.

## Workspace agent helper

Workspaces can declare the default sandbox agent service as ordinary HostKit resources:

```elixir
workspace :blog, owner: :alice do
  agent port: 4173
end
```

This expands to a service with an account, workspace directory, systemd daemon, loopback listener, logs, telemetry, systemd monitor, and loopback-only network policy.

## Workspace scope

`workspace` scopes ordinary HostKit DSL for user sandboxes while keeping resources inspectable:

```elixir
workspace :blog, owner: :alice do
  service :preview do
    directory root_path(:data), mode: :private_dir

    daemon unit_name() do
      run exec_start: ["mix", "phx.server"]
      listen :http, port: 4000, on: :loopback
    end
  end
end
```

Inside a workspace, services get workspace metadata plus separate path and identity names:

```elixir
root_path(:data) # .../alice/blog/preview
unit_name()      # prefix-alice-blog-preview.service
```

## Named listeners

Services can declare named listeners and reuse them from provider declarations:

```elixir
daemon unit_name() do
  run exec_start: ["/usr/bin/env", "true"]
  listen :http, port: 3000, on: :loopback
end

caddy_site :web, "web.example.com" do
  reverse_proxy listener(:http)
end
```

Named listeners are stored as service metadata and render Caddy upstreams like `127.0.0.1:3000` at the provider boundary.

## Network addresses and policy

Network addresses can use Elixir tuple forms and semantic aliases:

```elixir
listen 3000, on: :loopback
listen 4000, on: {127, 0, 0, 1}
network_policy deny: :all, allow: [:loopback, {10, 44, 0, 0, 24}]
```

Systemd services compile network policy to:

```ini
IPAddressDeny=any
IPAddressAllow=localhost 10.44.0.0/24
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
```

## Log management intent

Log management can be declared globally, per service, or on individual resources:

```elixir
observability do
  logs driver: :journald,
       retention: "14d",
       ship: true,
       attributes: [deployment_environment: :prod]
end
```

Systemd service log declarations also add unit directives:

```elixir
daemon unit_name() do
  run exec_start: ["/usr/bin/env", "true"]
  logs identifier: service_name(), stdout: :journal, stderr: :journal
end
```

Extract log intent with:

```elixir
HostKit.Logs.configs(project)
```

Read recent journald logs through local or remote targets:

```elixir
HostKit.Logs.read("toys-forgejo.service", target: prod, since: "1h")
HostKit.Logs.tail("toys-forgejo.service", target: prod, lines: 100)
```

## OpenTelemetry collection intent

Observability defaults can be enabled once at project or service scope and inherited by resources:

```elixir
observability do
  telemetry logs: true,
            metrics: true,
            traces: false,
            attributes: [deployment_environment: :prod]
end
```

Resource-level overrides are still available:

```elixir
daemon unit_name() do
  run exec_start: ["/usr/bin/env", "true"]
  telemetry logs: :journald, metrics: false, service_name: service_name()
end
```

Extract collection intent with:

```elixir
HostKit.Telemetry.signals(project)
```

Systemd services and Caddy sites get default collection intent even without global defaults:

```elixir
# systemd: logs: :journald, metrics: :systemd
# caddy: logs: :access, metrics: :http
```

## Monitoring metadata

Declarations can carry monitoring intent for a later monitoring service or config generator:

```elixir
daemon unit_name() do
  run exec_start: ["/usr/bin/env", "true"]
  monitor :systemd, expect: [state: :active], severity: :critical
end

caddy_site :web, "web.example.com" do
  reverse_proxy "127.0.0.1:4000"
  monitor :http, url: "https://web.example.com", expect: [status: 200]
end
```

Extract or run checks with:

```elixir
HostKit.Monitor.checks(project)
HostKit.Monitor.run(project, target: prod)
```

Initial execution supports systemd state, HTTP status, and filesystem existence checks.

## File modes

Mode values can be raw octal, semantic aliases, tuples, keywords, or capability lists:

```elixir
mode: :secret_group_file
mode: {:rw, :r, nil}
mode: [owner: :rw, group: :r]
mode: [:setgid, :owner_rwx, :group_rwx, :other_rx]
```

Resources store normalized integer modes, so plan/apply remains simple.

## Env files and secrets

HostKit has a Dotenvy-validated env file resource. Secret values are resolved at apply time. Drift detection compares metadata and non-secret `set` entries; secret entry values are not read into plan artifacts for comparison.

```elixir
env_file root_path(:config, "env"), owner: "root", group: service_user(), mode: 0o640 do
  set :MIX_ENV, :prod
  set :PORT, 4000
  secret :SECRET_KEY_BASE, env: "SECRET_KEY_BASE"
end
```

## Runtime isolation

HostKit uses shared runtime isolation structs for persistent systemd units and future transient Unitctl workloads:

```elixir
sandbox = HostKit.Runtime.Sandbox.new(:strict_web)
resources = HostKit.Runtime.Resources.new(memory_max: "512M", cpu_quota: "50%")

service sandbox |> HostKit.Runtime.Sandbox.to_systemd_service_options()
service resources |> HostKit.Runtime.Resources.to_systemd_service_options()
```

Built-in profiles include `:web_service`, `:strict_web`, `:small`, `:medium`, and `:large`.

## Runtime controls

HostKit exposes Unitctl as its core transient runtime layer:

```elixir
{:ok, spec} =
  HostKit.Runtime.Spec.new(
    name: "demo-check",
    command: ["/usr/bin/env", "true"],
    sandbox: %{no_new_privileges: true, private_tmp: true}
  )

{:ok, instance} = HostKit.Runtime.start(spec)
{:ok, state} = HostKit.Runtime.status(instance)
:ok = HostKit.Runtime.stop(instance)
```

## Mix tasks

```sh
mix host_kit.dump --require toys_infra.exs infra/config.exs
mix host_kit.plan --require toys_infra.exs infra/config.exs
mix host_kit.plan --require toys_infra.exs infra/config.exs --local
mix host_kit.plan --require toys_infra.exs infra/config.exs --local --ignore systemd_service:toys-exograph.service
mix host_kit.plan --require toys_infra.exs infra/config.exs --remote elixir.toys --user dannote --sudo
mix host_kit.apply --require toys_infra.exs infra/config.exs --local --dry-run
mix host_kit.render --require toys_infra.exs infra/config.exs systemd_service toys-exograph.service
```