# ADR-0017: Skill caching and hot-reload
## Status
Proposed
## Context
Current skill loading behavior:
1. **Cold loading** - Skills are loaded from disk on every `Conjure.load/1` call
2. **No caching** - Full body and resources re-read each time
3. **No change detection** - Registry doesn't know when skills change
4. **Manual reload** - Users must call `Conjure.Registry.reload/1` explicitly
For production deployments with many skills or frequent access patterns, this creates:
- **Latency** - Disk I/O on every skill access
- **Inconsistency** - Skills may change during a conversation
- **Operational burden** - Must restart or manually reload after updates
The specification mentions caching as a potential enhancement but doesn't specify the approach.
## Decision
We will implement optional skill caching and hot-reload as separate, composable features:
### 1. Skill Caching
Add caching layer for loaded skills and resources:
```elixir
defmodule Conjure.Cache do
@moduledoc """
ETS-based cache for loaded skills and resources.
"""
use GenServer
@table :conjure_cache
@default_ttl :timer.minutes(5)
def start_link(opts \\ []) do
GenServer.start_link(__MODULE__, opts, name: __MODULE__)
end
@doc """
Get a cached skill or load and cache it.
"""
@spec get_or_load(Path.t(), keyword()) :: {:ok, Skill.t()} | {:error, term()}
def get_or_load(path, opts \\ []) do
ttl = Keyword.get(opts, :ttl, @default_ttl)
case lookup(path) do
{:ok, skill, inserted_at} when not expired?(inserted_at, ttl) ->
{:ok, skill}
_ ->
with {:ok, skill} <- Conjure.Loader.load_skill(path) do
insert(path, skill)
{:ok, skill}
end
end
end
@doc """
Get a cached resource or load and cache it.
"""
@spec get_or_load_resource(Skill.t(), Path.t()) :: {:ok, binary()} | {:error, term()}
def get_or_load_resource(skill, resource_path)
@doc """
Invalidate cached skill(s).
"""
@spec invalidate(Path.t() | :all) :: :ok
def invalidate(path_or_all)
@doc """
Get cache statistics.
"""
@spec stats() :: map()
def stats do
%{
size: :ets.info(@table, :size),
memory: :ets.info(@table, :memory),
hits: get_counter(:hits),
misses: get_counter(:misses)
}
end
# GenServer implementation...
end
```
**Cache Configuration:**
```elixir
config :conjure,
cache: [
enabled: true,
ttl: :timer.minutes(5),
max_size: 100, # Maximum cached skills
max_memory: :timer.megabytes(50) # Memory limit
]
```
### 2. Hot-Reload via File Watching
Optional file system watcher for automatic reload:
```elixir
defmodule Conjure.Watcher do
@moduledoc """
File system watcher for skill hot-reload.
Watches configured skill paths and triggers reload when
SKILL.md or resource files change.
"""
use GenServer
require Logger
def start_link(opts) do
GenServer.start_link(__MODULE__, opts, name: __MODULE__)
end
@impl true
def init(opts) do
paths = Keyword.get(opts, :paths, Conjure.Config.skill_paths())
{:ok, watcher_pid} = FileSystem.start_link(dirs: paths)
FileSystem.subscribe(watcher_pid)
{:ok, %{watcher: watcher_pid, paths: paths, debounce: %{}}}
end
@impl true
def handle_info({:file_event, watcher, {path, events}}, state) do
if skill_file?(path) and relevant_event?(events) do
# Debounce rapid changes
skill_path = find_skill_root(path)
schedule_reload(skill_path, state)
else
{:noreply, state}
end
end
defp skill_file?(path) do
Path.basename(path) == "SKILL.md" or
String.contains?(path, "/scripts/") or
String.contains?(path, "/references/")
end
defp schedule_reload(skill_path, state) do
# Cancel existing timer if any
if timer = Map.get(state.debounce, skill_path) do
Process.cancel_timer(timer)
end
# Schedule reload after debounce period
timer = Process.send_after(self(), {:reload, skill_path}, 500)
{:noreply, put_in(state.debounce[skill_path], timer)}
end
@impl true
def handle_info({:reload, skill_path}, state) do
Logger.info("Hot-reloading skill: #{skill_path}")
# Invalidate cache
Conjure.Cache.invalidate(skill_path)
# Reload in registry if registered
if Conjure.Registry.registered?(skill_path) do
Conjure.Registry.reload_skill(skill_path)
end
# Emit telemetry
:telemetry.execute(
[:conjure, :skill, :reloaded],
%{},
%{path: skill_path}
)
{:noreply, Map.delete(state.debounce, skill_path)}
end
end
```
**Watcher Configuration:**
```elixir
config :conjure,
hot_reload: [
enabled: true, # false in production by default
debounce_ms: 500,
paths: [] # Additional paths beyond skill_paths
]
```
### 3. Registry Integration
Update Registry to work with cache and watcher:
```elixir
defmodule Conjure.Registry do
# Existing code...
@doc """
Check if a skill path is registered.
"""
@spec registered?(Path.t()) :: boolean()
def registered?(path)
@doc """
Reload a specific skill by path.
"""
@spec reload_skill(GenServer.server(), Path.t()) :: :ok | {:error, term()}
def reload_skill(server \\ __MODULE__, path) do
# Invalidate cache
Conjure.Cache.invalidate(path)
# Reload from disk
case Conjure.Loader.load_skill(path) do
{:ok, skill} ->
GenServer.call(server, {:update_skill, skill})
{:error, reason} ->
Logger.warning("Failed to reload skill #{path}: #{inspect(reason)}")
{:error, reason}
end
end
@doc """
Subscribe to skill change notifications.
"""
@spec subscribe() :: :ok
def subscribe do
Registry.register(Conjure.PubSub, :skill_changes, [])
end
end
```
### 4. Application Supervision Tree
```elixir
defmodule Conjure.Application do
use Application
def start(_type, _args) do
children = [
# Always start cache if enabled
cache_child_spec(),
# Start watcher in dev/configured environments
watcher_child_spec(),
# Registry (existing)
registry_child_spec(),
# PubSub for notifications
{Registry, keys: :duplicate, name: Conjure.PubSub}
]
|> Enum.reject(&is_nil/1)
Supervisor.start_link(children, strategy: :one_for_one)
end
defp cache_child_spec do
if Conjure.Config.get([:cache, :enabled], false) do
{Conjure.Cache, Conjure.Config.get(:cache, [])}
end
end
defp watcher_child_spec do
if Conjure.Config.get([:hot_reload, :enabled], false) do
{Conjure.Watcher, Conjure.Config.get(:hot_reload, [])}
end
end
end
```
### 5. Usage Examples
```elixir
# Production: caching enabled, hot-reload disabled
config :conjure,
cache: [enabled: true, ttl: :timer.hours(1)],
hot_reload: [enabled: false]
# Development: both enabled
config :conjure,
cache: [enabled: true, ttl: :timer.seconds(30)],
hot_reload: [enabled: true]
# Subscribe to changes in application code
Conjure.Registry.subscribe()
receive do
{:skill_changed, skill_name} ->
Logger.info("Skill #{skill_name} was updated")
end
```
## Consequences
### Positive
- **Improved performance** - Reduced disk I/O for frequently accessed skills
- **Developer experience** - Automatic reload during development
- **Observable** - Telemetry events for cache hits/misses and reloads
- **Configurable** - Fine-grained control over caching behavior
- **Composable** - Cache and watcher are independent features
### Negative
- **Memory usage** - Cached skills consume memory
- **Complexity** - More moving parts in the system
- **Dependencies** - FileSystem library for watching (optional)
- **Staleness risk** - Cached data may be stale if TTL too long
### Neutral
- **Optional features** - Both can be disabled entirely
- **ETS-based** - Uses proven Erlang technology
- **Debouncing** - Prevents reload storms during rapid edits
## Alternatives Considered
### Mnesia Instead of ETS
Use Mnesia for distributed caching. Rejected because:
- Over-engineering for single-node use case
- ETS is simpler and sufficient
- Can add distributed cache later if needed
### Polling Instead of Watching
Periodically check for file changes. Rejected because:
- Less responsive than inotify-based watching
- Wastes CPU cycles
- FileSystem library is well-maintained
### Always-On Caching
Make caching mandatory. Rejected because:
- Complicates testing
- Some deployments may prefer fresh reads
- Optional is more flexible
### In-Process Caching Only
Cache in Registry GenServer state. Rejected because:
- Would complicate Registry
- ETS provides better concurrent read access
- Separation of concerns
## References
- [ETS documentation](https://www.erlang.org/doc/man/ets.html)
- [FileSystem library](https://hexdocs.pm/file_system/)
- [ADR-0008: GenServer Registry](0008-genserver-registry.md)
- [Caching strategies in Elixir](https://elixirschool.com/en/lessons/storage/cachex)