README.md

# LlmCore

> Provider-agnostic LLM orchestration for Elixir. Route to any model, run agentic loops, extract structured output, and connect to [Hindsight](https://github.com/vectorize-io/hindsight) semantic memory — all through composable ALF pipelines with hot-reload TOML configuration.

LlmCore is the shared LLM substrate that powers the [Fosferon](https://github.com/fosferon) ecosystem. It handles the messy parts of working with LLMs — provider routing, CLI wrapping, structured extraction, tool-calling loops, and [Hindsight](https://github.com/vectorize-io/hindsight) semantic memory integration — so your application code stays clean.

## Why LlmCore?

- **One API, every provider.** Cloud APIs (Anthropic, OpenAI, Z.ai), local inference (Ollama, DGX Spark), and CLI tools (Claude Code, Gemini CLI, Codex, Droid, Kimi) all share the same `Provider` behaviour. Route by task type, fall back gracefully, add new providers without writing Elixir.

- **Config-driven CLI providers.** Adding a new CLI tool is a TOML block — no Elixir code needed. Declare the binary, flags, prompt transport, system prompt strategy, and output normalization. LlmCore handles the rest.

- **In-process agentic loops.** `LlmCore.Agent.Loop` runs tool-calling iterations inside the BEAM VM — no subprocess, no CLI overhead. Built-in circuit breaking detects stuck loops. Uses any API provider that supports tool use.

- **Hot-reload TOML configuration.** Change providers, routing rules, and memory settings without restarting. File watcher with debouncing keeps the runtime store (`ETS`) in sync with disk.

- **Structured output without the weight.** JSON-mode extraction and schema validation built in. No Instructor dependency. Custom validators via functions.

- **[Hindsight](https://github.com/vectorize-io/hindsight) semantic memory client.** Resilient integration with caching, circuit breaker, retry with backoff, and write buffering. Store once, recall by meaning.

- **Observable by default.** Every operation emits `:telemetry` events. Pipeline spans, provider dispatch, router decisions, memory operations — all instrumented.

## Installation

Add `llm_core` to your dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:llm_core, "~> 0.3"}
  ]
end
```

Then fetch dependencies:

```bash
mix deps.get
```

## Quick Start

### Send a prompt through the router

```elixir
# Routes automatically based on [routing.tasks] config
{:ok, response} = LlmCore.send("Explain pattern matching in Elixir", :reasoning)
IO.puts(response.content)
```

### Stream a response

```elixir
{:ok, stream} = LlmCore.stream("Write a GenServer example", :coding)
Enum.each(stream, fn chunk -> IO.write(chunk) end)
```

### Extract structured output

```elixir
schema = %{
  type: "object",
  properties: %{
    name: %{type: "string"},
    confidence: %{type: "number"}
  },
  required: ["name"]
}

{:ok, response} = LlmCore.send("Analyze this code", :reasoning,
  response_format: {:json_schema, schema}
)

response.structured
#=> %{"name" => "authenticate/2", "confidence" => 0.92}
```

### Run an agentic tool-calling loop

```elixir
alias LlmCore.Agent.Loop

tools = MyApp.Tools.available()
resolve = &MyApp.Tools.resolve/1

llm_send = fn messages, opts ->
  LlmCore.LLM.Provider.dispatch(LlmCore.LLM.Anthropic, messages, opts)
end

{:ok, response, messages} =
  Loop.run(
    [%{role: :user, content: "Research Elixir ALF"}],
    llm_send,
    tools: tools,
    resolve_tool: resolve,
    max_iterations: 10
  )
```
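`MyApp.Tools` above is your own module. As a minimal sketch of what it might look like — the tool-schema keys (`name`, `description`, `input_schema`) and the resolver's return shape are assumptions here; check the [Agent Loop](docs/agent-loop.md) guide for the actual contract:

```elixir
defmodule MyApp.Tools do
  # Tool metadata in an Anthropic/OpenAI-style shape.
  def available do
    [
      %{
        name: "add",
        description: "Add two integers",
        input_schema: %{
          type: "object",
          properties: %{a: %{type: "integer"}, b: %{type: "integer"}},
          required: ["a", "b"]
        }
      }
    ]
  end

  # Map a tool name to the function that executes it.
  def resolve("add"), do: {:ok, fn %{"a" => a, "b" => b} -> {:ok, a + b} end}
  def resolve(_name), do: {:error, :unknown_tool}
end
```

The loop calls `resolve_tool` with each tool name the model requests, so returning `{:error, :unknown_tool}` for unrecognized names gives the model a chance to recover.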

### Semantic memory (via [Hindsight](https://github.com/vectorize-io/hindsight))

LlmCore ships a resilient client for [Hindsight](https://github.com/vectorize-io/hindsight), a standalone semantic memory server. The client handles caching, circuit breaking, retry with backoff, and write buffering so your application code doesn't have to.

```elixir
# Store a fact (async, buffered)
:ok = LlmCore.retain("Schema-per-tenant isolation pattern", %{context: "architecture"})

# Recall by meaning
{:ok, results} = LlmCore.recall("how does multi-tenancy work?", bank_id: "my-bank")

# Synthesize an insight
{:ok, insight} = LlmCore.reflect("What patterns are most effective?", bank_id: "my-bank")
```

### Query available providers

```elixir
# All configured providers
providers = LlmCore.Provider.Registry.all()

# Only available ones (API keys present, binaries in PATH)
available = LlmCore.Provider.Registry.available()

# Find by alias
{:ok, provider} = LlmCore.Provider.Registry.lookup_alias("claude")

# Fuzzy suggestions (Jaro distance)
LlmCore.Provider.Registry.suggest_alias("claud")
#=> ["claude"]

# Capable providers for requirements
LlmCore.Provider.Registry.suggest_capable(%{streaming: true, tool_use: true})
```

### CLI provider discovery

```elixir
# List all CLI providers (built-in + configured)
entries = LlmCore.CLIProvider.Registry.list()

# Only those with binary in PATH
available = LlmCore.CLIProvider.Registry.available()

# Resolve by id or alias
{:ok, provider} = LlmCore.CLIProvider.Registry.resolve(:droid)

# Check capabilities
{:ok, caps} = LlmCore.CLIProvider.Registry.capabilities(:codex_cli)
```

## Configuration

LlmCore uses layered TOML configuration. Later sources override earlier ones:

```
1. Compiled defaults    (priv/config/llm_core.toml)
2. Global override      (~/.llm_core/config/llm_core.toml)
3. Project override     (<project>/.llm_core/llm_core.toml)
4. Environment variable (LLM_CORE_CONFIG=path)
5. Custom path          (explicit :path option)
6. Runtime overrides    (ETS, via mix tasks or API)
```
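For example, layer 4 lets you point a single shell session at an alternate file without touching the global or project overrides (the path below is illustrative):

```bash
# Layer 4: point LlmCore at an explicit config file for this session.
# Any subsequent mix task or application boot picks it up.
export LLM_CORE_CONFIG="$HOME/.llm_core/config/staging.toml"
```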

### Minimal configuration

```toml
[routing]
default = "claude"

[providers.anthropic]
module = "LlmCore.LLM.Anthropic"
aliases = ["claude"]

[providers.anthropic.auth]
api_key_env = "ANTHROPIC_API_KEY"
```

### Task-based routing

```toml
[routing]
default = "claude"

[routing.tasks.coding]
alias = "openai"
mode = "passthrough"
capabilities = { structured_output = true, tool_use = true }

[routing.tasks.planning]
alias = "claude"
mode = "abstracted"
capabilities = { reasoning = true }
```

### Add a CLI provider (no code needed)

```toml
[providers.my_tool]
type = "cli"
enabled = true
aliases = ["my-tool", "mt"]

[providers.my_tool.cli]
binary = "my-tool"
default_model = "v2"
default_timeout = 60000
prompt_position = "last"
install_hint = "pip install my-tool"
auto_approve_args = ["--yes"]

[providers.my_tool.cli.flags]
model = "--model"
temperature = "--temp"

[providers.my_tool.cli.preflight]
help_args = ["--help"]
expect_in_help = ["--model"]
```

### Mix task helpers

```bash
# Inspect configuration
mix llm_core.config.show
mix llm_core.config.show --section providers --json

# Edit configuration
mix llm_core.config.set --path routing.default.alias --value claude
mix llm_core.config.set --path telemetry.sample_rate --value 0.25 --type float

# Validate configuration
mix llm_core.config.validate
```

See the [Configuration Guide](docs/configuration.md) for the full TOML schema, environment variable interpolation, and agent registration rules.

## Architecture

LlmCore is built on [ALF](https://github.com/antonmi/alf), a flow-based application layer framework, for composable, observable data pipelines:

```
┌─────────────────────────────────────────────────────────────┐
│                       LlmCore                                │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐ │
│  │  Inference   │  │   Routing    │  │   Hindsight        │ │
│  │  Pipeline    │  │   Pipeline   │  │   Memory Client    │ │
│  └──────────────┘  └──────────────┘  └────────────────────┘ │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐ │
│  │  Agent Loop  │  │   Config     │  │   Telemetry        │ │
│  │  (Tool Use)  │  │   (Hot TOML) │  │   (Observable)     │ │
│  └──────────────┘  └──────────────┘  └────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```

Three ALF pipelines handle the core flows:

- **Inference Pipeline** — normalize request → resolve route → check capabilities → dispatch provider → apply structured output → emit telemetry
- **Routing Pipeline** — parse task type → load routing config → match rules → resolve agent or apply fallback
- **Memory Pipeline** — route operation (retain/recall/reflect) → circuit breaker gate → retry with backoff → update cache

See the [Architecture Guide](docs/architecture.md) for pipeline internals, provider behaviour contracts, and the agent loop design.

## Telemetry Events

```elixir
# Provider dispatch
[:llm_core, :provider, :send, :start | :stop | :exception]
[:llm_core, :provider, :stream, :start | :chunk | :stop]

# Router decisions
[:llm_core, :router, :resolve, :start | :stop]
[:llm_core, :router, :fallback]

# Agent loop
[:llm_core, :agent, :complete]

# Memory operations
[:llm_core, :hindsight, :retain | :recall | :reflect]
[:llm_core, :hindsight, :circuit_breaker, :state_change]

# Configuration
[:llm_core, :config, :reload]
```
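Consume these events with the standard `:telemetry` package. A minimal handler sketch — the handler id is arbitrary, and the exact measurement/metadata keys are not specified here, so inspect them rather than pattern-matching on assumed fields:

```elixir
# Log every completed provider dispatch.
:telemetry.attach(
  "llm-core-readme-logger",
  [:llm_core, :provider, :send, :stop],
  fn _event, measurements, metadata, _config ->
    IO.inspect({measurements, metadata}, label: "provider.send.stop")
  end,
  nil
)
```

Use `:telemetry.attach_many/4` to subscribe one handler to several of the event names listed above.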

## Built-in Providers

| Provider | Type | Module | Key Capabilities |
|----------|------|--------|-----------------|
| Anthropic | API | `LlmCore.LLM.Anthropic` | Streaming, tool use, vision, structured output |
| OpenAI | API | `LlmCore.LLM.OpenAI` | Streaming, tool use, vision, structured output |
| Ollama | Local | `LlmCore.LLM.Ollama` | Streaming, JSON mode, local models |
| Appliance | Local | `LlmCore.LLM.Appliance` | OpenAI-compatible local endpoints |
| Native | API | `LlmCore.LLM.Native` | In-process agentic loop with cascade fallback |
| Claude Code | CLI | Config-driven | `--print`, system prompt file, auto-approve |
| Droid | CLI | Config-driven | `exec` subcommand, `--auto`, `--cwd` |
| Pi CLI | CLI | Config-driven | `--print`, `--provider`, `--thinking` |
| Kimi CLI | CLI | Config-driven | Agent-file YAML transform, final-message capture |
| Codex CLI | CLI | Config-driven | `--full-auto`, file capture, sandbox bypass |
| Gemini CLI | CLI | Config-driven | Model selection |

## Documentation

- [Configuration Guide](docs/configuration.md) — Full TOML schema, layered config, mix tasks
- [Architecture Guide](docs/architecture.md) — Pipeline design, provider system, memory integration
- [CLI Providers](docs/cli-providers.md) — Adding and configuring CLI-based providers
- [Agent Loop](docs/agent-loop.md) — Tool-calling loops, context, pipeline stages

## License

MIT — see the [LICENSE](https://github.com/fosferon/llm_core/blob/main/LICENSE) file.