# Arcanum

Provider-agnostic AI inference library for Elixir.

## Overview

Arcanum provides a unified interface for chat completion, streaming, embeddings, tool use, and media generation across multiple AI providers. Model capabilities are declared upfront via profiles — no runtime detection or error-code fallbacks.

## Supported Providers

| Provider | API Format | Features |
|----------|-----------|----------|
| OpenAI | OpenAI | Chat, stream, tools, vision, image generation |
| Anthropic | Anthropic | Chat, stream, tools, vision |
| DeepSeek | OpenAI | Chat, stream, tools |
| GitHub Copilot | OpenAI | Chat, stream, tools, vision (OAuth device flow) |
| OpenRouter | OpenAI | Chat, stream, tools |
| ZAI / Zhipu | OpenAI | Chat, stream, tools |
| LM Studio | OpenAI | Chat, stream, tools (auto model loading) |

## Installation

```elixir
def deps do
  [
    {:arcanum, "~> 0.1.1"}
  ]
end
```

## Usage

All inference goes through `Arcanum.Gateway`. Callers never touch adapters directly.

### Provider Map

Every Gateway function takes a provider map describing the endpoint:

```elixir
provider = %{
  base_url: "https://api.openai.com",
  api_key: "sk-...",
  kind: "openai",
  api_format: :openai,
  type: :cloud
}
```

| Key | Type | Description |
|-----|------|-------------|
| `base_url` | `String.t()` | Required. Provider API base URL. |
| `api_key` | `String.t() \| nil` | API key. Not needed for local providers or Copilot. |
| `api_format` | `:openai \| :anthropic \| :custom` | Determines which adapter handles the request. |
| `kind` | `String.t()` | Provider ID (e.g. `"openai"`, `"anthropic"`, `"ollama"`, `"github-copilot"`). Used for profile resolution and provider-specific behavior. |
| `type` | `:cloud \| :local` | Used by `Arcanum.Probe` to skip TCP checks for cloud providers. |
| `extra_headers` | `[{String.t(), String.t()}] \| nil` | Additional HTTP headers (injected automatically for Copilot). |

### Chat Completion

```elixir
alias Arcanum.{Gateway, Intent}

intent = %Intent{
  model: "gpt-4o",
  messages: [
    %{role: :system, content: Intent.text("You are a helpful assistant.")},
    %{role: :user, content: Intent.text("What is Elixir?")}
  ],
  temperature: 0.7,
  max_tokens: 1024
}

{:ok, %Arcanum.Response{content: content}} = Gateway.chat(provider, intent)
```

### Streaming

```elixir
{:ok, stream} = Gateway.stream(provider, intent)

Enum.each(stream, fn
  {:data, %Arcanum.Response{content: chunk}} -> IO.write(chunk || "")
  :done -> IO.puts("\n--- done ---")
  {:error, reason} -> IO.puts("Error: #{inspect(reason)}")
end)
```
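
When the full reply is needed as a single string, the events can be folded into an accumulator instead of printed. The following is a minimal sketch: `StreamCollector` is a hypothetical helper, not part of Arcanum, and it assumes only the three event shapes shown above. It matches on plain maps, so `%Arcanum.Response{}` structs are accepted as well.

```elixir
defmodule StreamCollector do
  @moduledoc "Hypothetical helper: folds stream events into one binary."

  # Accepts any enumerable of {:data, %{content: chunk}}, :done, or {:error, reason}.
  def collect(stream) do
    result =
      Enum.reduce_while(stream, {:ok, []}, fn
        {:data, %{content: chunk}}, {:ok, acc} ->
          # A nil chunk contributes an empty string.
          {:cont, {:ok, [(chunk || "") | acc]}}

        :done, {:ok, acc} ->
          {:halt, {:ok, acc}}

        {:error, reason}, _acc ->
          # Stop at the first error and discard the partial text.
          {:halt, {:error, reason}}
      end)

    case result do
      {:ok, chunks} -> {:ok, chunks |> Enum.reverse() |> IO.iodata_to_binary()}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

With the stream from `Gateway.stream/3`, usage would look like `{:ok, text} = StreamCollector.collect(stream)`.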

### Tool Use

Pass tools in the intent. Arcanum handles native, XML-text, and JSON-text tool call formats transparently based on the model profile.

```elixir
intent = %Intent{
  model: "gpt-4o",
  messages: [%{role: :user, content: Intent.text("What is the weather in Berlin?")}],
  tools: [
    %{
      type: "function",
      function: %{
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: %{
          "type" => "object",
          "properties" => %{
            "location" => %{"type" => "string", "description" => "City name"}
          },
          "required" => ["location"]
        }
      }
    }
  ]
}

{:ok, %Arcanum.Response{tool_calls: tool_calls}} = Gateway.chat(provider, intent)

# tool_calls is a list of:
# %{id: "call_abc", function: %{name: "get_weather", arguments: "{\"location\":\"Berlin\"}"}}
```
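
Once tool calls come back, the caller still has to run them. The sketch below shows one way to dispatch calls of the shape shown above to local handler functions. `ToolDispatchSketch`, the `handlers` map, and the injected `decode` function are all hypothetical; the decoder is passed in so the example does not assume a particular JSON library (for instance `&Jason.decode!/1` if Jason is a dependency).

```elixir
defmodule ToolDispatchSketch do
  @moduledoc "Hypothetical helper: runs local handlers for returned tool calls."

  # tool_calls: list of %{id: ..., function: %{name: ..., arguments: json_string}}
  # handlers:   %{"tool_name" => fn decoded_args -> result end}
  # decode:     function that turns the JSON argument string into a map
  def run(tool_calls, handlers, decode) do
    Enum.map(tool_calls, fn %{id: id, function: %{name: name, arguments: json}} ->
      case Map.fetch(handlers, name) do
        {:ok, handler} -> {id, {:ok, handler.(decode.(json))}}
        :error -> {id, {:error, :unknown_tool}}
      end
    end)
  end
end
```

Each result is paired with the originating call `id`, which is what a follow-up message to the model would need to reference.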

Models that don't support native tool calls (e.g. some Ollama models) automatically get XML-text or JSON-text extraction based on their profile's `tool_call_format`.

### Vision (Multimodal)

```elixir
intent = %Intent{
  model: "gpt-4o",
  messages: [
    %{role: :user, content: [
      %{type: :text, text: "What's in this image?"},
      %{type: :image_url, url: "https://example.com/photo.jpg"}
    ]}
  ]
}

# Alternatively, pass the image inline as a base64-encoded content block:
%{type: :image_base64, media_type: "image/png", data: "iVBOR..."}
```

### Embeddings

```elixir
{:ok, embeddings} = Gateway.embed(provider, "text-embedding-3-small", "Hello world")
# embeddings is a list of floats
```

The OpenAI and Ollama adapters implement embeddings. Adapters that don't override the default return `{:error, :not_supported}`.

### Image Generation

```elixir
alias Arcanum.MediaIntent

media_intent = %MediaIntent{
  model: "gpt-image-1",
  prompt: "A cat wearing a wizard hat",
  size: "1024x1024",
  quality: "auto",
  n: 1,
  format: "png"
}

{:ok, %Arcanum.MediaResponse{items: items}} = Gateway.generate_image(provider, media_intent)

# Each item: %{data: binary(), url: nil, revised_prompt: "...", content_type: "image/png"}
```

### Video Generation

```elixir
{:ok, %Arcanum.MediaResponse{items: items}} = Gateway.generate_video(provider, media_intent)
```

Both `generate_image/3` and `generate_video/3` return `{:error, :not_supported}` for adapters that don't override the default implementation.
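
The "default implementation" pattern can be illustrated with a behaviour whose `__using__` macro injects an overridable fallback. This is a simplified sketch of the general Elixir technique, not Arcanum's actual source: `ProviderSketch`, `ChatOnlyAdapter`, and `ImageAdapter` are hypothetical names, and the callbacks take two arguments only for brevity.

```elixir
defmodule ProviderSketch do
  @callback chat(provider :: map(), intent :: map()) :: {:ok, term()} | {:error, term()}
  @callback generate_image(provider :: map(), intent :: map()) ::
              {:ok, term()} | {:error, term()}

  defmacro __using__(_opts) do
    quote do
      @behaviour ProviderSketch

      # Fallback used by every adapter that does not define its own version.
      @impl true
      def generate_image(_provider, _intent), do: {:error, :not_supported}

      defoverridable generate_image: 2
    end
  end
end

defmodule ChatOnlyAdapter do
  use ProviderSketch

  @impl true
  def chat(_provider, _intent), do: {:ok, "hello"}
end

defmodule ImageAdapter do
  use ProviderSketch

  @impl true
  def chat(_provider, _intent), do: {:ok, "hi"}

  # Overrides the injected fallback.
  @impl true
  def generate_image(_provider, intent), do: {:ok, "image for " <> intent.prompt}
end
```

Here `ChatOnlyAdapter.generate_image/2` returns `{:error, :not_supported}` without any extra code in the adapter, while `ImageAdapter` replaces the fallback.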

### List Models

```elixir
{:ok, models} = Gateway.list_models(provider)
# ["gpt-4o", "gpt-4o-mini", "gpt-4.1", ...]
```

### Probe Availability

```elixir
Arcanum.Probe.probe_provider(provider)
# :online | :offline
```

Cloud providers always return `:online`. Local providers get a TCP connect check (2s timeout).
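
The behavior described above can be approximated with `:gen_tcp.connect/4`. This is a rough sketch under stated assumptions, not the library's implementation: `ProbeSketch` is a hypothetical module, and it assumes `base_url` parses to a host and port.

```elixir
defmodule ProbeSketch do
  @timeout_ms 2_000

  # Cloud endpoints are assumed reachable; no network call is made.
  def probe(%{type: :cloud}), do: :online

  # Local endpoints get a plain TCP connect against the host and port in base_url.
  def probe(%{type: :local, base_url: base_url}) do
    %URI{host: host, port: port} = URI.parse(base_url)

    case :gen_tcp.connect(String.to_charlist(host), port, [:binary, active: false], @timeout_ms) do
      {:ok, socket} ->
        :gen_tcp.close(socket)
        :online

      {:error, _reason} ->
        :offline
    end
  end
end
```

A successful connect only shows that something is listening on the port; it does not confirm that the server speaks the expected API.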

### Ensure Model Loaded (LM Studio)

```elixir
:ok = Arcanum.EnsureModel.ensure_loaded(provider, "qwen2.5-coder", context_length: 32_768)
```

Pre-loads a model on LM Studio with the specified context length. No-op for all other providers.

### GitHub Copilot Authentication

```elixir
alias Arcanum.Auth.Copilot

# 1. Start device flow
{:ok, flow} = Copilot.start_device_flow()
# flow.verification_uri -> "https://github.com/login/device"
# flow.user_code -> "ABCD-1234"

# 2. User visits URL and enters code, then:
{:ok, access_token} = Copilot.poll_for_token(flow)

# 3. Use the token as the provider's api_key
provider = %{
  base_url: Copilot.base_url(),
  api_key: access_token,
  kind: "github-copilot",
  api_format: :openai,
  type: :cloud,
  extra_headers: Copilot.copilot_headers(access_token)
}
```

For non-blocking flows, use `Copilot.poll_once/1` for single-attempt polling (e.g. from an Oban job).
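
A caller that drives single-attempt polling itself needs its own attempt limit. The sketch below shows a bounded loop around a zero-arity function that stands in for one polling attempt. `PollSketch` is hypothetical, and the `{:error, :authorization_pending}` shape is an assumption borrowed from the RFC 8628 error code of the same name; the actual return values of `Copilot.poll_once/1` are not documented here.

```elixir
defmodule PollSketch do
  # poll_once:     zero-arity function performing one polling attempt
  # attempts_left: upper bound on attempts, so the loop always terminates
  # interval_ms:   pause between attempts
  def poll(poll_once, attempts_left, interval_ms) when attempts_left > 0 do
    case poll_once.() do
      {:ok, token} ->
        {:ok, token}

      {:error, :authorization_pending} when attempts_left > 1 ->
        Process.sleep(interval_ms)
        poll(poll_once, attempts_left - 1, interval_ms)

      {:error, :authorization_pending} ->
        # The user has not approved yet and the budget is used up.
        {:error, :poll_attempts_exhausted}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

In a job queue the same idea applies without the sleep: each job run makes one attempt and reschedules itself until the attempt budget is spent.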

## Configuration

### Application Config

```elixir
# Required for GitHub Copilot OAuth
config :arcanum, copilot_client_id: "your-github-oauth-client-id"

# Optional: override HTTP client (defaults to Req)
config :arcanum, http_client: MyCustomClient
```

### Model Profile System

Every model gets a `ModelProfile` that declares its capabilities upfront. Profiles drive serialization, normalization, and feature gating — the adapter never guesses.

```elixir
%Arcanum.ModelProfile{
  supports_system_role:      true,       # can the model accept system messages?
  supports_tools:            true,       # native tool call support?
  supports_vision:           false,      # multimodal image input?
  supports_image_generation: false,      # image generation capability?
  supports_video_generation: false,      # video generation capability?
  tool_call_format:          :native,    # :native | :xml_text
  reasoning_field:           nil,        # atom — where the model puts thinking (e.g. :reasoning_content)
  thinking_param:            nil,        # map sent to provider to enable thinking (e.g. %{type: "enabled"})
  preserve_reasoning:        false,      # keep thinking content in response?
  max_context:               131_072,    # maximum context window
  max_images_per_message:    4,          # vision: max images per message
  max_outputs_per_request:   4,          # media generation: max outputs
  supported_sizes:           [],         # media generation: allowed dimensions
  supported_formats:         [],         # media generation: allowed formats
  provider_routing:          nil         # provider-specific routing metadata
}
```

### Profile Resolution

Profiles are resolved automatically by `Gateway` via `Arcanum.ModelProfile.Resolver`. Resolution follows a strict priority chain:

```
1. User overrides     (highest — caller-provided fields)
2. Overlay            (provider/model-specific, from priv/overlays.json)
3. Registry           (models.dev cache — single source of truth)
4. Provider default   (fallback for local providers not in models.dev)
5. Global default     (lowest — assumes weakest capabilities)
```
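
One way to picture the priority chain is as a series of map merges, applied from lowest to highest priority so that later layers win. The sketch below assumes each layer is a plain map of profile fields and that all layers are merged; the real Resolver may select layers differently. `ResolutionSketch` and the global default values are hypothetical.

```elixir
defmodule ResolutionSketch do
  # Illustrative "weakest capabilities" baseline; the values are made up.
  @global_default %{supports_tools: false, tool_call_format: :xml_text, max_context: 4_096}

  # layers: map with optional :provider_default, :registry, :overlay, :overrides keys
  def resolve(layers) do
    [
      @global_default,
      Map.get(layers, :provider_default, %{}),
      Map.get(layers, :registry, %{}),
      Map.get(layers, :overlay, %{}),
      Map.get(layers, :overrides, %{})
    ]
    # Fold left to right: each later (higher-priority) layer overwrites earlier fields.
    |> Enum.reduce(fn layer, acc -> Map.merge(acc, layer) end)
  end
end
```

Because each layer only overwrites the fields it defines, a caller override of `max_context` leaves the registry's `supports_tools` value untouched.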

#### Registry (models.dev)

The `Arcanum.ModelProfile.Registry` GenServer fetches model capabilities from [models.dev](https://models.dev) and caches them in ETS. Refreshes hourly. Falls back gracefully if the fetch fails.

Default providers fetched: `openai`, `anthropic`, `deepseek`, `openrouter`, `xai`, `zai`, `zhipuai`, `github-copilot`, `lmstudio`.

```elixir
# Lookup a cached profile (returns nil if not found)
Arcanum.ModelProfile.Registry.lookup("openai", "gpt-4o")

# List all cached provider IDs
Arcanum.ModelProfile.Registry.cached_providers()
```

#### Overlays (`priv/overlays.json`)

Overlays patch capabilities that models.dev doesn't track (vision, image generation, reasoning params). They are compiled into the Resolver at build time.

```json
{
  "overlays": {
    "openai": {
      "gpt-4o": { "supports_vision": true },
      "gpt-image-1": {
        "supports_image_generation": true,
        "supported_sizes": ["1024x1024", "1024x1536", "1536x1024", "auto"],
        "supported_formats": ["png", "webp", "jpeg"],
        "max_outputs_per_request": 4
      }
    },
    "deepseek": {
      "deepseek-r1": { "preserve_reasoning": true }
    }
  },
  "provider_defaults": {
    "ollama": {
      "supports_system_role": true,
      "supports_tools": false,
      "tool_call_format": "xml_text",
      "max_context": 32768
    }
  }
}
```
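
JSON has no atoms, so string keys and values such as `"xml_text"` have to be converted before they can populate profile fields. The following is an illustrative sketch of that conversion for an already-decoded overlay entry; `OverlaySketch` and its field list are hypothetical and cover only a subset of the profile.

```elixir
defmodule OverlaySketch do
  # Subset of profile fields handled by this sketch.
  @fields ~w(supports_system_role supports_tools supports_vision tool_call_format max_context)a

  # raw: string-keyed map as produced by a JSON decoder
  def normalize(raw) do
    for field <- @fields, Map.has_key?(raw, Atom.to_string(field)), into: %{} do
      {field, cast(field, Map.fetch!(raw, Atom.to_string(field)))}
    end
  end

  # Only known format strings become atoms; everything else passes through unchanged.
  defp cast(:tool_call_format, "native"), do: :native
  defp cast(:tool_call_format, "xml_text"), do: :xml_text
  defp cast(_field, value), do: value
end
```

Iterating over a fixed field list, rather than converting arbitrary keys, avoids creating atoms from untrusted input and silently drops unknown keys.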

#### Provider Defaults

For local providers not in models.dev (Ollama, LM Studio, vLLM), provider defaults from `priv/overlays.json` are used as the base profile. These assume conservative capabilities.

#### Profile Overrides

Callers can override any profile field at call time via the `:profile_overrides` option. Overrides take the highest priority in the resolution chain.

```elixir
# Force a model to use XML text tool calls
Gateway.chat(provider, intent, profile_overrides: %{tool_call_format: :xml_text})

# Override context window for a specific call
Gateway.chat(provider, intent, profile_overrides: %{max_context: 65_536})

# Enable vision for a model not in the registry
Gateway.chat(provider, intent, profile_overrides: %{supports_vision: true})

# Multiple overrides
Gateway.chat(provider, intent,
  profile_overrides: %{
    supports_tools: false,
    tool_call_format: :xml_text,
    max_context: 16_384
  }
)
```

Any field from `ModelProfile` can be overridden. The override map is merged on top of the resolved profile, so you only need to specify the fields you want to change.

### Gateway Options

`Gateway.chat/3` and `Gateway.stream/3` accept an optional keyword list as their third argument:

| Option | Type | Description |
|--------|------|-------------|
| `:profile_overrides` | `map()` | Override any `ModelProfile` fields for this call. |
| `:adapter` | `module()` | Override the adapter module (useful for testing). |

## Architecture

```
Gateway (single public entry point)
  -> Auth resolution (API key, Copilot OAuth headers)
  -> Profile resolution (Resolver: overrides > overlay > registry > provider default > global default)
  -> Adapter dispatch (OpenAI, Anthropic, Ollama)
  -> Response normalization (Normalizer: content fallback, think-tag stripping, tool-call extraction)
```

### Core Modules

| Module | Purpose |
|--------|---------|
| `Arcanum.Gateway` | Single entry point for all inference calls. |
| `Arcanum.Intent` | Canonical request struct. Content is always `[content_block()]`. |
| `Arcanum.Response` | Canonical response struct (content, thinking, tool_calls, usage). |
| `Arcanum.MediaIntent` | Request struct for image/video generation. |
| `Arcanum.MediaResponse` | Response struct for generated media (items with data/url). |
| `Arcanum.ModelProfile` | Declares model capabilities (tools, vision, reasoning, context). |
| `Arcanum.ModelProfile.Resolver` | Multi-layer profile resolution with override support. |
| `Arcanum.ModelProfile.Registry` | ETS cache backed by models.dev, refreshed hourly. |
| `Arcanum.Response.Normalizer` | Profile-driven post-processing (XML/JSON tool extraction, think tags). |
| `Arcanum.Provider` | Behaviour + macro (`use Arcanum.Provider`) with defoverridable defaults. |
| `Arcanum.Probe` | TCP availability check for local providers. |
| `Arcanum.EnsureModel` | Pre-loads models on LM Studio before inference. |
| `Arcanum.Auth.Copilot` | GitHub Copilot OAuth device code flow (RFC 8628). |

### Adapters

| Adapter | Behaviour Callbacks |
|---------|-------------------|
| `Arcanum.Adapters.OpenAI` | `chat`, `stream`, `list_models`, `embed`, `generate_image` |
| `Arcanum.Adapters.Anthropic` | `chat`, `stream`, `list_models` |
| `Arcanum.Adapters.Ollama` | `chat`, `stream`, `list_models`, `embed` |

### Error Handling

All Gateway functions return `{:ok, result}` or `{:error, reason}`. Error shapes:

| Error | Meaning |
|-------|---------|
| `{:error, {:api_error, status, body}}` | HTTP error from the provider. |
| `{:error, :context_overflow}` | Input exceeded the model's context window. |
| `{:error, :not_supported}` | Adapter doesn't implement the requested callback. |
| `{:error, :copilot_auth_required}` | Copilot provider needs OAuth authentication. |
| `{:error, term()}` | Network or other transient errors. |

Transient HTTP errors (429, 502, 503, 529) are retried automatically up to 3 times by the adapters.
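
The retry rule can be sketched as a small recursive wrapper. This is an approximation for illustration, not the adapters' code: `RetrySketch` is a hypothetical module, the request is modeled as a zero-arity function, and any backoff delay between attempts is omitted.

```elixir
defmodule RetrySketch do
  @retryable [429, 502, 503, 529]
  @max_retries 3

  # Calls `fun` and repeats it while it returns a retryable API error,
  # for at most @max_retries retries (four calls in total).
  def request(fun, attempt \\ 0) do
    case fun.() do
      {:error, {:api_error, status, _body}}
      when status in @retryable and attempt < @max_retries ->
        request(fun, attempt + 1)

      other ->
        # Success, a non-retryable error, or the retry budget is exhausted.
        other
    end
  end
end
```

Non-retryable statuses such as 400 or 401 are returned to the caller on the first attempt.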

## Design Principles

- **Profile-driven.** Model capabilities are declared upfront, never discovered via error codes.
- **Everything has a limit.** Retries, timeouts, model counts, poll attempts — all bounded.
- **Callers never touch adapters directly.** Gateway is the only public interface.
- **Two-layer separation.** Adapters handle wire protocol faithfully. Normalizer handles model-specific post-processing.

## License

MIT