docs/USER_GUIDE.md

Select File:
# ExLLM User Guide

This comprehensive guide covers all features and capabilities of the ExLLM library.

## Table of Contents

1. [Installation and Setup](#installation-and-setup)
2. [Configuration](#configuration)
3. [Basic Usage](#basic-usage)
4. [Providers](#providers)
5. [Chat Completions](#chat-completions)
6. [Streaming](#streaming)
7. [Session Management](#session-management)
8. [Context Management](#context-management)
9. [Function Calling](#function-calling)
10. [Vision and Multimodal](#vision-and-multimodal)
11. [Embeddings](#embeddings)
12. [Structured Outputs](#structured-outputs)
13. [Cost Tracking](#cost-tracking)
14. [Error Handling and Retries](#error-handling-and-retries)
15. [Caching](#caching)
16. [Response Caching](#response-caching)
17. [Model Discovery](#model-discovery)
18. [Provider Capabilities](#provider-capabilities)
19. [Logging](#logging)
20. [Testing with Mock Adapter](#testing-with-mock-adapter)
21. [Advanced Topics](#advanced-topics)

## Installation and Setup

### Adding to Your Project

Add ExLLM to your `mix.exs` dependencies:

```elixir
def deps do
  [
    {:ex_llm, "~> 0.4.1"},
    # Included dependencies (automatically installed with ex_llm):
    # - {:instructor, "~> 0.1.0"} - For structured outputs
    # - {:bumblebee, "~> 0.5"} - For local model inference
    # - {:nx, "~> 0.7"} - For numerical computing
    
    # Optional hardware acceleration (choose one):
    # {:exla, "~> 0.7"}  # For CUDA/ROCm GPUs
    # {:emlx, github: "elixir-nx/emlx", branch: "main"}  # For Apple Silicon
  ]
end
```

Run `mix deps.get` to install the dependencies.

### Optional Dependencies

- **Req**: HTTP client (automatically included)
- **Jason**: JSON parser (automatically included)
- **Instructor**: Structured outputs with schema validation (automatically included)
- **Bumblebee**: Local model inference (automatically included)
- **Nx**: Numerical computing (automatically included)
- **EXLA**: CUDA/ROCm GPU acceleration (optional)
- **EMLX**: Apple Silicon Metal acceleration (optional)

## Configuration

ExLLM supports multiple configuration methods to suit different use cases.

### Environment Variables

The simplest way to configure ExLLM:

```bash
# OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_API_BASE="https://api.openai.com/v1"  # Optional custom endpoint

# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

# Google Gemini
export GOOGLE_API_KEY="..."
# or
export GEMINI_API_KEY="..."

# Groq
export GROQ_API_KEY="gsk_..."

# OpenRouter
export OPENROUTER_API_KEY="sk-or-..."

# X.AI
export XAI_API_KEY="xai-..."

# Mistral AI
export MISTRAL_API_KEY="..."

# Perplexity
export PERPLEXITY_API_KEY="pplx-..."

# AWS Bedrock
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# Ollama
export OLLAMA_API_BASE="http://localhost:11434"

# LM Studio
export LMSTUDIO_API_BASE="http://localhost:1234"
```

### Static Configuration

For more control, use static configuration:

```elixir
config = %{
  openai: %{
    api_key: "sk-...",
    api_base: "https://api.openai.com/v1",
    default_model: "gpt-4o"
  },
  anthropic: %{
    api_key: "sk-ant-...",
    default_model: "claude-3-5-sonnet-20241022"
  }
}

{:ok, provider} = ExLLM.ConfigProvider.Static.start_link(config)

# Use with config_provider option
{:ok, response} = ExLLM.chat(:openai, messages, config_provider: provider)
```

### Custom Configuration Provider

Implement your own configuration provider:

```elixir
defmodule MyApp.ConfigProvider do
  @behaviour ExLLM.ConfigProvider
  
  def get([:openai, :api_key]), do: fetch_from_vault("openai_key")
  def get([:anthropic, :api_key]), do: fetch_from_vault("anthropic_key")
  def get(_path), do: nil
  
  def get_all() do
    %{
      openai: %{api_key: fetch_from_vault("openai_key")},
      anthropic: %{api_key: fetch_from_vault("anthropic_key")}
    }
  end
end

# Use it
{:ok, response} = ExLLM.chat(:openai, messages, 
  config_provider: MyApp.ConfigProvider
)
```

## Basic Usage

### Simple Chat

```elixir
messages = [
  %{role: "user", content: "Hello, how are you?"}
]

{:ok, response} = ExLLM.chat(:openai, messages)
IO.puts(response.content)
```

### Provider/Model Syntax

```elixir
# Use provider/model string syntax
{:ok, response} = ExLLM.chat("anthropic/claude-3-haiku-20240307", messages)

# Equivalent to
{:ok, response} = ExLLM.chat(:anthropic, messages, 
  model: "claude-3-haiku-20240307"
)
```

### Response Structure

```elixir
%ExLLM.Types.LLMResponse{
  content: "I'm doing well, thank you!",
  model: "gpt-4o",
  finish_reason: "stop",
  usage: %{
    input_tokens: 12,
    output_tokens: 8,
    total_tokens: 20
  },
  cost: %{
    input_cost: 0.00006,
    output_cost: 0.00016,
    total_cost: 0.00022,
    currency: "USD"
  }
}
```

## Providers

### Supported Providers

ExLLM supports these providers out of the box:

- **:openai** - OpenAI GPT models
- **:anthropic** - Anthropic Claude models
- **:gemini** - Google Gemini models
- **:groq** - Groq fast inference
- **:mistral** - Mistral AI models
- **:perplexity** - Perplexity search-enhanced models
- **:ollama** - Local models via Ollama
- **:lmstudio** - Local models via LM Studio
- **:bedrock** - AWS Bedrock
- **:openrouter** - OpenRouter (300+ models)
- **:xai** - X.AI Grok models
- **:bumblebee** - Local models via Bumblebee/NX
- **:mock** - Mock adapter for testing

### Checking Provider Configuration

```elixir
# Check if a provider is configured
if ExLLM.configured?(:openai) do
  {:ok, response} = ExLLM.chat(:openai, messages)
end

# Get default model for a provider
model = ExLLM.default_model(:anthropic)
# => "claude-3-5-sonnet-20241022"

# List available models
{:ok, models} = ExLLM.list_models(:openai)
for model <- models do
  IO.puts("#{model.id}: #{model.context_window} tokens")
end
```

## Chat Completions

### Basic Options

```elixir
{:ok, response} = ExLLM.chat(:openai, messages,
  model: "gpt-4o",           # Specific model
  temperature: 0.7,          # 0.0-1.0, higher = more creative
  max_tokens: 1000,          # Max response length
  top_p: 0.9,                # Nucleus sampling
  frequency_penalty: 0.5,    # Reduce repetition
  presence_penalty: 0.5,     # Encourage new topics
  stop: ["\n\n", "END"],     # Stop sequences
  seed: 12345,               # Reproducible outputs
  timeout: 60_000            # Request timeout in ms (default: provider-specific)
)
```

### Timeout Configuration

Different providers have different timeout requirements. ExLLM allows you to configure timeouts per request:

```elixir
# Ollama with function calling (can be slow)
{:ok, response} = ExLLM.chat(:ollama, messages,
  functions: functions,
  timeout: 300_000  # 5 minutes
)

# Quick requests with shorter timeout
{:ok, response} = ExLLM.chat(:openai, messages,
  timeout: 30_000   # 30 seconds
)
```

Default timeouts:
- **Ollama**: 120,000ms (2 minutes) - Local models can be slower
- **Other providers**: Use their HTTP client defaults (typically 30-60 seconds)

### System Messages

```elixir
messages = [
  %{role: "system", content: "You are a helpful coding assistant."},
  %{role: "user", content: "How do I read a file in Elixir?"}
]

{:ok, response} = ExLLM.chat(:openai, messages)
```

### Multi-turn Conversations

```elixir
conversation = [
  %{role: "user", content: "What's the capital of France?"},
  %{role: "assistant", content: "The capital of France is Paris."},
  %{role: "user", content: "What's the population?"}
]

{:ok, response} = ExLLM.chat(:openai, conversation)
```

## Streaming

### Basic Streaming

```elixir
{:ok, stream} = ExLLM.stream_chat(:openai, messages)

for chunk <- stream do
  case chunk do
    %{content: content} when content != nil ->
      IO.write(content)
      
    %{finish_reason: reason} when reason != nil ->
      IO.puts("\nFinished: #{reason}")
      
    _ ->
      # Other chunk types (role, etc.)
      :ok
  end
end
```

### Streaming with Callback

```elixir
{:ok, stream} = ExLLM.stream_chat(:openai, messages,
  on_chunk: fn chunk ->
    if chunk.content, do: IO.write(chunk.content)
  end
)

# Consume the stream
Enum.to_list(stream)
```

### Collecting Streamed Response

```elixir
{:ok, stream} = ExLLM.stream_chat(:openai, messages)

# Collect all chunks into a single response
full_content = 
  stream
  |> Enum.map(& &1.content)
  |> Enum.reject(&is_nil/1)
  |> Enum.join("")
```

### Stream Recovery

Enable automatic stream recovery for interrupted streams:

```elixir
{:ok, stream} = ExLLM.stream_chat(:openai, messages,
  stream_recovery: true,
  recovery_strategy: :exact  # :exact, :paragraph, or :summarize
)

# If stream is interrupted, you can resume
{:ok, resumed_stream} = ExLLM.resume_stream(recovery_id)
```

## Session Management

Sessions provide stateful conversation management with automatic token tracking.

### Creating and Using Sessions

```elixir
# Create a new session
session = ExLLM.new_session(:openai, name: "Customer Support")

# Chat with session (automatically manages message history)
{:ok, {response, session}} = ExLLM.chat_with_session(
  session,
  "What's the weather like?"
)

# Continue the conversation
{:ok, {response2, session}} = ExLLM.chat_with_session(
  session,
  "What should I wear?"
)

# Check token usage
total_tokens = ExLLM.session_token_usage(session)
IO.puts("Total tokens used: #{total_tokens}")
```

### Managing Session Messages

```elixir
# Add messages manually
session = ExLLM.add_session_message(session, "user", "Hello!")
session = ExLLM.add_session_message(session, "assistant", "Hi there!")

# Get message history
messages = ExLLM.get_session_messages(session)
recent_10 = ExLLM.get_session_messages(session, 10)

# Clear messages but keep session metadata
session = ExLLM.clear_session(session)
```

### Persisting Sessions

```elixir
# Save session to JSON
{:ok, json} = ExLLM.save_session(session)
File.write!("session.json", json)

# Load session from JSON
{:ok, json} = File.read("session.json")
{:ok, restored_session} = ExLLM.load_session(json)
```

### Session with Context

```elixir
# Create session with default context
session = ExLLM.new_session(:openai,
  name: "Tech Support",
  context: %{
    temperature: 0.3,
    system_message: "You are a technical support agent."
  }
)

# Context is automatically applied to all chats
{:ok, {response, session}} = ExLLM.chat_with_session(session, "Help!")
```

## Context Management

Automatically manage conversation context to fit within model limits.

### Context Window Validation

```elixir
# Check if messages fit in context window
case ExLLM.validate_context(messages, provider: :openai, model: "gpt-4") do
  {:ok, token_count} ->
    IO.puts("Messages use #{token_count} tokens")
    
  {:error, reason} ->
    IO.puts("Messages too large: #{reason}")
end

# Get context window size for a model
window_size = ExLLM.context_window_size(:anthropic, "claude-3-opus-20240229")
# => 200000
```

### Automatic Message Truncation

```elixir
# Prepare messages to fit in context window
truncated = ExLLM.prepare_messages(long_conversation,
  provider: :openai,
  model: "gpt-4",
  max_tokens: 4000,           # Reserve tokens for response
  strategy: :sliding_window,   # or :smart
  preserve_messages: 5         # Always keep last 5 messages
)
```

### Truncation Strategies

1. **:sliding_window** - Keep most recent messages
2. **:smart** - Preserve system messages and recent context

```elixir
# Smart truncation preserves important context
{:ok, response} = ExLLM.chat(:openai, very_long_conversation,
  strategy: :smart,
  preserve_messages: 10
)
```

### Context Statistics

```elixir
stats = ExLLM.context_stats(messages)
# => %{
#   message_count: 20,
#   total_tokens: 1500,
#   by_role: %{"user" => 10, "assistant" => 9, "system" => 1},
#   avg_tokens_per_message: 75
# }
```

## Function Calling

Enable AI models to call functions/tools in your application.

### Basic Function Calling

```elixir
# Define available functions
functions = [
  %{
    name: "get_weather",
    description: "Get current weather for a location",
    parameters: %{
      type: "object",
      properties: %{
        location: %{
          type: "string",
          description: "City and state, e.g. San Francisco, CA"
        },
        unit: %{
          type: "string",
          enum: ["celsius", "fahrenheit"],
          description: "Temperature unit"
        }
      },
      required: ["location"]
    }
  }
]

# Let the AI decide when to call functions
{:ok, response} = ExLLM.chat(:openai, 
  [%{role: "user", content: "What's the weather in NYC?"}],
  functions: functions,
  function_call: "auto"  # or "none" or %{name: "get_weather"}
)
```

### Handling Function Calls

```elixir
# Parse function calls from response
case ExLLM.parse_function_calls(response, :openai) do
  {:ok, [function_call | _]} ->
    # AI wants to call a function
    IO.inspect(function_call)
    # => %ExLLM.FunctionCalling.FunctionCall{
    #      name: "get_weather",
    #      arguments: %{"location" => "New York, NY"}
    #    }
    
    # Execute the function
    result = get_weather_impl(function_call.arguments["location"])
    
    # Format result for conversation
    function_message = ExLLM.format_function_result(
      %ExLLM.FunctionCalling.FunctionResult{
        name: "get_weather",
        result: result
      },
      :openai
    )
    
    # Continue conversation with function result
    messages = messages ++ [response_message, function_message]
    {:ok, final_response} = ExLLM.chat(:openai, messages)
    
  {:ok, []} ->
    # No function call, regular response
    IO.puts(response.content)
end
```

### Function Execution

```elixir
# Define functions with handlers
functions_with_handlers = [
  %{
    name: "calculate",
    description: "Perform mathematical calculations",
    parameters: %{
      type: "object",
      properties: %{
        expression: %{type: "string"}
      },
      required: ["expression"]
    },
    handler: fn args ->
      # Your implementation
      {result, _} = Code.eval_string(args["expression"])
      %{result: result}
    end
  }
]

# Execute function automatically
{:ok, result} = ExLLM.execute_function(function_call, functions_with_handlers)
```

### Provider-Specific Notes

Different providers use different terminology:
- OpenAI: "functions" and "function_call"
- Anthropic: "tools" and "tool_use"
- ExLLM normalizes these automatically

## Vision and Multimodal

Work with images and other media types.

### Basic Image Analysis

```elixir
# Create a vision message
{:ok, message} = ExLLM.vision_message(
  "What's in this image?",
  ["path/to/image.jpg"]
)

# Send to vision-capable model
{:ok, response} = ExLLM.chat(:openai, [message],
  model: "gpt-4o"  # or any vision model
)
```

### Multiple Images

```elixir
{:ok, message} = ExLLM.vision_message(
  "Compare these images",
  [
    "image1.jpg",
    "image2.jpg",
    "https://example.com/image3.png"  # URLs work too
  ],
  detail: :high  # :low, :high, or :auto
)
```

### Loading Images

```elixir
# Load image with options
{:ok, image_part} = ExLLM.load_image("photo.jpg",
  detail: :high,
  resize: {1024, 1024}  # Optional resizing
)

# Build custom message
message = %{
  role: "user",
  content: [
    %{type: "text", text: "Describe this image"},
    image_part
  ]
}
```

### Checking Vision Support

```elixir
# Check if provider/model supports vision
if ExLLM.supports_vision?(:anthropic, "claude-3-opus-20240229") do
  # This model supports vision
end

# Find all vision-capable models
vision_models = ExLLM.find_models_with_features([:vision])
```

### Text Extraction from Images

```elixir
# OCR-like functionality
{:ok, text} = ExLLM.extract_text_from_image(:openai, "document.png",
  model: "gpt-4o",
  prompt: "Extract all text, preserving formatting and layout"
)
```

### Image Analysis

```elixir
# Analyze multiple images
{:ok, analysis} = ExLLM.analyze_images(:anthropic,
  ["chart1.png", "chart2.png"],
  "Compare these charts and identify trends",
  model: "claude-3-5-sonnet-20241022"
)
```

## Embeddings

Generate vector embeddings for semantic search and similarity.

### Basic Embeddings

```elixir
# Generate embeddings for text
{:ok, response} = ExLLM.embeddings(:openai, 
  ["Hello world", "Goodbye world"]
)

# Response structure
%ExLLM.Types.EmbeddingResponse{
  embeddings: [
    [0.0123, -0.0456, ...],  # 1536 dimensions for text-embedding-3-small
    [0.0789, -0.0234, ...]
  ],
  model: "text-embedding-3-small",
  usage: %{total_tokens: 8}
}
```

### Embedding Options

```elixir
{:ok, response} = ExLLM.embeddings(:openai, texts,
  model: "text-embedding-3-large",
  dimensions: 256,  # Reduce dimensions (model-specific)
  encoding_format: "float"  # or "base64"
)
```

### Similarity Search

```elixir
# Calculate similarity between embeddings
similarity = ExLLM.cosine_similarity(embedding1, embedding2)
# => 0.87 (1.0 = identical, 0.0 = orthogonal, -1.0 = opposite)

# Find similar items
query_embedding = get_embedding("search query")
items = [
  %{id: 1, text: "Document 1", embedding: [...]},
  %{id: 2, text: "Document 2", embedding: [...]},
  # ...
]

results = ExLLM.find_similar(query_embedding, items,
  top_k: 10,
  threshold: 0.7  # Minimum similarity
)
# => [
#   %{item: %{id: 2, ...}, similarity: 0.92},
#   %{item: %{id: 5, ...}, similarity: 0.85},
#   ...
# ]
```

### Listing Embedding Models

```elixir
{:ok, models} = ExLLM.list_embedding_models(:openai)
for model <- models do
  IO.puts("#{model.name}: #{model.dimensions} dimensions")
end
```

### Caching Embeddings

```elixir
# Enable caching for embeddings
{:ok, response} = ExLLM.embeddings(:openai, texts,
  cache: true,
  cache_ttl: :timer.hours(24)
)
```

## Structured Outputs

Generate structured data with schema validation using Instructor integration.

### Basic Structured Output

```elixir
defmodule EmailClassification do
  use Ecto.Schema
  
  embedded_schema do
    field :category, Ecto.Enum, values: [:personal, :work, :spam]
    field :priority, Ecto.Enum, values: [:high, :medium, :low]
    field :summary, :string
  end
end

{:ok, result} = ExLLM.chat(:openai, 
  [%{role: "user", content: "Classify this email: Meeting tomorrow at 3pm"}],
  response_model: EmailClassification,
  max_retries: 3  # Retry on validation failure
)

IO.inspect(result)
# => %EmailClassification{
#      category: :work,
#      priority: :high,
#      summary: "Meeting scheduled for tomorrow"
#    }
```

### Complex Schemas

```elixir
defmodule ProductExtraction do
  use Ecto.Schema
  
  embedded_schema do
    field :name, :string
    field :price, :decimal
    field :currency, :string
    field :in_stock, :boolean
    
    embeds_many :features, Feature do
      field :name, :string
      field :value, :string
    end
  end
  
  def changeset(struct, params) do
    struct
    |> Ecto.Changeset.cast(params, [:name, :price, :currency, :in_stock])
    |> Ecto.Changeset.cast_embed(:features)
    |> Ecto.Changeset.validate_required([:name, :price])
    |> Ecto.Changeset.validate_number(:price, greater_than: 0)
  end
end

{:ok, product} = ExLLM.chat(:anthropic,
  [%{role: "user", content: "Extract product info from: iPhone 15 Pro, $999, 256GB storage, A17 chip"}],
  response_model: ProductExtraction
)
```

### Lists and Collections

```elixir
defmodule TodoList do
  use Ecto.Schema
  
  embedded_schema do
    embeds_many :todos, Todo do
      field :task, :string
      field :priority, Ecto.Enum, values: [:high, :medium, :low]
      field :completed, :boolean, default: false
    end
  end
end

{:ok, todo_list} = ExLLM.chat(:openai,
  [%{role: "user", content: "Create a todo list for launching a new feature"}],
  response_model: TodoList
)
```

## Cost Tracking

ExLLM automatically tracks API costs for all operations.

### Automatic Cost Tracking

```elixir
{:ok, response} = ExLLM.chat(:openai, messages)

# Cost is included in response
IO.inspect(response.cost)
# => %{
#   input_cost: 0.00003,
#   output_cost: 0.00006,
#   total_cost: 0.00009,
#   currency: "USD"
# }

# Format for display
IO.puts(ExLLM.format_cost(response.cost.total_cost))
# => "$0.009¢"
```

### Manual Cost Calculation

```elixir
usage = %{input_tokens: 1000, output_tokens: 500}
cost = ExLLM.calculate_cost(:openai, "gpt-4", usage)
# => %{
#   input_cost: 0.03,
#   output_cost: 0.06,
#   total_cost: 0.09,
#   currency: "USD",
#   per_million_input: 30.0,
#   per_million_output: 120.0
# }
```

### Token Estimation

```elixir
# Estimate tokens for text
tokens = ExLLM.estimate_tokens("Hello, world!")
# => 4

# Estimate for messages
tokens = ExLLM.estimate_tokens([
  %{role: "user", content: "Hi"},
  %{role: "assistant", content: "Hello!"}
])
# => 12
```

### Disabling Cost Tracking

```elixir
{:ok, response} = ExLLM.chat(:openai, messages,
  track_cost: false
)
# response.cost will be nil
```

## Error Handling and Retries

### Automatic Retries

Retries are enabled by default with exponential backoff:

```elixir
{:ok, response} = ExLLM.chat(:openai, messages,
  retry: true,           # Default: true
  retry_count: 3,        # Default: 3 attempts
  retry_delay: 1000,     # Default: 1 second initial delay
  retry_backoff: :exponential,  # or :linear
  retry_jitter: true     # Add randomness to prevent thundering herd
)
```

### Error Types

```elixir
case ExLLM.chat(:openai, messages) do
  {:ok, response} ->
    IO.puts(response.content)
    
  {:error, %ExLLM.Error{type: :rate_limit} = error} ->
    IO.puts("Rate limited. Retry after: #{error.retry_after}")
    
  {:error, %ExLLM.Error{type: :invalid_api_key}} ->
    IO.puts("Check your API key configuration")
    
  {:error, %ExLLM.Error{type: :context_length_exceeded}} ->
    IO.puts("Message too long for model")
    
  {:error, %ExLLM.Error{type: :timeout}} ->
    IO.puts("Request timed out")
    
  {:error, error} ->
    IO.inspect(error)
end
```

### Custom Retry Logic

```elixir
defmodule MyApp.RetryHandler do
  def with_custom_retry(provider, messages, opts \\ []) do
    Enum.reduce_while(1..5, nil, fn attempt, _acc ->
      case ExLLM.chat(provider, messages, Keyword.put(opts, :retry, false)) do
        {:ok, response} ->
          {:halt, {:ok, response}}
          
        {:error, %{type: :rate_limit} = error} ->
          wait_time = error[:retry_after] || :timer.seconds(attempt * 10)
          Process.sleep(wait_time)
          {:cont, nil}
          
        {:error, _} = error ->
          if attempt == 5 do
            {:halt, error}
          else
            Process.sleep(:timer.seconds(attempt))
            {:cont, nil}
          end
      end
    end)
  end
end
```

## Caching

Cache responses to reduce API calls and costs.

### Basic Caching

```elixir
# Enable caching globally
Application.put_env(:ex_llm, :cache_enabled, true)

# Or per request
{:ok, response} = ExLLM.chat(:openai, messages,
  cache: true,
  cache_ttl: :timer.minutes(15)  # Default: 15 minutes
)

# Same request will use cache
{:ok, cached_response} = ExLLM.chat(:openai, messages, cache: true)
```

### Cache Management

```elixir
# Clear specific cache entry
ExLLM.Cache.delete(cache_key)

# Clear all cache
ExLLM.Cache.clear()

# Get cache stats
stats = ExLLM.Cache.stats()
# => %{size: 42, hits: 100, misses: 20}
```

### Custom Cache Keys

```elixir
# Cache key is automatically generated from:
# - Provider
# - Messages
# - Relevant options (model, temperature, etc.)

# You can also use manual cache management
cache_key = ExLLM.Cache.generate_cache_key(:openai, messages, options)
```

## Response Caching

Cache real provider responses for offline testing and development cost reduction.

ExLLM provides two approaches for response caching:

1. **Unified Cache System** (Recommended) - Extends the runtime cache with optional disk persistence
2. **Legacy Response Cache** - Standalone response collection system

### Unified Cache System (Recommended)

The unified cache system extends ExLLM's runtime performance cache with optional disk persistence. This provides both speed benefits and testing capabilities from a single system.

#### Enabling Unified Cache Persistence

```elixir
# Method 1: Environment variables (temporary)
export EX_LLM_CACHE_PERSIST=true
export EX_LLM_CACHE_DIR="/path/to/cache"  # Optional

# Method 2: Runtime configuration (recommended for tests)
ExLLM.Cache.configure_disk_persistence(true, "/path/to/cache")

# Method 3: Application configuration
config :ex_llm,
  cache_persist_disk: true,
  cache_disk_path: "/tmp/ex_llm_cache"
```

#### Automatic Response Collection with Unified Cache

When persistence is enabled, all cached responses are automatically stored to disk:

```elixir
# Normal caching usage - responses automatically persist to disk when enabled
{:ok, response} = ExLLM.chat(messages, provider: :openai, cache: true)
{:ok, response} = ExLLM.chat(messages, provider: :anthropic, cache: true)
```

#### Benefits of Unified Cache System

- **Zero performance impact** when persistence is disabled (default)
- **Single configuration** controls both runtime cache and disk persistence  
- **Natural development workflow** - enable during development, disable in production
- **Automatic mock integration** - cached responses work seamlessly with Mock adapter

### Legacy Response Cache System

For compatibility, the original response cache system is still available:

#### Enabling Legacy Response Caching

```elixir
# Enable response caching via environment variables
export EX_LLM_CACHE_RESPONSES=true
export EX_LLM_CACHE_DIR="/path/to/cache"  # Optional: defaults to /tmp/ex_llm_cache
```

### Automatic Response Collection

When caching is enabled, all provider responses are automatically stored:

```elixir
# Normal usage - responses are automatically cached
{:ok, response} = ExLLM.chat(messages, provider: :openai)
{:ok, stream} = ExLLM.stream_chat(messages, provider: :anthropic)
```

### Cache Structure

Responses are organized by provider and endpoint:

```
/tmp/ex_llm_cache/
├── openai/
│   ├── chat.json           # Chat completions
│   └── streaming.json      # Streaming responses
├── anthropic/
│   ├── chat.json           # Claude messages
│   └── streaming.json      # Streaming responses
└── openrouter/
    └── chat.json           # OpenRouter responses
```

### Manual Response Storage

```elixir
# Store a specific response
ExLLM.ResponseCache.store_response(
  "openai",                    # Provider
  "chat",                      # Endpoint
  %{messages: messages},       # Request data
  %{"choices" => [...]}        # Response data
)
```

### Mock Adapter Integration

Configure the Mock adapter to replay cached responses from any provider:

#### Using Unified Cache System

With the unified cache system, responses are automatically available for mock testing when disk persistence is enabled:

```elixir
# 1. Enable disk persistence during development/testing
ExLLM.Cache.configure_disk_persistence(true, "/tmp/ex_llm_cache")

# 2. Use normal caching to collect responses
{:ok, response} = ExLLM.chat(:openai, messages, cache: true)
{:ok, response} = ExLLM.chat(:anthropic, messages, cache: true)

# 3. Configure mock adapter to use cached responses
ExLLM.ResponseCache.configure_mock_provider(:openai)

# 4. Mock calls now return authentic cached responses
{:ok, response} = ExLLM.chat(messages, provider: :mock)
# Returns real OpenAI response structure and content

# 5. Switch to different provider responses
ExLLM.ResponseCache.configure_mock_provider(:anthropic)
{:ok, response} = ExLLM.chat(messages, provider: :mock)
# Now returns real Anthropic response structure
```

#### Using Legacy Response Cache

For compatibility with the original caching approach:

```elixir
# Enable legacy response caching
export EX_LLM_CACHE_RESPONSES=true

# Use cached OpenAI responses for realistic testing
ExLLM.ResponseCache.configure_mock_provider(:openai)

# Now mock calls return authentic OpenAI responses
{:ok, response} = ExLLM.chat(messages, provider: :mock)
# Returns real OpenAI response structure and content
```

### Response Collection for Testing

Collect comprehensive test scenarios:

```elixir
# Collect responses for common test cases
ExLLM.CachingInterceptor.create_test_collection(:openai)

# Collect specific scenarios
test_cases = [
  {[%{role: "user", content: "Hello"}], []},
  {[%{role: "user", content: "What is 2+2?"}], [max_tokens: 10]},
  {[%{role: "user", content: "Tell me a joke"}], [temperature: 0.8]}
]

ExLLM.CachingInterceptor.collect_test_responses(:anthropic, test_cases)
```

### Cache Management

```elixir
# List available cached providers
providers = ExLLM.ResponseCache.list_cached_providers()
# => [{"openai", 15}, {"anthropic", 8}]  # {provider, response_count}

# Clear cache for specific provider
ExLLM.ResponseCache.clear_provider_cache("openai")

# Clear all cached responses
ExLLM.ResponseCache.clear_all_cache()

# Get specific cached response
cached = ExLLM.ResponseCache.get_response("openai", "chat", request_data)
```

### Configuration Options

```elixir
# Environment variables
EX_LLM_CACHE_RESPONSES=true      # Enable/disable caching
EX_LLM_CACHE_DIR="/custom/path"  # Custom cache directory

# Check if caching is enabled
ExLLM.ResponseCache.caching_enabled?()
# => true

# Get current cache directory
ExLLM.ResponseCache.cache_dir()
# => "/tmp/ex_llm_cache"
```

### Use Cases

**Development Testing with Unified Cache:**
```elixir
# 1. Enable disk persistence during development
ExLLM.Cache.configure_disk_persistence(true)

# 2. Use normal caching - responses get collected automatically
{:ok, response} = ExLLM.chat(:openai, messages, cache: true)
{:ok, response} = ExLLM.chat(:anthropic, messages, cache: true)

# 3. Use cached responses in tests
ExLLM.ResponseCache.configure_mock_provider(:openai)
# Tests now use real OpenAI response structures
```

**Development Testing with Legacy Cache:**
```elixir
# 1. Collect responses during development
export EX_LLM_CACHE_RESPONSES=true
# Run your app normally - responses get cached

# 2. Use cached responses in tests
ExLLM.ResponseCache.configure_mock_provider(:openai)
# Tests now use real OpenAI response structures
```

**Cost Reduction:**
```elixir
# Unified cache approach - enable persistence temporarily
ExLLM.Cache.configure_disk_persistence(true)

# Cache expensive model responses during development
{:ok, response} = ExLLM.chat(:openai, messages,
  cache: true,
  model: "gpt-4o"  # Expensive model
)
# Response is cached automatically both in memory and disk

# Later testing uses cached response - no API cost
ExLLM.ResponseCache.configure_mock_provider(:openai)
{:ok, same_response} = ExLLM.chat(messages, provider: :mock)

# Disable persistence for production
ExLLM.Cache.configure_disk_persistence(false)
```

**Cross-Provider Testing:**
```elixir
# Test how your app handles different provider response formats
ExLLM.ResponseCache.configure_mock_provider(:openai)
test_openai_format()

ExLLM.ResponseCache.configure_mock_provider(:anthropic)
test_anthropic_format()

ExLLM.ResponseCache.configure_mock_provider(:openrouter)
test_openrouter_format()
```

### Advanced Usage

**Streaming Response Caching:**
```elixir
# Streaming responses are automatically cached
{:ok, stream} = ExLLM.stream_chat(messages, provider: :openai)
chunks = Enum.to_list(stream)

# Later, mock can replay the exact same stream
ExLLM.ResponseCache.configure_mock_provider(:openai)
{:ok, cached_stream} = ExLLM.stream_chat(messages, provider: :mock)
# Returns identical streaming chunks
```

**Interceptor Wrapping:**
```elixir
# Manually wrap API calls for caching
{:ok, response} = ExLLM.CachingInterceptor.with_caching(:openai, fn ->
  ExLLM.Adapters.OpenAI.chat(messages)
end)

# Wrap streaming calls
{:ok, stream} = ExLLM.CachingInterceptor.with_streaming_cache(
  :anthropic, 
  messages, 
  options, 
  fn -> ExLLM.Adapters.Anthropic.stream_chat(messages, options) end
)
```

## Model Discovery

### Finding Models

```elixir
# Get model information
{:ok, info} = ExLLM.get_model_info(:openai, "gpt-4o")
IO.inspect(info)
# => %ExLLM.ModelCapabilities.ModelInfo{
#   id: "gpt-4o",
#   context_window: 128000,
#   max_output_tokens: 16384,
#   capabilities: %{
#     vision: %{supported: true},
#     function_calling: %{supported: true},
#     streaming: %{supported: true},
#     ...
#   }
# }

# Check specific capability
if ExLLM.model_supports?(:openai, "gpt-4o", :vision) do
  # Model supports vision
end
```

### Model Recommendations

```elixir
# Get recommendations based on requirements
recommendations = ExLLM.recommend_models(
  features: [:vision, :function_calling],
  min_context_window: 100_000,
  max_cost_per_1k_tokens: 1.0,
  prefer_local: false,
  limit: 5
)

for {provider, model, info} <- recommendations do
  IO.puts("#{provider}/#{model}")
  IO.puts("  Score: #{info.score}")
  IO.puts("  Context: #{info.context_window}")
  IO.puts("  Cost: $#{info.cost_per_1k}/1k tokens")
end
```

### Finding Models by Feature

```elixir
# Find all models with specific features
models = ExLLM.find_models_with_features([:vision, :streaming])
# => [
#   {:openai, "gpt-4o"},
#   {:anthropic, "claude-3-opus-20240229"},
#   ...
# ]

# Group models by capability
grouped = ExLLM.models_by_capability(:vision)
# => %{
#   supported: [{:openai, "gpt-4o"}, ...],
#   not_supported: [{:openai, "gpt-3.5-turbo"}, ...]
# }
```

### Comparing Models

```elixir
comparison = ExLLM.compare_models([
  {:openai, "gpt-4o"},
  {:anthropic, "claude-3-5-sonnet-20241022"},
  {:gemini, "gemini-1.5-pro"}
])

# See feature support across models
IO.inspect(comparison.features[:vision])
# => [
#   %{model: "gpt-4o", supported: true, details: %{...}},
#   %{model: "claude-3-5-sonnet", supported: true, details: %{...}},
#   %{model: "gemini-1.5-pro", supported: true, details: %{...}}
# ]
```

## Provider Capabilities

### Capability Normalization

ExLLM automatically normalizes different provider terminologies:

```elixir
# These all work and refer to the same capability
ExLLM.provider_supports?(:openai, :function_calling)     # => true
ExLLM.provider_supports?(:anthropic, :tool_use)          # => true
ExLLM.provider_supports?(:openai, :tools)                # => true

# Find providers using any terminology
ExLLM.find_providers_with_features([:tool_use])          # Works!
ExLLM.find_providers_with_features([:function_calling])  # Also works!
```

### Provider Discovery

```elixir
# Get provider capabilities
{:ok, caps} = ExLLM.get_provider_capabilities(:openai)
IO.inspect(caps)
# => %ExLLM.ProviderCapabilities.ProviderInfo{
#   id: :openai,
#   name: "OpenAI",
#   endpoints: [:chat, :embeddings, :images, ...],
#   features: [:streaming, :function_calling, ...],
#   limitations: %{max_file_size: 512MB, ...}
# }

# Find providers by feature
providers = ExLLM.find_providers_with_features([:embeddings, :streaming])
# => [:openai, :gemini, :bedrock, ...]

# Check authentication requirements
if ExLLM.provider_requires_auth?(:openai) do
  # Provider needs API key
end

# Check if provider is local
if ExLLM.is_local_provider?(:ollama) do
  # No API costs
end
```

### Provider Recommendations

```elixir
recommendations = ExLLM.recommend_providers(%{
  required_features: [:vision, :streaming],
  preferred_features: [:embeddings, :function_calling],
  exclude_providers: [:mock],
  prefer_local: false,
  prefer_free: false
})

for %{provider: provider, score: score, matched_features: features} <- recommendations do
  IO.puts("#{provider}: #{Float.round(score, 2)}")
  IO.puts("  Features: #{Enum.join(features, ", ")}")
end
```

### Comparing Providers

```elixir
comparison = ExLLM.compare_providers([:openai, :anthropic, :gemini])

# See all features across providers
IO.puts("All features: #{Enum.join(comparison.features, ", ")}")

# Check specific provider capabilities
openai_features = comparison.comparison.openai.features
# => [:streaming, :function_calling, :embeddings, ...]
```

## Logging

ExLLM provides a unified logging system with security features.

### Basic Logging

```elixir
alias ExLLM.Logger

# Log at different levels
Logger.debug("Starting chat request")
Logger.info("Chat completed in #{duration}ms")
Logger.warn("Rate limit approaching")
Logger.error("API request failed", error: reason)
```

### Structured Logging

```elixir
# Log with metadata
Logger.info("Chat completed",
  provider: :openai,
  model: "gpt-4o",
  tokens: 150,
  duration_ms: 523
)

# Context-aware logging
Logger.with_context(request_id: "abc123") do
  Logger.info("Processing request")
  # All logs in this block include request_id
end
```

### Security Features

```elixir
# API keys are automatically redacted
Logger.info("Using API key", api_key: "sk-1234567890")
# Logs: "Using API key [api_key: REDACTED]"

# Configure content filtering
Application.put_env(:ex_llm, :log_redact_messages, true)
```

### Configuration

```elixir
# In config/config.exs
config :ex_llm,
  log_level: :info,                    # Minimum level to log
  log_redact_keys: true,               # Redact API keys
  log_redact_messages: false,          # Don't log message content
  log_include_metadata: true,          # Include structured metadata
  log_filter_components: [:cache]      # Don't log from cache component
```

See the [Logger User Guide](LOGGER.md) for complete documentation.

## Testing with Mock Adapter

The mock adapter helps you test LLM integrations without making real API calls.

### Basic Mocking

```elixir
# Start the mock adapter
{:ok, _} = ExLLM.Adapters.Mock.start_link()

# Configure mock response
{:ok, response} = ExLLM.chat(:mock, messages,
  mock_response: "This is a mock response"
)

assert response.content == "This is a mock response"
```

### Dynamic Responses

```elixir
# Use a handler function
{:ok, response} = ExLLM.chat(:mock, messages,
  mock_handler: fn messages, _options ->
    last_message = List.last(messages)
    %ExLLM.Types.LLMResponse{
      content: "You said: #{last_message.content}",
      model: "mock-model",
      usage: %{input_tokens: 10, output_tokens: 20}
    }
  end
)
```

### Simulating Errors

```elixir
# Simulate specific errors
{:error, error} = ExLLM.chat(:mock, messages,
  mock_error: %ExLLM.Error{
    type: :rate_limit,
    message: "Rate limit exceeded",
    retry_after: 60
  }
)
```

### Streaming Mocks

```elixir
{:ok, stream} = ExLLM.stream_chat(:mock, messages,
  mock_chunks: [
    %{content: "Hello"},
    %{content: " world"},
    %{content: "!", finish_reason: "stop"}
  ],
  chunk_delay: 100  # Milliseconds between chunks
)

for chunk <- stream do
  IO.write(chunk.content || "")
end
```

### Request Capture

```elixir
# Capture requests for assertions
ExLLM.Adapters.Mock.clear_requests()

{:ok, _} = ExLLM.chat(:mock, messages,
  capture_requests: true,
  mock_response: "OK"
)

requests = ExLLM.Adapters.Mock.get_requests()
assert length(requests) == 1
assert List.first(requests).messages == messages
```

## Advanced Topics

### Custom Adapters

Create your own adapter for unsupported providers:

```elixir
defmodule MyApp.CustomAdapter do
  @behaviour ExLLM.Adapter
  
  @impl true
  def configured?(options) do
    # Check if adapter is properly configured
    config = get_config(options)
    config[:api_key] != nil
  end
  
  @impl true
  def default_model() do
    "custom-model-v1"
  end
  
  @impl true
  def chat(messages, options) do
    # Implement chat logic
    # Return {:ok, %ExLLM.Types.LLMResponse{}} or {:error, reason}
  end
  
  @impl true
  def stream_chat(messages, options) do
    # Return {:ok, stream} where stream yields StreamChunk structs
  end
  
  # Optional callbacks
  @impl true
  def list_models(options) do
    # Return {:ok, [%ExLLM.Types.Model{}]}
  end
  
  @impl true
  def embeddings(inputs, options) do
    # Return {:ok, %ExLLM.Types.EmbeddingResponse{}}
  end
end
```

### Stream Processing

Advanced stream handling:

```elixir
defmodule StreamProcessor do
  def process_with_buffer(provider, messages, opts) do
    {:ok, stream} = ExLLM.stream_chat(provider, messages, opts)
    
    stream
    |> Stream.scan("", fn chunk, buffer ->
      case chunk do
        %{content: nil} -> buffer
        %{content: text} -> buffer <> text
      end
    end)
    |> Stream.each(fn buffer ->
      # Process complete sentences
      if String.ends_with?(buffer, ".") do
        IO.puts("\nComplete: #{buffer}")
      end
    end)
    |> Stream.run()
  end
end
```

### Token Budget Management

Manage token usage across multiple requests:

```elixir
defmodule TokenBudget do
  use GenServer
  
  def init(budget) do
    {:ok, %{budget: budget, used: 0}}
  end
  
  def track_usage(pid, tokens) do
    GenServer.call(pid, {:track, tokens})
  end
  
  def handle_call({:track, tokens}, _from, state) do
    new_used = state.used + tokens
    if new_used <= state.budget do
      {:reply, :ok, %{state | used: new_used}}
    else
      {:reply, {:error, :budget_exceeded}, state}
    end
  end
end

# Use with ExLLM
{:ok, budget} = GenServer.start_link(TokenBudget, 10_000)

{:ok, response} = ExLLM.chat(:openai, messages)
:ok = TokenBudget.track_usage(budget, response.usage.total_tokens)
```

### Multi-Provider Routing

Route requests to different providers based on criteria:

```elixir
defmodule ProviderRouter do
  def route_request(messages, requirements) do
    cond do
      # Use local for development
      Mix.env() == :dev ->
        ExLLM.chat(:ollama, messages)
        
      # Use Groq for speed-critical requests
      requirements[:max_latency_ms] < 1000 ->
        ExLLM.chat(:groq, messages)
        
      # Use OpenAI for complex reasoning
      requirements[:complexity] == :high ->
        ExLLM.chat(:openai, messages, model: "gpt-4o")
        
      # Default to Anthropic
      true ->
        ExLLM.chat(:anthropic, messages)
    end
  end
end
```

### Batch Processing

Process multiple requests efficiently:

```elixir
defmodule BatchProcessor do
  def process_batch(items, opts \\ []) do
    # Use Task.async_stream for parallel processing
    items
    |> Task.async_stream(
      fn item ->
        ExLLM.chat(opts[:provider] || :openai, [
          %{role: "user", content: item}
        ])
      end,
      max_concurrency: opts[:concurrency] || 5,
      timeout: opts[:timeout] || 30_000
    )
    |> Enum.map(fn
      {:ok, {:ok, response}} -> {:ok, response}
      {:ok, {:error, reason}} -> {:error, reason}
      {:exit, reason} -> {:error, {:timeout, reason}}
    end)
  end
end
```

### Custom Configuration Management

Implement advanced configuration strategies:

```elixir
defmodule ConfigManager do
  use GenServer
  
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end
  
  def init(_opts) do
    # Load from multiple sources
    config = %{}
    |> load_from_env()
    |> load_from_file()
    |> load_from_vault()
    |> validate_config()
    
    {:ok, config}
  end
  
  def get_config(provider) do
    GenServer.call(__MODULE__, {:get, provider})
  end
  
  defp load_from_vault(config) do
    # Fetch from HashiCorp Vault, AWS Secrets Manager, etc.
    Map.merge(config, fetch_secrets())
  end
end
```

## Best Practices

1. **Always handle errors** - LLM APIs can fail for various reasons
2. **Use streaming for long responses** - Better user experience
3. **Enable caching for repeated queries** - Save costs
4. **Monitor token usage** - Stay within budget
5. **Use appropriate models** - Don't use GPT-4 for simple tasks
6. **Implement fallbacks** - Have backup providers ready
7. **Test with mocks** - Don't make API calls in tests
8. **Use context management** - Handle long conversations properly
9. **Track costs** - Monitor spending across providers
10. **Follow rate limits** - Respect provider limitations

## Troubleshooting

### Common Issues

1. **"API key not found"**
   - Check environment variables
   - Verify configuration provider is started
   - Use `ExLLM.configured?/1` to debug

2. **"Context length exceeded"**
   - Use context management strategies
   - Choose models with larger context windows
   - Truncate conversation history

3. **"Rate limit exceeded"**
   - Enable automatic retry
   - Implement backoff strategies
   - Consider multiple API keys

4. **"Stream interrupted"**
   - Enable stream recovery
   - Implement reconnection logic
   - Check network stability

5. **"Invalid response format"**
   - Check provider documentation
   - Verify model capabilities
   - Use appropriate options

### Debug Mode

Enable debug logging:

```elixir
# In config
config :ex_llm, :log_level, :debug

# Or at runtime
Logger.configure(level: :debug)
```

### Getting Help

- Check the [API documentation](https://hexdocs.pm/ex_llm)
- Review [example applications](../examples/)
- Open an issue on [GitHub](https://github.com/azmaveth/ex_llm)
- Read provider-specific documentation

## Additional Resources

- [Quick Start Guide](QUICKSTART.md) - Get started quickly
- [Provider Capabilities](PROVIDER_CAPABILITIES.md) - Detailed provider information
- [Logger Guide](LOGGER.md) - Logging system documentation
- [API Reference](https://hexdocs.pm/ex_llm) - Complete API documentation