# ExLLM
A unified Elixir client for Large Language Models, providing a consistent interface and integrated cost tracking across multiple providers.
> ⚠️ **Alpha Quality Software**: This library is in early development. APIs may change without notice until version 1.0.0 is released. Use in production at your own risk.
## What's New in v0.4.1
- **Response Caching System** - Cache and replay real provider responses for testing
- **3 New Providers** - Added support for LM Studio, Mistral AI, and Perplexity
- **Enhanced Shared Modules** - Better error handling and response building across all providers
- **Improved Documentation** - Updated guides with all 14 supported providers
## Features
- **Unified API**: Single interface for multiple LLM providers
- **Streaming Support**: Real-time streaming responses with error recovery
- **Cost Tracking**: Automatic cost calculation for all API calls
- **Token Estimation**: Heuristic-based token counting for cost prediction
- **Context Management**: Automatic message truncation to fit model context windows
- **Session Management**: Built-in conversation state tracking and persistence
- **Structured Outputs**: Schema validation and retries via Instructor integration
- **Function Calling**: Unified interface for tool use across providers
- **Model Discovery**: Query and compare model capabilities across providers
- **Capability Normalization**: Automatic normalization of provider-specific feature names
- **Error Recovery**: Automatic retry with exponential backoff and stream resumption
- **Mock Testing**: Built-in mock adapter for testing without API calls
- **Response Caching**: Cache real provider responses for offline testing and cost reduction
- **Type Safety**: Comprehensive typespecs and structured data
- **Configurable**: Flexible configuration system with multiple providers
- **Extensible**: Easy to add new LLM providers via adapter pattern
## Supported Providers
- **Anthropic Claude** - Full support for all Claude models
- claude-opus-4-20250514 (Claude 4 Opus - most capable)
- claude-sonnet-4-20250514 (Claude 4 Sonnet - balanced)
- claude-3-7-sonnet-20250219 (Claude 3.7 Sonnet)
- claude-3-5-sonnet-20241022 (Claude 3.5 Sonnet)
- claude-3-5-haiku-20241022 (Claude 3.5 Haiku - fastest)
- claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307
- **OpenAI** - Latest GPT models including reasoning models
- gpt-4.1 series (gpt-4.1, gpt-4.1-mini, gpt-4.1-nano - default)
- o1 reasoning models (o1-pro, o1, o1-mini, o1-preview)
- gpt-4o series (gpt-4o, gpt-4o-mini, gpt-4o-latest)
- gpt-4-turbo series
- gpt-3.5-turbo models
- Specialized models for audio, search, and extended output
- **Ollama** - Local model runner
- Any model available in your Ollama installation
- Automatic model discovery
- No API costs
- **AWS Bedrock** - Multi-provider access with comprehensive model support
- **Anthropic Claude**: All Claude 4, 3.7, 3.5, 3, and 2.x models
- **Amazon Nova**: Micro, Lite (default), Pro, Premier
- **Amazon Titan**: Lite, Express text models
- **Meta Llama**: Llama 4 (Maverick, Scout), Llama 3.3, 3.2, and 2 series
- **Cohere**: Command, Command Light, Command R, Command R+
- **AI21 Labs**: Jamba 1.5 (Large, Mini), Jamba Instruct, Jurassic 2
- **Mistral**: Pixtral Large 2025-02, Mistral 7B, Mixtral 8x7B
- **Writer**: Palmyra X4, Palmyra X5
- **DeepSeek**: DeepSeek R1
- **Google Gemini** - Gemini models with multimodal support
- gemini-2.5-pro series (experimental advanced models)
- gemini-2.0-flash series (fast multimodal)
- gemini-1.5-pro and gemini-1.5-flash (1M+ context)
- gemini-pro and gemini-pro-vision
- Specialized models for image generation and TTS
- **OpenRouter** - Access to 300+ models from multiple providers
- Claude, GPT-4, Llama, PaLM, and many more
- Unified API for different model architectures
- Cost-effective access to premium models
- Automatic model discovery
- **Groq** - Fast inference platform
- Llama 4 Scout (17B), Llama 3.3 (70B), Llama 3.1 and 3 series
- DeepSeek R1 Distill (default - 70B reasoning model)
- QwQ-32B (reasoning model)
- Mixtral 8x7B, Gemma series
- Mistral Saba and specialized models
- Optimized for ultra-low latency inference
- **X.AI** - Grok models with advanced capabilities
- grok-beta (131K context)
- grok-2 and grok-2-vision models
- grok-3 models with reasoning support
- Web search and tool use capabilities
- Vision support on select models
- **LM Studio** - Local model server with OpenAI-compatible API
- Any model loaded in LM Studio
- Automatic model discovery
- No API costs
- OpenAI-compatible endpoints
- **Mistral AI** - Mistral platform models
- mistral-large-latest (flagship model)
- pixtral-large-latest (128K context)
- ministral-3b and ministral-8b (edge models)
- codestral-latest (code generation)
- mistral-small, mistral-embed
- **Perplexity** - Search-enhanced language models
- sonar-reasoning (latest reasoning model)
- sonar-pro and sonar (search-enhanced)
- llama-3.1-sonar series
- Various open-source models
- **Bumblebee** - Local model inference
- microsoft/phi-4 (14B params, default)
- meta-llama/Llama-3.3-70B
- meta-llama/Llama-3.2-3B
- meta-llama/Llama-3.1-8B
- mistralai/Mistral-Small-24B
- google/gemma-3-4b
- google/gemma-3-12b
- google/gemma-3-27b
- Qwen/Qwen3-1.7B
- Qwen/Qwen3-8B
- Qwen/Qwen3-14B
- **Mock Adapter** - For testing and development
- Configurable responses
- Error simulation
- Request capture
- Response caching integration
- No API calls needed
## Installation
Add `ex_llm` to your list of dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:ex_llm, "~> 0.4.1"},

    # Included dependencies (no need to add these manually):
    # - {:instructor, "~> 0.1.0"} - for structured outputs
    # - {:bumblebee, "~> 0.5"} - for local model support
    # - {:nx, "~> 0.7"} - for numerical computing

    # Optional hardware acceleration backends (choose one):
    {:exla, "~> 0.7", optional: true},
    # Optional: Apple Silicon Metal acceleration
    # (not included in the Hex package; add manually if needed)
    {:emlx, github: "elixir-nx/emlx", branch: "main", optional: true}
  ]
end
```
## Quick Start
📚 **[Quick Start Guide](docs/QUICKSTART.md)** - Get up and running in 5 minutes
📖 **[User Guide](docs/USER_GUIDE.md)** - Comprehensive documentation of all features
### Configuration
Configure your LLM providers in `config/config.exs`:
```elixir
config :ex_llm,
  anthropic: [
    api_key: System.get_env("ANTHROPIC_API_KEY"),
    base_url: "https://api.anthropic.com"
  ],
  openai: [
    api_key: System.get_env("OPENAI_API_KEY"),
    base_url: "https://api.openai.com"
  ],
  xai: [
    api_key: System.get_env("XAI_API_KEY"),
    base_url: "https://api.x.ai"
  ],
  groq: [
    api_key: System.get_env("GROQ_API_KEY"),
    base_url: "https://api.groq.com"
  ],
  mistral: [
    api_key: System.get_env("MISTRAL_API_KEY"),
    base_url: "https://api.mistral.ai"
  ],
  perplexity: [
    api_key: System.get_env("PERPLEXITY_API_KEY"),
    base_url: "https://api.perplexity.ai"
  ],
  ollama: [
    base_url: "http://localhost:11434"
  ],
  lmstudio: [
    base_url: "http://localhost:1234"
  ],
  bedrock: [
    # AWS credentials (optional - uses the credential chain by default)
    access_key_id: System.get_env("AWS_ACCESS_KEY_ID"),
    secret_access_key: System.get_env("AWS_SECRET_ACCESS_KEY"),
    region: System.get_env("AWS_REGION") || "us-east-1",
    model: "nova-lite" # Default model (cost-effective)
  ],
  gemini: [
    api_key: System.get_env("GEMINI_API_KEY"),
    base_url: "https://generativelanguage.googleapis.com"
  ],
  openrouter: [
    api_key: System.get_env("OPENROUTER_API_KEY"),
    base_url: "https://openrouter.ai/api/v1"
  ]
```
### Basic Usage
```elixir
# Simple chat completion with automatic cost tracking
messages = [
  %{role: "user", content: "Hello, how are you?"}
]
{:ok, response} = ExLLM.chat(:anthropic, messages)
IO.puts(response.content)
IO.puts("Cost: #{ExLLM.format_cost(response.cost.total_cost)}")
# Using Bumblebee for local models (no API costs!)
{:ok, response} = ExLLM.chat(:bumblebee, messages, model: "microsoft/phi-4")
IO.puts(response.content)
# Using LM Studio (local server)
{:ok, response} = ExLLM.chat(:lmstudio, messages)
IO.puts(response.content)
# Using Groq for ultra-fast inference
{:ok, response} = ExLLM.chat(:groq, messages, model: "deepseek-r1-distill-llama-70b")
IO.puts(response.content)
# Using Mistral AI
{:ok, response} = ExLLM.chat(:mistral, messages, model: "mistral-large-latest")
IO.puts(response.content)
# Using Perplexity for search-enhanced responses
{:ok, response} = ExLLM.chat(:perplexity, messages, model: "sonar-reasoning")
IO.puts(response.content)
# Using OpenRouter for access to many models
{:ok, response} = ExLLM.chat(:openrouter, messages, model: "openai/gpt-4o-mini")
IO.puts(response.content)
# Streaming chat with error recovery
{:ok, stream} = ExLLM.stream_chat(:anthropic, messages, stream_recovery: true)

for chunk <- stream do
  IO.write(chunk.content)
end

# Using the mock adapter for testing
{:ok, response} = ExLLM.chat(:mock, messages,
  mock_response: "This is a test response"
)
# Estimate tokens before making a request
tokens = ExLLM.estimate_tokens(messages)
IO.puts("Estimated tokens: #{tokens}")
# Calculate cost for specific usage
usage = %{input_tokens: 1000, output_tokens: 500}
cost = ExLLM.calculate_cost(:openai, "gpt-4", usage)
IO.puts("Total cost: #{ExLLM.format_cost(cost.total_cost)}")
```
### Advanced Usage
```elixir
# With custom options
options = [
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1000,
  temperature: 0.7,
  retry_count: 3,   # Automatic retry with exponential backoff
  retry_delay: 1000 # Initial retry delay in ms
]
{:ok, response} = ExLLM.chat(:anthropic, messages, options)
# Function calling
functions = [
  %{
    name: "get_weather",
    description: "Get the current weather for a location",
    parameters: %{
      type: "object",
      properties: %{
        location: %{type: "string", description: "City, State or Country"},
        unit: %{type: "string", enum: ["celsius", "fahrenheit"], description: "Temperature unit"}
      },
      required: ["location"]
    }
  }
]

{:ok, response} = ExLLM.chat(:anthropic,
  [%{role: "user", content: "What's the weather in Paris, France?"}],
  functions: functions
)
# Parse and execute function calls
case ExLLM.parse_function_calls(response) do
  {:ok, [call | _]} ->
    # Execute the function
    result = get_weather(call.arguments.location, call.arguments[:unit] || "celsius")

    # Format the result for the conversation
    function_message = ExLLM.format_function_result(call.name, result)

  :none ->
    # No function calls in the response
    :ok
end
# Model discovery and recommendations
{:ok, models} = ExLLM.list_models(:anthropic)
Enum.each(models, &IO.puts(&1.name))
# Find models with specific capabilities
vision_models = ExLLM.find_models_with_features([:vision])
function_models = ExLLM.find_models_with_features([:function_calling, :streaming])
# Get model recommendations
recommended = ExLLM.recommend_models(%{
  provider: :anthropic,
  min_context_window: 100_000,
  required_features: [:function_calling],
  preferred_features: [:vision],
  max_cost_per_million_tokens: 15.0
})
# Compare models
comparison = ExLLM.compare_models([
  {:anthropic, "claude-3-5-sonnet-20241022"},
  {:openai, "gpt-4-turbo"},
  {:gemini, "gemini-pro"}
])
# Provider capabilities - find providers by features
{:ok, caps} = ExLLM.get_provider_capabilities(:openai)
IO.puts("Endpoints: #{Enum.join(caps.endpoints, ", ")}")
# => "Endpoints: chat, embeddings, images, audio, completions, fine_tuning, files"
# Find providers with specific features
providers = ExLLM.find_providers_with_features([:embeddings, :streaming])
# => [:openai, :ollama]
# Get provider recommendations
recommendations = ExLLM.recommend_providers(%{
  required_features: [:vision, :streaming],
  preferred_features: [:audio_input, :function_calling],
  prefer_local: false
})
# => [
#      %{provider: :openai, score: 0.95, matched_features: [...], missing_features: []},
#      %{provider: :anthropic, score: 0.80, matched_features: [...], missing_features: [:audio_input]}
#    ]
# Context management - automatically truncate long conversations
long_conversation = [
  %{role: "system", content: "You are a helpful assistant."},
  # ... many messages ...
  %{role: "user", content: "What's the weather?"}
]

# Automatically truncates to fit the model's context window
{:ok, response} = ExLLM.chat(:anthropic, long_conversation,
  max_tokens: 4000, # Max tokens for the context
  strategy: :smart  # Preserve system messages and recent context
)
```
### Session Management
```elixir
# Create a new conversation session
session = ExLLM.new_session(:anthropic, name: "Customer Support")
# Chat with automatic session tracking
{:ok, {response, session}} = ExLLM.chat_with_session(session, "Hello!")
IO.puts(response.content)
# Continue the conversation
{:ok, {response, session}} = ExLLM.chat_with_session(session, "What can you help me with?")
# Session automatically tracks:
# - Message history
# - Token usage
# - Conversation context
# Review session details
messages = ExLLM.get_session_messages(session)
total_tokens = ExLLM.session_token_usage(session)
IO.puts("Total tokens used: #{total_tokens}")
# Save session for later
{:ok, json} = ExLLM.save_session(session)
File.write!("session.json", json)
# Load session later
{:ok, session} = ExLLM.load_session(File.read!("session.json"))
```
## API Reference
### Core Functions
- `chat/3` - Send messages and get a complete response
- `stream_chat/3` - Send messages and stream the response
- `configured?/2` - Check if a provider is properly configured
- `list_models/2` - Get available models for a provider
- `prepare_messages/2` - Prepare messages for context window
- `validate_context/2` - Validate messages fit within context window
- `context_window_size/2` - Get context window size for a model
- `context_stats/1` - Get statistics about message context usage
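A minimal sketch of the context helpers above (the options passed to `prepare_messages/2` are illustrative; see the Context Management section for fuller examples):

```elixir
# Truncate a conversation to fit a model's context window before sending it
prepared = ExLLM.prepare_messages(messages, model: "claude-3-5-sonnet-20241022")

# Inspect how large that model's window is
IO.puts(ExLLM.context_window_size(:anthropic, "claude-3-5-sonnet-20241022"))
```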
### Session Functions
- `new_session/2` - Create a new conversation session
- `chat_with_session/3` - Chat with automatic session tracking
- `add_session_message/4` - Add a message to a session
- `get_session_messages/2` - Retrieve messages from a session
- `session_token_usage/1` - Get total token usage for a session
- `clear_session/1` - Clear messages while preserving metadata
- `save_session/1` - Serialize session to JSON
- `load_session/1` - Load session from JSON
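For sessions built up manually rather than through `chat_with_session/3`, `add_session_message/4` appends messages directly. The argument order below (session, role, content, options) is an assumption based on the arity, not confirmed API:

```elixir
session = ExLLM.new_session(:anthropic, name: "Manual Session")

# Hypothetical argument order: session, role, content, options
session = ExLLM.add_session_message(session, "user", "Hello!", [])
session = ExLLM.add_session_message(session, "assistant", "Hi! How can I help?", [])

IO.inspect(ExLLM.get_session_messages(session))
```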
### Function Calling
- `parse_function_calls/2` - Parse function calls from LLM response
- `execute_function/2` - Execute a function call with validation
- `format_function_result/2` - Format function result for conversation
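`execute_function/2` is listed above but not demonstrated elsewhere in this README; the sketch below is an assumption about its shape (in particular, the handler-map second argument is hypothetical):

```elixir
# Assumes `response` came from an ExLLM.chat/3 call that included `functions:`
{:ok, [call | _]} = ExLLM.parse_function_calls(response)

# Hypothetical: execute the parsed call against locally defined handlers
{:ok, result} = ExLLM.execute_function(call, %{
  "get_weather" => fn args -> %{temperature: 22, unit: args["unit"] || "celsius"} end
})

function_message = ExLLM.format_function_result(call.name, result)
```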
### Model Capabilities
- `get_model_info/2` - Get complete capability information for a model
- `model_supports?/3` - Check if a model supports a specific feature
- `find_models_with_features/1` - Find models that support specific features
- `compare_models/1` - Compare capabilities across multiple models
- `recommend_models/1` - Get model recommendations based on requirements
- `models_by_capability/1` - Get models grouped by capability support
- `list_model_features/0` - List all trackable model features
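A brief sketch of the model-capability helpers; the return shapes shown are assumptions, so check the function docs for exact structures:

```elixir
# Full capability record for a model (return shape assumed)
{:ok, info} = ExLLM.get_model_info(:anthropic, "claude-3-5-sonnet-20241022")

# Boolean feature check
ExLLM.model_supports?(:anthropic, "claude-3-5-sonnet-20241022", :vision)

# All trackable features
IO.inspect(ExLLM.list_model_features())
```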
### Provider Capabilities
- `get_provider_capabilities/1` - Get API-level capabilities for a provider
- `provider_supports?/2` - Check if a provider supports a feature/endpoint
- `find_providers_with_features/1` - Find providers that support specific features
- `compare_providers/1` - Compare capabilities across multiple providers
- `recommend_providers/1` - Get provider recommendations based on requirements
- `list_providers/0` - List all available providers
- `is_local_provider?/1` - Check if a provider runs locally
- `provider_requires_auth?/1` - Check if a provider requires authentication
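A short sketch of the helpers not shown elsewhere in this README (the return values in comments are illustrative):

```elixir
# Local providers incur no API cost
ExLLM.is_local_provider?(:ollama)          # => true (illustrative)
ExLLM.provider_requires_auth?(:anthropic)  # => true (illustrative)

# Side-by-side comparison of several providers
comparison = ExLLM.compare_providers([:openai, :anthropic, :ollama])
```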
### Capability Normalization
ExLLM automatically normalizes different capability names used by various providers. This means you can use provider-specific terminology and ExLLM will understand it:
```elixir
# These all refer to the same capability (function calling)
ExLLM.provider_supports?(:openai, :function_calling) # => true
ExLLM.provider_supports?(:anthropic, :tool_use) # => true
ExLLM.provider_supports?(:openai, :tools) # => true
# Find providers using any terminology
ExLLM.find_providers_with_features([:tool_use]) # Works!
ExLLM.find_providers_with_features([:function_calling]) # Also works!
```
Common normalizations:
- Function calling: `function_calling`, `tool_use`, `tools`, `functions`
- Image generation: `image_generation`, `images`, `dalle`, `text_to_image`
- Speech synthesis: `speech_synthesis`, `tts`, `text_to_speech`
- Embeddings: `embeddings`, `embed`, `embedding`, `text_embedding`
- Vision: `vision`, `image_understanding`, `visual_understanding`, `multimodal`
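Conceptually, this amounts to mapping aliases onto a canonical feature name before lookup. The module below is a minimal illustrative sketch of such a mapping, not ExLLM's actual implementation:

```elixir
defmodule CapabilityAliases do
  # Illustrative alias table only - not ExLLM's internal code
  @aliases %{
    tool_use: :function_calling,
    tools: :function_calling,
    functions: :function_calling,
    tts: :speech_synthesis,
    text_to_speech: :speech_synthesis,
    embed: :embeddings,
    embedding: :embeddings
  }

  # Unknown names pass through unchanged
  def normalize(feature), do: Map.get(@aliases, feature, feature)
end

CapabilityAliases.normalize(:tool_use) # => :function_calling
```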
### Error Recovery
- `resume_stream/2` - Resume a previously interrupted stream
- `list_recoverable_streams/0` - List all recoverable streams
### Data Structures
#### LLMResponse
```elixir
%ExLLM.Types.LLMResponse{
  content: "Hello! I'm doing well, thank you for asking.",
  usage: %{input_tokens: 12, output_tokens: 15},
  model: "claude-3-5-sonnet-20241022",
  finish_reason: "end_turn",
  cost: %{
    total_cost: 0.000261,
    input_cost: 0.000036,
    output_cost: 0.000225,
    currency: "USD"
  }
}
```
#### StreamChunk
```elixir
%ExLLM.Types.StreamChunk{
  content: "Hello",
  delta: true,
  finish_reason: nil
}
```
#### Model
```elixir
%ExLLM.Types.Model{
  name: "claude-3-5-sonnet-20241022",
  provider: :anthropic,
  context_length: 200000,
  supports_streaming: true
}
```
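Because `list_models/2` returns `%ExLLM.Types.Model{}` structs with the fields shown above, results can be filtered and sorted with ordinary `Enum` functions:

```elixir
{:ok, models} = ExLLM.list_models(:anthropic)

models
|> Enum.filter(& &1.supports_streaming)
|> Enum.sort_by(& &1.context_length, :desc)
|> Enum.each(fn model ->
  IO.puts("#{model.name} (#{model.context_length} tokens)")
end)
```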
## Model Configuration
ExLLM uses external YAML configuration files for model metadata, pricing, and capabilities. This allows easy updates without code changes:
### External Configuration Structure
```yaml
# config/models/anthropic.yml
provider: anthropic
default_model: "claude-sonnet-4-20250514"
models:
  claude-3-5-sonnet-20241022:
    context_window: 200000
    pricing:
      input: 3.00 # per 1M tokens
      output: 15.00
    capabilities:
      - streaming
      - function_calling
      - vision
### Configuration Management
```elixir
# Get model pricing
pricing = ExLLM.ModelConfig.get_pricing(:anthropic, "claude-3-5-sonnet-20241022")
# Get context window
context = ExLLM.ModelConfig.get_context_window(:openai, "gpt-4o")
# Get default model for provider
default = ExLLM.ModelConfig.get_default_model(:openrouter)
# Configuration is cached for performance
# Updates require restart or cache refresh
```
## Cost Tracking
ExLLM automatically tracks costs for all API calls using the external pricing configuration:
### Automatic Cost Calculation
```elixir
{:ok, response} = ExLLM.chat(:anthropic, messages)
# Access cost information
if response.cost do
  IO.puts("Input tokens: #{response.cost.input_tokens}")
  IO.puts("Output tokens: #{response.cost.output_tokens}")
  IO.puts("Total cost: #{ExLLM.format_cost(response.cost.total_cost)}")
end
```
### Token Estimation
```elixir
# Estimate tokens before making a request
messages = [
  %{role: "system", content: "You are a helpful assistant."},
  %{role: "user", content: "Explain quantum computing in simple terms."}
]
estimated_tokens = ExLLM.estimate_tokens(messages)
# Use this to predict costs before making the actual API call
```
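To turn that estimate into a projected cost, it can be combined with `calculate_cost/3`; the output-token figure below is just a budget you supply, not something ExLLM predicts:

```elixir
estimated_input = ExLLM.estimate_tokens(messages)

# Assume a rough budget for the reply (illustrative)
projected_usage = %{input_tokens: estimated_input, output_tokens: 500}

cost = ExLLM.calculate_cost(:anthropic, "claude-3-5-sonnet-20241022", projected_usage)
IO.puts("Projected cost: #{ExLLM.format_cost(cost.total_cost)}")
```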
### Cost Comparison
```elixir
# Compare costs across different providers
usage = %{input_tokens: 1000, output_tokens: 2000}
providers = [
  {:openai, "gpt-4"},
  {:openai, "gpt-3.5-turbo"},
  {:anthropic, "claude-3-5-sonnet-20241022"},
  {:anthropic, "claude-3-haiku-20240307"}
]

Enum.each(providers, fn {provider, model} ->
  cost = ExLLM.calculate_cost(provider, model, usage)

  unless cost[:error] do
    IO.puts("#{provider}/#{model}: #{ExLLM.format_cost(cost.total_cost)}")
  end
end)
```
### Supported Pricing
ExLLM includes pricing data (as of June 2025) in external YAML files for all supported providers:
- **Anthropic**: Claude 3 series (Opus, Sonnet, Haiku), Claude 3.5, Claude 4
- **OpenAI**: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo, GPT-4o series
- **OpenRouter**: 300+ models with dynamic pricing
- **Google Gemini**: Pro, Ultra, Nano
- **AWS Bedrock**: Various models including Claude, Titan, Llama 2
- **Ollama**: Local models (free - $0.00)
- **Bumblebee**: Free ($0.00) - no API costs
Pricing data is stored in `config/models/*.yml` files and can be updated independently of code changes.
## Context Management
ExLLM automatically manages context windows to ensure your messages fit within model limits:
### Automatic Context Truncation
```elixir
# Long conversation that might exceed context window
messages = [
  %{role: "system", content: "You are a helpful assistant."},
  # ... hundreds of messages ...
  %{role: "user", content: "What's my current task?"}
]
# ExLLM automatically truncates to fit the model's context window
{:ok, response} = ExLLM.chat(:anthropic, messages)
```
### Context Window Validation
```elixir
# Check if messages fit within context window
case ExLLM.validate_context(messages, model: "gpt-3.5-turbo") do
  {:ok, token_count} ->
    IO.puts("Messages use #{token_count} tokens")

  {:error, {:context_too_large, %{tokens: tokens, max_tokens: max}}} ->
    IO.puts("Messages too large: #{tokens} tokens (max: #{max})")
end
```
### Context Strategies
```elixir
# Sliding window (default) - keeps most recent messages
{:ok, response} = ExLLM.chat(:anthropic, messages,
  max_tokens: 4000,
  strategy: :sliding_window
)

# Smart strategy - preserves system messages and recent context
{:ok, response} = ExLLM.chat(:anthropic, messages,
  max_tokens: 4000,
  strategy: :smart,
  preserve_messages: 10 # Always keep the last 10 messages
)
```
### Context Statistics
```elixir
# Get detailed statistics about your messages
stats = ExLLM.context_stats(messages)
IO.inspect(stats)
# %{
# message_count: 150,
# total_tokens: 45000,
# by_role: %{"system" => 1, "user" => 75, "assistant" => 74},
# avg_tokens_per_message: 300
# }
# Check context window sizes
IO.puts(ExLLM.context_window_size(:anthropic, "claude-3-5-sonnet-20241022"))
# => 200000
```
## Session Management
ExLLM includes built-in session management for maintaining conversation state:
### Creating and Using Sessions
```elixir
# Create a new session
session = ExLLM.new_session(:anthropic, name: "My Chat")
# Chat with automatic session tracking
{:ok, {response, updated_session}} = ExLLM.chat_with_session(session, "Hello!")
# Continue the conversation
{:ok, {response2, session2}} = ExLLM.chat_with_session(updated_session, "What's 2+2?")
# Access session messages
messages = ExLLM.get_session_messages(session2)
# => [%{role: "user", content: "Hello!"}, %{role: "assistant", content: "..."}, ...]
```
### Session Persistence
```elixir
# Save session to disk
{:ok, path} = ExLLM.save_session(session, "/path/to/sessions")
# Load session from disk
{:ok, loaded_session} = ExLLM.load_session("/path/to/sessions/session_id.json")
# Export session as markdown
{:ok, markdown} = ExLLM.export_session_markdown(session)
File.write!("conversation.md", markdown)
```
### Session Information
```elixir
# Get session metadata
info = ExLLM.session_info(session)
# => %{
# id: "123...",
# name: "My Chat",
# created_at: ~U[2025-01-24 10:00:00Z],
# message_count: 10,
# total_tokens: 1500
# }
# Get token usage for session
tokens = ExLLM.session_token_usage(session)
# => 1500
# Clear session messages
clean_session = ExLLM.clear_session(session)
```
## Structured Outputs
ExLLM integrates with [instructor_ex](https://github.com/thmsmlr/instructor_ex) to provide structured output validation. This allows you to define expected response structures using Ecto schemas and automatically validate LLM responses.
Instructor is included as a dependency of ExLLM, so no additional installation is needed.
### Basic Usage
```elixir
# Define your schema
defmodule EmailClassification do
  use Ecto.Schema
  use Instructor.Validator

  @llm_doc "Classification of an email as spam or not spam"

  @primary_key false
  embedded_schema do
    field :classification, Ecto.Enum, values: [:spam, :not_spam]
    field :confidence, :float
    field :reason, :string
  end

  @impl true
  def validate_changeset(changeset) do
    changeset
    |> Ecto.Changeset.validate_required([:classification, :confidence, :reason])
    |> Ecto.Changeset.validate_number(:confidence,
      greater_than_or_equal_to: 0.0,
      less_than_or_equal_to: 1.0
    )
  end
end
# Use with ExLLM
messages = [%{role: "user", content: "Is this spam? 'You won a million dollars!'"}]
{:ok, result} = ExLLM.chat(:anthropic, messages,
  response_model: EmailClassification,
  max_retries: 3 # Automatically retry on validation errors
)
IO.inspect(result)
# %EmailClassification{
# classification: :spam,
# confidence: 0.95,
# reason: "Classic lottery scam pattern"
# }
```
### With Simple Type Specifications
```elixir
# Define expected structure without Ecto
response_model = %{
  name: :string,
  age: :integer,
  email: :string,
  tags: {:array, :string}
}

messages = [%{role: "user", content: "Extract: John Doe, 30 years old, john@example.com, likes elixir and coding"}]

{:ok, result} = ExLLM.chat(:anthropic, messages,
  response_model: response_model
)
IO.inspect(result)
# %{
# name: "John Doe",
# age: 30,
# email: "john@example.com",
# tags: ["elixir", "coding"]
# }
```
### Advanced Example
```elixir
defmodule UserProfile do
  use Ecto.Schema
  use Instructor.Validator

  @llm_doc """
  User profile extraction from text.
  Extract all available information about the user.
  """

  embedded_schema do
    field :name, :string
    field :email, :string
    field :age, :integer
    field :location, :string

    embeds_many :interests, Interest do
      field :name, :string
      field :level, Ecto.Enum, values: [:beginner, :intermediate, :expert]
    end
  end

  @impl true
  def validate_changeset(changeset) do
    changeset
    |> Ecto.Changeset.validate_required([:name])
    |> Ecto.Changeset.validate_format(:email, ~r/@/)
    |> Ecto.Changeset.validate_number(:age, greater_than: 0, less_than: 150)
  end
end
# Complex extraction with nested structures
text = """
Hi, I'm Jane Smith, a 28-year-old software engineer from Seattle.
You can reach me at jane.smith@tech.com. I'm an expert in Elixir,
intermediate in Python, and just starting to learn Rust.
"""
{:ok, profile} = ExLLM.chat(:anthropic,
  [%{role: "user", content: "Extract user profile: #{text}"}],
  response_model: UserProfile,
  max_retries: 3
)
```
### Using the Instructor Module Directly
```elixir
# Direct usage of ExLLM.Instructor
{:ok, result} = ExLLM.Instructor.chat(:anthropic, messages,
  response_model: EmailClassification,
  max_retries: 3,
  temperature: 0.1 # Lower temperature for more consistent structure
)

# Parse an existing response
{:ok, response} = ExLLM.chat(:anthropic, messages)
{:ok, structured} = ExLLM.Instructor.parse_response(response, UserProfile)

# Check if instructor is available
if ExLLM.Instructor.available?() do
  # Use structured outputs
else
  # Fall back to regular parsing
end
```
### Supported Providers
Structured outputs work with providers that have instructor adapters:
- `:anthropic` - Anthropic Claude
- `:openai` - OpenAI GPT models
- `:ollama` - Local Ollama models
- `:gemini` - Google Gemini
- `:bedrock` - AWS Bedrock models
- `:bumblebee` - Local Bumblebee models
### Error Handling
```elixir
case ExLLM.chat(:anthropic, messages, response_model: UserProfile) do
  {:ok, profile} ->
    # Successfully validated structure
    IO.inspect(profile)

  {:error, {:validation_failed, errors}} ->
    # Validation failed after retries
    IO.inspect(errors)

  {:error, reason} ->
    # Other error
    IO.inspect(reason)
end
```
## Configuration
ExLLM supports multiple configuration providers:
### Environment Variables (Default)
```elixir
# Uses ExLLM.ConfigProvider.Default
# Reads from application config and environment variables
```
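For releases, the same environment-variable lookups can live in `config/runtime.exs` so keys are read at boot rather than at compile time. A minimal sketch:

```elixir
# config/runtime.exs
import Config

config :ex_llm,
  anthropic: [api_key: System.get_env("ANTHROPIC_API_KEY")],
  openai: [api_key: System.get_env("OPENAI_API_KEY")]
```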
### Static Configuration
```elixir
config = %{
  anthropic: [
    api_key: "your-api-key",
    base_url: "https://api.anthropic.com"
  ]
}
ExLLM.set_config_provider({ExLLM.ConfigProvider.Static, config})
```
### Logging
ExLLM provides a unified logging system with fine-grained control over what gets logged and how sensitive data is handled.
📖 **[Read the full Logger User Guide](docs/LOGGER.md)** for detailed documentation.
```elixir
# Quick example
alias ExLLM.Logger
Logger.info("Starting chat completion")
Logger.with_context(provider: :openai, operation: :chat) do
  Logger.info("Sending request")
  # ... make API call ...
  Logger.info("Request completed", tokens: 150, duration_ms: 230)
end
```
Configure logging in your `config/config.exs`:
```elixir
config :ex_llm,
  log_level: :info,
  log_components: %{
    requests: true,
    responses: true,
    streaming: false, # Can be noisy
    retries: true,
    cache: false,
    models: true
  },
  log_redaction: %{
    api_keys: true, # Always recommended
    content: false  # Set to true in production
  }
```
### Custom Configuration Provider
```elixir
defmodule MyConfigProvider do
  @behaviour ExLLM.ConfigProvider

  @impl true
  def get_config(provider, key) do
    # Your custom logic here
  end

  @impl true
  def has_config?(provider) do
    # Your custom logic here
  end
end
ExLLM.set_config_provider(MyConfigProvider)
```
## Error Handling
ExLLM uses consistent error patterns:
```elixir
case ExLLM.chat(:anthropic, messages) do
  {:ok, response} ->
    # Success
    IO.puts(response.content)

  {:error, {:config_error, reason}} ->
    # Configuration issue
    IO.puts("Config error: #{reason}")

  {:error, {:api_error, %{status: status, body: body}}} ->
    # API error
    IO.puts("API error #{status}: #{body}")

  {:error, {:network_error, reason}} ->
    # Network issue
    IO.puts("Network error: #{reason}")

  {:error, {:parse_error, reason}} ->
    # Response parsing issue
    IO.puts("Parse error: #{reason}")
end
```
## Error Recovery and Retries
ExLLM includes automatic error recovery and retry mechanisms:
### Automatic Retries
```elixir
# Configure retry behavior
options = [
  retry_count: 3,              # Number of retry attempts
  retry_delay: 1000,           # Initial delay in milliseconds
  retry_backoff: :exponential, # Backoff strategy
  retry_jitter: true           # Add jitter to prevent a thundering herd
]
{:ok, response} = ExLLM.chat(:anthropic, messages, options)
# Provider-specific retry policies
ExLLM.Retry.with_retry(
  fn -> ExLLM.chat(:anthropic, messages) end,
  max_attempts: 5,
  initial_delay: 500,
  max_delay: 30_000,
  should_retry: fn error ->
    # Custom retry logic
    case error do
      {:api_error, %{status: 429}} -> true # Rate limit
      {:api_error, %{status: 503}} -> true # Service unavailable
      {:network_error, _} -> true          # Network issues
      _ -> false
    end
  end
)
```
### Stream Recovery
```elixir
# Enable automatic stream recovery
{:ok, stream_id} = ExLLM.stream_chat(:anthropic, messages,
  stream_recovery: true,
  recovery_strategy: :paragraph # :exact, :paragraph, or :summarize
)

# If the stream is interrupted, resume from where it left off
case ExLLM.resume_stream(stream_id) do
  {:ok, resumed_stream} ->
    for chunk <- resumed_stream do
      IO.write(chunk.content)
    end

  {:error, :not_found} ->
    # Stream not recoverable
    :error
end
# List recoverable streams
recoverable = ExLLM.list_recoverable_streams()
```
## Mock Adapter for Testing
The mock adapter allows you to test your LLM interactions without making real API calls:
### Basic Mock Usage
```elixir
# Configure static mock response
{:ok, response} = ExLLM.chat(:mock, messages,
  mock_response: "This is a mock response"
)

# Configure a mock response with usage data
{:ok, response} = ExLLM.chat(:mock, messages,
  mock_response: %{
    content: "Mock response with usage",
    usage: %{input_tokens: 10, output_tokens: 20},
    model: "mock-model"
  }
)

# Mock streaming responses
{:ok, stream} = ExLLM.stream_chat(:mock, messages,
  mock_chunks: ["Hello", " from", " mock", " adapter!"],
  chunk_delay: 100 # Delay between chunks in ms
)

for chunk <- stream do
  IO.write(chunk.content)
end
```
### Advanced Mock Configuration
```elixir
# Dynamic mock responses based on input
mock_handler = fn messages ->
  last_message = List.last(messages)

  cond do
    String.contains?(last_message.content, "weather") ->
      "It's sunny and 72°F"

    String.contains?(last_message.content, "hello") ->
      "Hello! How can I help you?"

    true ->
      "I don't understand"
  end
end

{:ok, response} = ExLLM.chat(:mock, messages,
  mock_handler: mock_handler
)

# Simulate errors
{:error, {:api_error, %{status: 429, body: "Rate limit exceeded"}}} =
  ExLLM.chat(:mock, messages,
    mock_error: {:api_error, %{status: 429, body: "Rate limit exceeded"}}
  )

# Capture requests for assertions
{:ok, response} = ExLLM.chat(:mock, messages,
  capture_requests: true,
  mock_response: "Test response"
)
# Access captured requests
captured = ExLLM.Adapters.Mock.get_captured_requests()
assert length(captured) == 1
assert List.first(captured).messages == messages
```
### Testing with Mock Adapter
```elixir
defmodule MyApp.LLMClientTest do
  use ExUnit.Case

  setup do
    # Clear any previous captures
    ExLLM.Adapters.Mock.clear_captured_requests()
    :ok
  end

  test "handles weather queries" do
    messages = [%{role: "user", content: "What's the weather?"}]

    {:ok, response} = ExLLM.chat(:mock, messages,
      mock_response: "It's sunny today!",
      capture_requests: true
    )

    assert response.content == "It's sunny today!"

    # Verify the request
    [request] = ExLLM.Adapters.Mock.get_captured_requests()
    assert request.provider == :mock
    assert request.messages == messages
  end

  test "simulates API errors" do
    messages = [%{role: "user", content: "Hello"}]

    {:error, error} = ExLLM.chat(:mock, messages,
      mock_error: {:network_error, :timeout}
    )

    assert error == {:network_error, :timeout}
  end
end
```
## Local Model Support
ExLLM supports running models locally using Bumblebee and EXLA/EMLX backends. This enables on-device inference without API calls or costs.
### Setup
1. ExLLM includes Bumblebee and Nx dependencies. For hardware acceleration, add one of these optional backends to your `mix.exs`:
```elixir
def deps do
  [
    {:ex_llm, "~> 0.4.1"},

    # For CUDA/ROCm GPUs:
    {:exla, "~> 0.7"}

    # OR for Apple Silicon Metal acceleration:
    # {:emlx, github: "elixir-nx/emlx", branch: "main"}
  ]
end
```
2. Configure the Nx default backend (optional - auto-detected by default):
```elixir
# For CUDA GPUs
config :nx, :default_backend, {EXLA.Backend, client: :cuda}
# For Apple Silicon
config :nx, :default_backend, EMLX.Backend
```
### Available Models
- **microsoft/phi-4** - Phi-4 (14B parameters) - Default
- **meta-llama/Llama-3.3-70B** - Llama 3.3 (70B, performance of 405B)
- **meta-llama/Llama-3.2-3B** - Llama 3.2 (3B, efficient for edge)
- **meta-llama/Llama-3.1-8B** - Llama 3.1 (8B)
- **mistralai/Mistral-Small-24B** - Mistral Small (24B)
- **google/gemma-3-4b** - Gemma 3 (4B, multimodal)
- **google/gemma-3-12b** - Gemma 3 (12B, multimodal)
- **google/gemma-3-27b** - Gemma 3 (27B, multimodal)
- **Qwen/Qwen3-1.7B** - Qwen3 (1.7B, multilingual)
- **Qwen/Qwen3-8B** - Qwen3 (8B)
- **Qwen/Qwen3-14B** - Qwen3 (14B)
### Usage
```elixir
# Start the model loader (happens automatically on first use)
{:ok, _} = ExLLM.Local.ModelLoader.start_link()
# Use a local model
messages = [
  %{role: "user", content: "Explain quantum computing in simple terms"}
]

{:ok, response} = ExLLM.chat(:bumblebee, messages, model: "microsoft/phi-4")
IO.puts(response.content)

# Stream responses
{:ok, stream} = ExLLM.stream_chat(:bumblebee, messages)

for chunk <- stream do
  IO.write(chunk.content)
end

# List available models
{:ok, models} = ExLLM.list_models(:bumblebee)

Enum.each(models, fn model ->
  IO.puts("#{model.name} - Context: #{model.context_window} tokens")
end)
# Check acceleration info
info = ExLLM.Local.EXLAConfig.acceleration_info()
IO.puts("Running on: #{info.name}")
```
### Hardware Acceleration
ExLLM automatically detects and uses available hardware acceleration:
- **Apple Silicon** - Uses Metal via EMLX
- **NVIDIA GPUs** - Uses CUDA via EXLA
- **AMD GPUs** - Uses ROCm via EXLA
- **CPUs** - Optimized multi-threaded inference
### Performance Tips
1. **First Load**: Models are downloaded from HuggingFace on first use and cached locally
2. **Memory**: Ensure you have enough RAM/VRAM for your chosen model
3. **Batch Size**: Automatically optimized based on available memory
4. **Mixed Precision**: Enabled by default for better performance
### Model Loading
```elixir
# Pre-load a model
{:ok, _} = ExLLM.Local.ModelLoader.load_model("Qwen/Qwen3-0.6B")
# Load from local path
{:ok, _} = ExLLM.Local.ModelLoader.load_model("/path/to/model")
# Unload to free memory
:ok = ExLLM.Local.ModelLoader.unload_model("Qwen/Qwen3-0.6B")
# List loaded models
loaded = ExLLM.Local.ModelLoader.list_loaded_models()
```
## Adding New Providers
To add a new LLM provider, implement the `ExLLM.Adapter` behaviour:
```elixir
defmodule ExLLM.Adapters.MyProvider do
  @behaviour ExLLM.Adapter

  @impl true
  def chat(messages, options) do
    # Implement chat completion
  end

  @impl true
  def stream_chat(messages, options, callback) do
    # Implement streaming chat
  end

  @impl true
  def configured?() do
    # Check if the provider is configured
  end

  @impl true
  def list_models() do
    # Return available models
  end
end
```
Then register it in the main ExLLM module.
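Once registered, the provider is reached through the same unified functions as the built-in adapters. The `:my_provider` atom below assumes that is the name it was registered under:

```elixir
# Assumes the adapter was registered as :my_provider
{:ok, response} = ExLLM.chat(:my_provider, [%{role: "user", content: "Hello"}])
IO.puts(response.content)
```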
## Requirements
- Elixir ~> 1.14
- Erlang/OTP ~> 25.0
- For local models (optional):
- Bumblebee ~> 0.5
- Nx ~> 0.7
- EXLA ~> 0.7 (for GPU acceleration)
- EMLX ~> 0.1 (for Apple Silicon)
## Development
### Setup
```bash
# Clone the repository
git clone https://github.com/azmaveth/ex_llm.git
cd ex_llm
# Install dependencies
mix deps.get
mix deps.compile
# Run tests
mix test
# Run quality checks
mix format --check-formatted
mix credo
mix dialyzer
```
### Testing
```bash
# Run all tests
mix test
# Run specific test files
mix test test/ex_llm_test.exs
# Run only integration tests
mix test test/*_integration_test.exs
# Run tests with coverage
mix test --cover
```
### Documentation
```bash
# Generate docs
mix docs
# Open in browser
open doc/index.html
```
#### User Guides
- [Quick Start Guide](docs/QUICKSTART.md) - Get started with the most common use cases
- [User Guide](docs/USER_GUIDE.md) - Comprehensive documentation of all features
- [Logger User Guide](docs/LOGGER.md) - Comprehensive guide to ExLLM's unified logging system
- [Provider Capabilities Guide](docs/PROVIDER_CAPABILITIES.md) - How to find and update provider capabilities
## Roadmap
Visit the [GitHub repository](https://github.com/azmaveth/ex_llm) to see the detailed roadmap and progress tracking.
### Recently Completed ✅
- [x] OpenAI adapter implementation
- [x] Ollama adapter implementation
- [x] AWS Bedrock adapter with multi-provider support
- [x] Google Gemini adapter
- [x] Structured outputs via Instructor integration
- [x] Comprehensive cost tracking across all providers
- [x] Function calling support for compatible models
- [x] Request retry logic with exponential backoff
- [x] Enhanced streaming error recovery
- [x] Mock adapter for testing
- [x] Model capability discovery and recommendations
### Near-term Goals
- [ ] Vision/multimodal support for compatible models
- [ ] Embeddings API support
- [ ] Enhanced streaming with token-level callbacks
- [ ] Response caching with configurable TTL
- [ ] Fine-tuning management
- [ ] Batch API support
- [ ] Prompt template management
- [ ] Usage analytics and reporting
### Long-term Vision
- Become the go-to LLM client library for Elixir
- Support all major LLM providers
- Provide best-in-class developer experience
- Maintain comprehensive documentation
- Build a thriving ecosystem of extensions
## Contributing
We welcome contributions! Please see our contributing guidelines:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Add tests for new functionality
5. Ensure all tests pass (`mix test`)
6. Format your code (`mix format`)
7. Run linter (`mix credo`)
8. Commit your changes (`git commit -m 'feat: add amazing feature'`)
9. Push to the branch (`git push origin feature/amazing-feature`)
10. Open a Pull Request
### Commit Message Convention
We use [Conventional Commits](https://www.conventionalcommits.org/):
- `feat:` for new features
- `fix:` for bug fixes
- `docs:` for documentation changes
- `chore:` for maintenance tasks
- `test:` for test additions/changes
## Future Provider Support
ExLLM includes pre-configured model data for 49 additional providers, ready for implementation:
**Major Cloud Providers**: Azure, Vertex AI, Databricks, Sagemaker, Watsonx, Snowflake
**AI Companies**: Cohere, Together AI, Replicate, DeepSeek
**Inference Platforms**: Fireworks AI, DeepInfra, Anyscale, Cloudflare, NScale, SambaNova
**Specialized**: AI21, NLP Cloud, Aleph Alpha, Voyage (embeddings), Assembly AI (audio)
All model configurations including pricing, context windows, and capabilities are already available in `config/models/`.
## Acknowledgments
- Built with [Req](https://github.com/wojtekmach/req) for HTTP client functionality
- Local model support via [Bumblebee](https://github.com/elixir-nx/bumblebee)
- Structured outputs via [Instructor](https://github.com/thmsmlr/instructor_ex)
- Model configuration data synced from [LiteLLM](https://github.com/BerriAI/litellm)
- Inspired by the need for a unified LLM interface in Elixir
## License
MIT License - see [LICENSE](LICENSE) for details.