# ExAgent
An Elixir library for building multi-agent LLM applications. ExAgent abstracts calls to several LLM providers (OpenAI, Gemini, DeepSeek) behind an extensible protocol and orchestrates agents with OTP primitives, offering four multi-agent design patterns.
## Features
- **Protocol-based LLM abstraction** — Swap providers without changing application code
- **Built on OTP** — Agents backed by GenServers, supervised processes, async Tasks
- **Automatic tool execution** — Define tools once, the agent loops LLM calls until complete
- **4 multi-agent patterns** — Subagents, Skills, Handoffs, Router
- **HTTP via Req** — Clean, composable HTTP with built-in JSON encoding and auth
- **Multimodal file attachments** — Send images, PDFs, and other files alongside chat messages
- **Extensible** — Add any LLM provider by implementing a single protocol
## Installation
Add `ex_agent` to your list of dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:ex_agent, "~> 0.1.0"}
  ]
end
```
## Quick Start
```elixir
# 1. Create a provider
provider = ExAgent.Providers.OpenAI.new(api_key: System.get_env("OPENAI_API_KEY"))
# 2. Start an agent
{:ok, agent} = ExAgent.start_agent(provider: provider)
# 3. Chat
{:ok, response} = ExAgent.chat(agent, "What is Elixir?")
IO.puts(response.content)
```
## Providers
ExAgent ships with three built-in providers. Each is configured via `new/1` and automatically initializes a Req HTTP client.
### OpenAI
Supports chat and file attachments (images sent as `image_url` content parts).
```elixir
provider = ExAgent.Providers.OpenAI.new(
  api_key: "sk-...",                      # required
  model: "gpt-4o",                        # default: "gpt-4o"
  base_url: "https://api.openai.com/v1",  # default
  system_prompt: "You are a helpful assistant."
)
```
### Gemini
Supports chat and file attachments (images, PDFs, etc. via `inline_data` format).
```elixir
provider = ExAgent.Providers.Gemini.new(
  api_key: "AIza...",          # required
  model: "gemini-2.0-flash",   # default: "gemini-2.0-flash"
  system_prompt: "Be concise."
)
```
### DeepSeek
Supports chat and tool calling. File attachments are silently ignored (DeepSeek API does not support multimodal input).
```elixir
provider = ExAgent.Providers.DeepSeek.new(
  api_key: "sk-...",         # required
  model: "deepseek-chat",    # default: "deepseek-chat"
  system_prompt: "You are a coding expert."
)
```
## Core Concepts
### Message
Represents a single message in a conversation.
```elixir
{:ok, msg} = ExAgent.Message.new(role: :user, content: "Hello!")
# Supported roles: :system, :user, :assistant, :tool
```
### Tool
Defines a function the LLM can invoke, with JSON Schema parameters.
```elixir
{:ok, tool} = ExAgent.Tool.new(
  name: "get_weather",
  description: "Get current weather for a city",
  parameters: %{
    "type" => "object",
    "properties" => %{
      "city" => %{"type" => "string", "description" => "City name"}
    },
    "required" => ["city"]
  },
  function: fn %{"city" => city} ->
    {:ok, "#{city}: 22C, sunny"}
  end
)
```
### Context
Portable conversation state with message history and metadata.
```elixir
context = ExAgent.Context.new(metadata: %{session_id: "abc123"})
{:ok, msg} = ExAgent.Message.new(role: :user, content: "Hello")
context = ExAgent.Context.add_message(context, msg)
# Get the last assistant response
last = ExAgent.Context.get_last_assistant_message(context)
```
### Skill
A loadable persona with its own system prompt, tools, and activation function.
```elixir
{:ok, sql_skill} = ExAgent.Skill.new(
  name: "sql_expert",
  system_prompt: "You are a SQL expert. Help users write queries.",
  tools: [sql_tool],
  activation_fn: fn ctx ->
    Enum.any?(ctx.messages, fn m ->
      String.contains?(m.content, "SQL") or String.contains?(m.content, "SELECT")
    end)
  end
)
```
## Agent Lifecycle
Agents are GenServer processes managed by a DynamicSupervisor.
```elixir
# Start an agent with tools and skills
{:ok, agent} = ExAgent.start_agent(
  provider: provider,
  id: "my-agent",
  tools: [weather_tool, search_tool],
  skills: [sql_skill]
)
# Synchronous chat
{:ok, response} = ExAgent.chat(agent, "What's the weather in Tokyo?")
IO.puts(response.content)
# Asynchronous chat
task = ExAgent.chat_async(agent, "Tell me a story")
{:ok, response} = Task.await(task)
# Inspect conversation history
context = ExAgent.get_context(agent)
Enum.each(context.messages, fn msg ->
  IO.puts("#{msg.role}: #{msg.content}")
end)
# Reset conversation
ExAgent.reset(agent)
# Stop the agent
ExAgent.stop_agent(agent)
```
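
For readers new to OTP, the following sketch shows the general shape of a GenServer started under a DynamicSupervisor. `LifecycleSketch.Agent` is a hypothetical stand-in that only records messages; it is not ExAgent's agent implementation, and the message shapes are assumptions.

```elixir
defmodule LifecycleSketch.Agent do
  # Hypothetical agent process: keeps a message list instead of calling an LLM.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts), do: {:ok, %{id: Keyword.get(opts, :id), messages: []}}

  @impl true
  def handle_call({:chat, text}, _from, state) do
    # A real agent would call the provider here; this one just stores the text.
    state = %{state | messages: state.messages ++ [text]}
    {:reply, {:ok, length(state.messages)}, state}
  end

  def handle_call(:reset, _from, state) do
    {:reply, :ok, %{state | messages: []}}
  end
end

# Start a supervisor, then start an agent under it at runtime.
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)
{:ok, agent} = DynamicSupervisor.start_child(sup, {LifecycleSketch.Agent, id: "my-agent"})

{:ok, 1} = GenServer.call(agent, {:chat, "hello"})
:ok = GenServer.call(agent, :reset)

# Stopping the agent removes it from the supervisor.
:ok = DynamicSupervisor.terminate_child(sup, agent)
```

Since a GenServer handles one call at a time, concurrent chats against the same agent are serialized, which keeps the conversation history consistent.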
## File Attachments
Send images, PDFs, and other files alongside chat messages. Files become part of the conversation context, so the LLM can reference them in follow-up messages. You can either send files inline (base64-encoded) or upload them first for better performance.
```elixir
# Attach a file by path (inline base64)
{:ok, response} = ExAgent.chat(agent, "Describe this image",
  files: [%{path: "photo.jpg", mime_type: "image/jpeg"}])
# Attach raw binary data (inline base64)
image_data = File.read!("diagram.png")
{:ok, response} = ExAgent.chat(agent, "What's in this diagram?",
  files: [%{data: image_data, mime_type: "image/png"}])
# Multiple files of any type
{:ok, response} = ExAgent.chat(agent, "Summarize these documents",
  files: [
    %{path: "report.pdf", mime_type: "application/pdf"},
    %{path: "data.csv", mime_type: "text/csv"},
    %{path: "notes.md", mime_type: "text/markdown"}
  ])
# Files persist in conversation context — the LLM remembers them
{:ok, _} = ExAgent.chat(agent, "Now focus on the second document")
```
### Supported File Types
| Type | MIME Type | OpenAI | Gemini | DeepSeek |
|------|-----------|--------|--------|----------|
| JPEG | `image/jpeg` | Yes | Yes | No |
| PNG | `image/png` | Yes | Yes | No |
| GIF | `image/gif` | Yes | Yes | No |
| WebP | `image/webp` | Yes | Yes | No |
| PDF | `application/pdf` | Yes | Yes | No |
| TXT | `text/plain` | Yes | Yes | No |
| Markdown | `text/markdown` | Yes | Yes | No |
| CSV | `text/csv` | Yes | Yes | No |
> **Note:** DeepSeek does not support multimodal input. File attachments on DeepSeek agents are silently ignored.
## File Uploads
For large files or when you want to reuse the same file across multiple conversations, upload the file first and reference it later. This avoids sending base64-encoded data with every chat request.
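
As a rough illustration of the inline cost, Base64 turns every 3 bytes of input into 4 characters of output, so an inline attachment grows by about a third each time it is sent. The `InlineCost` module below is a hypothetical helper for this README only, not part of ExAgent.

```elixir
defmodule InlineCost do
  # Size in bytes of the padded Base64 encoding of `n` raw bytes.
  def encoded_size(n) when is_integer(n) and n >= 0 do
    4 * div(n + 2, 3)
  end
end

# A 3 MB payload of zero bytes stands in for a real file.
data = :binary.copy(<<0>>, 3_000_000)
encoded = Base.encode64(data)

IO.puts("raw: #{byte_size(data)} bytes")
IO.puts("base64: #{byte_size(encoded)} bytes")
IO.puts("predicted: #{InlineCost.encoded_size(byte_size(data))} bytes")
```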
### Upload and Reference (OpenAI)
```elixir
provider = ExAgent.Providers.OpenAI.new(api_key: System.get_env("OPENAI_API_KEY"))
# Upload a file from disk
{:ok, ref} = ExAgent.upload_file(provider, "report.pdf", "application/pdf")
# Use the reference in chat — no base64 encoding, just a lightweight file ID
{:ok, agent} = ExAgent.start_agent(provider: provider)
{:ok, response} = ExAgent.chat(agent, "Summarize this report",
  files: [%{file_ref: ref}])
# Reuse the same reference in another message
{:ok, response} = ExAgent.chat(agent, "What are the key findings?",
  files: [%{file_ref: ref}])
```
### Upload and Reference (Gemini)
```elixir
provider = ExAgent.Providers.Gemini.new(api_key: System.get_env("GEMINI_API_KEY"))
# Upload a file — Gemini files expire after 48 hours
{:ok, ref} = ExAgent.upload_file(provider, "photo.jpg", "image/jpeg")
# Check if a reference has expired
ExAgent.FileRef.expired?(ref)
# Use in chat
{:ok, agent} = ExAgent.start_agent(provider: provider)
{:ok, response} = ExAgent.chat(agent, "Describe what you see",
  files: [%{file_ref: ref}])
```
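
The expiry check can be pictured as a timestamp comparison. The sketch below assumes a reference that carries an `expires_at` field; `FileRefSketch` and its fields are hypothetical, since this README does not show the actual `ExAgent.FileRef` struct.

```elixir
defmodule FileRefSketch do
  # Hypothetical struct: an opaque file id plus an optional expiry time.
  defstruct [:id, :expires_at]

  # `now` is injectable so the check is easy to test.
  def expired?(ref, now \\ DateTime.utc_now())

  # A nil expiry is treated as "never expires".
  def expired?(%__MODULE__{expires_at: nil}, _now), do: false

  # Expired once `now` is at or past the expiry time.
  def expired?(%__MODULE__{expires_at: at}, now) do
    DateTime.compare(now, at) != :lt
  end
end

uploaded_at = ~U[2025-01-01 00:00:00Z]

# 48 hours after upload, matching the Gemini lifetime mentioned above.
ref = %FileRefSketch{
  id: "files/example",
  expires_at: DateTime.add(uploaded_at, 48 * 3600, :second)
}

IO.inspect(FileRefSketch.expired?(ref, ~U[2025-01-02 00:00:00Z]))
IO.inspect(FileRefSketch.expired?(ref, ~U[2025-01-04 00:00:00Z]))
```

Checking before each chat lets an application upload the file again instead of sending a stale reference.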
### Upload Raw Binary Data
```elixir
# If you already have the file contents in memory
image_bytes = File.read!("screenshot.png")
{:ok, ref} = ExAgent.upload_data(provider, image_bytes, "image/png",
  filename: "screenshot.png")
```
### Mix Inline and Uploaded Files
```elixir
# You can combine both approaches in a single message
{:ok, ref} = ExAgent.upload_file(provider, "large_video.mp4", "video/mp4")
{:ok, response} = ExAgent.chat(agent, "Compare these",
  files: [
    %{file_ref: ref},                                    # uploaded reference
    %{path: "small_image.jpg", mime_type: "image/jpeg"}  # inline base64
  ])
```
## Built-in Provider Tools
Each LLM provider offers built-in tools that can be enabled via the `built_in_tools` option — either at agent creation (applies to all calls) or per-message (overrides agent default).
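
The override rule can be sketched as a simple option lookup. `BuiltInToolsSketch.resolve/2` is a hypothetical helper that only illustrates the precedence described above; how ExAgent resolves the option internally is an assumption.

```elixir
defmodule BuiltInToolsSketch do
  # Per-message options win; otherwise the agent-level default applies.
  def resolve(agent_default, call_opts) do
    Keyword.get(call_opts, :built_in_tools, agent_default)
  end
end

agent_default = [:google_search]

# No per-message option: the agent default is used.
IO.inspect(BuiltInToolsSketch.resolve(agent_default, []))

# Per-message option: it replaces the default for this call only.
IO.inspect(BuiltInToolsSketch.resolve(agent_default, built_in_tools: [:code_execution]))
```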
### Gemini
```elixir
# Google Search grounding — LLM can search the web for up-to-date info
{:ok, agent} = ExAgent.start_agent(
  provider: ExAgent.Providers.Gemini.new(api_key: gemini_key),
  built_in_tools: [:google_search]
)
{:ok, response} = ExAgent.chat(agent, "What happened in tech news today?")
# Code execution — LLM can write and run Python code
{:ok, response} = ExAgent.chat(agent, "Calculate fibonacci(20)",
  built_in_tools: [:code_execution])
# URL context — LLM can fetch and analyze web pages
{:ok, response} = ExAgent.chat(agent, "Summarize this page",
  built_in_tools: [:url_context])
# Combine multiple built-in tools
{:ok, response} = ExAgent.chat(agent, "Research and compute",
  built_in_tools: [:google_search, :code_execution])
```
Available Gemini built-in tools: `:google_search`, `:code_execution`, `:url_context`
### OpenAI
```elixir
# Web search — LLM can search the web
{:ok, agent} = ExAgent.start_agent(
  provider: provider,
  built_in_tools: [:web_search]
)
{:ok, response} = ExAgent.chat(agent, "What are the latest Elixir releases?")
# Web search with user location for localized results
{:ok, response} = ExAgent.chat(agent, "Best restaurants nearby",
  built_in_tools: [
    %{web_search: %{"city" => "San Francisco", "country" => "US", "region" => "California"}}
  ])
```
Available OpenAI built-in tools: `:web_search`
### DeepSeek
```elixir
# Thinking/reasoning mode — enables chain-of-thought reasoning
{:ok, agent} = ExAgent.start_agent(
  provider: ExAgent.Providers.DeepSeek.new(
    api_key: deepseek_key,
    model: "deepseek-reasoner"
  ),
  built_in_tools: [:thinking]
)
{:ok, response} = ExAgent.chat(agent, "Solve this step by step: if x^2 + 3x - 10 = 0, what is x?")
```
Available DeepSeek built-in tools: `:thinking`
## Tool Calling
When you provide tools to an agent, the LLM can invoke them automatically. The agent runs a tool execution loop:
1. Sends messages + tool definitions to the LLM
2. If the LLM returns a `tool_call`, the agent executes the matching function
3. Appends the tool result as a `:tool` message
4. Calls the LLM again with the updated context
5. Repeats until the LLM returns a final text response (max 10 iterations)
```elixir
{:ok, search_tool} = ExAgent.Tool.new(
  name: "web_search",
  description: "Search the web for information",
  parameters: %{
    "type" => "object",
    "properties" => %{
      "query" => %{"type" => "string", "description" => "Search query"}
    },
    "required" => ["query"]
  },
  function: fn %{"query" => query} ->
    # Your search implementation here
    {:ok, "Results for: #{query}"}
  end
)
{:ok, calc_tool} = ExAgent.Tool.new(
  name: "calculator",
  description: "Evaluate a math expression",
  parameters: %{
    "type" => "object",
    "properties" => %{
      "expression" => %{"type" => "string"}
    },
    "required" => ["expression"]
  },
  function: fn %{"expression" => expr} ->
    # WARNING: Code.eval_string/1 runs arbitrary Elixir code, and `expr` is
    # chosen by the LLM. Demo only; use a restricted arithmetic parser in production.
    {result, _binding} = Code.eval_string(expr)
    {:ok, to_string(result)}
  end
)
{:ok, agent} = ExAgent.start_agent(
  provider: provider,
  tools: [search_tool, calc_tool]
)
# The LLM can now decide to call these tools during conversation
{:ok, response} = ExAgent.chat(agent, "What is 42 * 37?")
```
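
The five-step loop described at the start of this section can be sketched without any network calls. Everything below is illustrative: `ToolLoopSketch`, the `{:tool_call, name, args}` and `{:final, text}` return shapes, and the fake LLM function are assumptions made for this example, not ExAgent's internal API.

```elixir
defmodule ToolLoopSketch do
  @max_iterations 10

  # `llm` is a hypothetical function: messages -> {:tool_call, name, args} | {:final, text}.
  # `tools` maps tool names to one-argument functions.
  def run(llm, tools, messages, iteration \\ 0)

  # Give up once the iteration cap is reached.
  def run(_llm, _tools, _messages, @max_iterations), do: {:error, :max_iterations}

  def run(llm, tools, messages, iteration) do
    case llm.(messages) do
      {:final, text} ->
        {:ok, text, messages}

      {:tool_call, name, args} ->
        # Append the tool result as a :tool message and ask the LLM again.
        tool_msg = %{role: :tool, name: name, content: execute(tools, name, args)}
        run(llm, tools, messages ++ [tool_msg], iteration + 1)
    end
  end

  # Runs the matching tool and renders its result as message content.
  defp execute(tools, name, args) do
    case Map.fetch(tools, name) do
      {:ok, fun} ->
        case fun.(args) do
          {:ok, value} -> to_string(value)
          {:error, reason} -> "error: #{inspect(reason)}"
        end

      :error ->
        "error: unknown tool #{name}"
    end
  end
end

# Fake LLM: requests the calculator once, then answers using the tool result.
fake_llm = fn messages ->
  case List.last(messages) do
    %{role: :tool, content: content} -> {:final, "The answer is #{content}"}
    _other -> {:tool_call, "calculator", %{"a" => 42, "b" => 37}}
  end
end

tools = %{"calculator" => fn %{"a" => a, "b" => b} -> {:ok, a * b} end}

{:ok, text, _history} =
  ToolLoopSketch.run(fake_llm, tools, [%{role: :user, content: "What is 42 * 37?"}])

IO.puts(text)
```

Feeding tool errors back to the model as content, rather than raising, is one possible design choice: it gives the LLM a chance to recover on the next iteration.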
## Multi-Agent Patterns
### 1. Subagents (Centralized Orchestration)
A main orchestrator agent delegates work to specialized subagents. Each subagent runs in isolation with a fresh context — no state leaks between calls.
```elixir
alias ExAgent.Patterns.Subagents
# Define specialized subagent specs
researcher = %{
  name: "researcher",
  description: "Research a topic and return findings",
  provider: ExAgent.Providers.Gemini.new(api_key: gemini_key),
  system_prompt: "You are a research specialist. Provide detailed findings.",
  tools: []
}
coder = %{
  name: "coder",
  description: "Write code based on specifications",
  provider: ExAgent.Providers.OpenAI.new(api_key: openai_key),
  system_prompt: "You are an expert programmer. Write clean, tested code.",
  tools: []
}
# Convert subagent specs into tools for the orchestrator
orchestrator_tools = Subagents.build_orchestrator_tools([researcher, coder])
# The orchestrator uses these as regular tools — when the LLM calls
# "researcher" or "coder", it spawns an ephemeral subagent call
{:ok, orchestrator} = ExAgent.start_agent(
  provider: ExAgent.Providers.OpenAI.new(
    api_key: openai_key,
    system_prompt: "You orchestrate tasks. Use the researcher for facts and the coder for code."
  ),
  tools: orchestrator_tools
)
{:ok, response} = ExAgent.chat(orchestrator, "Research Elixir GenServers and write an example")
# You can also invoke subagents directly
{:ok, result} = Subagents.invoke_subagent(researcher, "Explain OTP supervision trees")
# Or invoke multiple in parallel
results = Subagents.invoke_subagents_parallel([
  {researcher, "What is GenServer?"},
  {coder, "Write a GenServer example"}
])
# => [{"researcher", {:ok, "GenServer is..."}}, {"coder", {:ok, "defmodule..."}}]
```
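
Parallel invocation of this kind is commonly built on `Task.async/1` and `Task.await/2`. The sketch below shows one possible shape; `ParallelSketch` and the injected `invoke` function are hypothetical stand-ins for the real provider call.

```elixir
defmodule ParallelSketch do
  # `jobs` is a list of {spec, prompt} tuples.
  # `invoke` is a hypothetical 2-arity function standing in for an LLM call.
  def invoke_all(jobs, invoke, timeout \\ 5_000) do
    jobs
    # Start one task per job; each task sees only its own spec and prompt.
    |> Enum.map(fn {spec, prompt} ->
      {spec.name, Task.async(fn -> invoke.(spec, prompt) end)}
    end)
    # Collect results in the original order.
    |> Enum.map(fn {name, task} -> {name, Task.await(task, timeout)} end)
  end
end

fake_invoke = fn spec, prompt -> {:ok, "#{spec.name} answered: #{prompt}"} end

results =
  ParallelSketch.invoke_all(
    [
      {%{name: "researcher"}, "What is GenServer?"},
      {%{name: "coder"}, "Write a GenServer example"}
    ],
    fake_invoke
  )

IO.inspect(results)
```

Each task receives only its prompt, which mirrors the fresh-context isolation described above.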
### 2. Skills (Progressive Disclosure)
A single agent dynamically loads specialized system prompts and tools based on conversation context. Skills are evaluated before each LLM call.
```elixir
# Define skills with activation functions
{:ok, sql_skill} = ExAgent.Skill.new(
  name: "sql_expert",
  system_prompt: "You are a SQL expert. Help users write and optimize queries.",
  tools: [sql_execute_tool],
  activation_fn: fn ctx ->
    Enum.any?(ctx.messages, fn m ->
      String.match?(m.content, ~r/SQL|SELECT|INSERT|UPDATE|DELETE|database/i)
    end)
  end
)
{:ok, python_skill} = ExAgent.Skill.new(
  name: "python_expert",
  system_prompt: "You are a Python expert. Write idiomatic Python code.",
  tools: [python_run_tool],
  activation_fn: fn ctx ->
    Enum.any?(ctx.messages, fn m -> String.contains?(m.content, "Python") end)
  end
)
# Start agent with skills — it begins as a generalist
{:ok, agent} = ExAgent.start_agent(
  provider: provider,
  skills: [sql_skill, python_skill]
)
# When the user mentions SQL, the sql_expert skill activates automatically
{:ok, response} = ExAgent.chat(agent, "Help me write a SQL query to find active users")
# => Agent now uses the sql_expert system prompt and tools
# Skills can also be loaded dynamically at runtime
{:ok, new_skill} = ExAgent.Skill.new(name: "devops", system_prompt: "You are a DevOps expert.")
ExAgent.Agent.load_skill(agent, new_skill)
```
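
One way to picture the per-call evaluation is a function that filters skills by their activation function and merges the results. This is a sketch under assumptions: `SkillsSketch.resolve/4` is hypothetical, and whether ExAgent concatenates prompts or replaces the base prompt is not stated in this README.

```elixir
defmodule SkillsSketch do
  # Returns {system_prompt, tools} to use for the next LLM call.
  def resolve(base_prompt, base_tools, skills, ctx) do
    # Keep only the skills whose activation function matches the context.
    active = Enum.filter(skills, fn skill -> skill.activation_fn.(ctx) end)

    # Assumption: active skill prompts are appended to the base prompt.
    prompt =
      [base_prompt | Enum.map(active, fn skill -> skill.system_prompt end)]
      |> Enum.reject(&is_nil/1)
      |> Enum.join("\n\n")

    tools = base_tools ++ Enum.flat_map(active, fn skill -> skill.tools end)
    {prompt, tools}
  end
end

sql_skill = %{
  system_prompt: "You are a SQL expert.",
  tools: [:sql_execute],
  activation_fn: fn ctx ->
    Enum.any?(ctx.messages, fn m -> String.contains?(m.content, "SQL") end)
  end
}

ctx = %{messages: [%{role: :user, content: "Help me write a SQL query"}]}

{prompt, tools} = SkillsSketch.resolve("You are a generalist.", [:search], [sql_skill], ctx)
IO.puts(prompt)
IO.inspect(tools)
```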
### 3. Handoffs (State-Driven Transitions)
The active agent changes dynamically. When the LLM invokes a handoff tool, control transfers to a different agent. The caller receives a `{:handoff, target, context}` tuple and decides where to route subsequent messages.
```elixir
alias ExAgent.Patterns.Handoff
# Start specialized agents
{:ok, sales_agent} = ExAgent.start_agent(
  provider: ExAgent.Providers.OpenAI.new(
    api_key: key,
    system_prompt: "You are a sales specialist."
  )
)
{:ok, support_agent} = ExAgent.start_agent(
  provider: ExAgent.Providers.OpenAI.new(
    api_key: key,
    system_prompt: "You are a technical support specialist."
  )
)
# Build handoff tools
handoff_to_support = Handoff.build_handoff_tool(
  "support",
  support_agent,
  "Transfer to technical support when the user has a technical issue"
)
handoff_to_sales = Handoff.build_handoff_tool(
  "sales",
  sales_agent,
  "Transfer to sales when the user wants to buy something"
)
# Start a triage agent with handoff tools
{:ok, triage_agent} = ExAgent.start_agent(
  provider: ExAgent.Providers.OpenAI.new(
    api_key: key,
    system_prompt: "You are a triage agent. Route users to the right department."
  ),
  tools: [handoff_to_support, handoff_to_sales]
)
# When the LLM decides to hand off, you get a handoff tuple
case ExAgent.chat(triage_agent, "My app keeps crashing") do
  {:ok, response} ->
    # Normal response: the agent handled it directly
    IO.puts(response.content)
  {:handoff, target_pid, context} ->
    # Transfer context and continue with the new agent
    ExAgent.handoff(target_pid, context)
    {:ok, response} = ExAgent.chat(target_pid, "My app keeps crashing")
    IO.puts(response.content)
end
```
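
Because the caller decides what happens after a handoff, applications often wrap this `case` in a small driver that follows handoffs until some agent answers. The sketch below is one way to do that; `HandoffSketch`, the injected `chat` and `handoff` functions, and the hop limit are all assumptions for illustration.

```elixir
defmodule HandoffSketch do
  # `chat` and `handoff` are injected so the sketch runs without ExAgent.
  # `hops` caps the number of chat attempts so agents cannot bounce forever.
  def converse(agent, text, chat, handoff, hops \\ 3)

  def converse(_agent, _text, _chat, _handoff, 0), do: {:error, :too_many_handoffs}

  def converse(agent, text, chat, handoff, hops) do
    case chat.(agent, text) do
      {:ok, response} ->
        # Report which agent ended up answering.
        {:ok, agent, response}

      {:handoff, target, context} ->
        # Give the target the conversation so far, then retry with it.
        handoff.(target, context)
        converse(target, text, chat, handoff, hops - 1)

      {:error, _reason} = error ->
        error
    end
  end
end

# Fake agents: triage always hands off to support, support answers.
fake_chat = fn
  :triage, _text -> {:handoff, :support, %{messages: []}}
  :support, text -> {:ok, "Support here. Looking into: #{text}"}
end

fake_handoff = fn _target, _context -> :ok end

IO.inspect(HandoffSketch.converse(:triage, "My app keeps crashing", fake_chat, fake_handoff))
```

Returning the answering agent lets the caller route the user's next message to it directly.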
### 4. Router (Parallel Dispatch & Synthesis)
Classifies input, dispatches to multiple specialized agents in parallel, and synthesizes results into a single response.
```elixir
alias ExAgent.Patterns.Router
# Start specialized agents
{:ok, code_agent} = ExAgent.start_agent(
  provider: ExAgent.Providers.OpenAI.new(
    api_key: key,
    system_prompt: "Analyze code quality and suggest improvements."
  )
)
{:ok, security_agent} = ExAgent.start_agent(
  provider: ExAgent.Providers.OpenAI.new(
    api_key: key,
    system_prompt: "Analyze code for security vulnerabilities."
  )
)
{:ok, perf_agent} = ExAgent.start_agent(
  provider: ExAgent.Providers.Gemini.new(
    api_key: gemini_key,
    system_prompt: "Analyze code for performance issues."
  )
)
# Define routes with match functions
routes = [
  %{name: "code_quality", agent: code_agent, match_fn: fn _ -> true end},
  %{name: "security", agent: security_agent, match_fn: &String.contains?(&1, "security")},
  %{name: "performance", agent: perf_agent, match_fn: &String.contains?(&1, "performance")}
]
# Route dispatches to all matching agents in parallel
{:ok, result} = ExAgent.route(
  "Review this code for security and performance issues: def fetch(url), do: HTTPoison.get!(url)",
  routes: routes,
  timeout: 30_000
)
IO.puts(result)
# ## code_quality
# The function lacks error handling...
#
# ## security
# Using get! will raise on HTTP errors...
#
# ## performance
# Consider connection pooling...
# Custom synthesizer
{:ok, result} = ExAgent.route("analyze this code",
  routes: routes,
  synthesizer: fn _input, results ->
    Enum.map_join(results, "\n\n", fn {name, content} -> "**#{name}**: #{content}" end)
  end
)
```
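
The classify, dispatch, and synthesize steps can be sketched with `Task.async_stream/3`. `RouterSketch` and the injected `ask` function are hypothetical; the `## name` output format is modeled on the example output above and is an assumption about the default synthesizer.

```elixir
defmodule RouterSketch do
  # `ask` is a hypothetical 2-arity function standing in for a chat call.
  def route(input, routes, ask, timeout \\ 5_000) do
    routes
    # Classify: keep the routes whose match function accepts the input.
    |> Enum.filter(fn route -> route.match_fn.(input) end)
    # Dispatch: query the matching agents concurrently, preserving order.
    |> Task.async_stream(fn route -> {route.name, ask.(route.agent, input)} end,
      timeout: timeout
    )
    |> Enum.map(fn {:ok, result} -> result end)
    # Synthesize: join the answers under one heading per route.
    |> Enum.map_join("\n\n", fn {name, content} -> "## #{name}\n#{content}" end)
  end
end

routes = [
  %{name: "code_quality", agent: :quality, match_fn: fn _ -> true end},
  %{name: "security", agent: :security, match_fn: &String.contains?(&1, "security")},
  %{name: "performance", agent: :perf, match_fn: &String.contains?(&1, "performance")}
]

fake_ask = fn agent, _input -> "findings from #{agent}" end

IO.puts(RouterSketch.route("Review this for security issues", routes, fake_ask))
```

In this sketch a task that exceeds `timeout` terminates the caller; a production router would more likely collect per-route errors instead.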
## Adding a Custom Provider
Any LLM can be integrated by defining a struct and implementing the `ExAgent.LlmProvider` protocol:
```elixir
defmodule MyApp.Providers.Anthropic do
  @moduledoc "Custom Anthropic Claude provider."
  defstruct [
    :api_key,
    :req,
    model: "claude-sonnet-4-20250514",
    base_url: "https://api.anthropic.com/v1",
    system_prompt: nil,
    tools: []
  ]

  def new(opts) do
    provider = struct!(__MODULE__, opts)

    req =
      Req.new(
        base_url: provider.base_url,
        headers: [
          {"x-api-key", provider.api_key},
          {"anthropic-version", "2023-06-01"}
        ]
      )

    %{provider | req: req}
  end

  defimpl ExAgent.LlmProvider do
    def chat(provider, messages, _opts) do
      body = %{
        "model" => provider.model,
        "max_tokens" => 1024,
        "messages" =>
          Enum.map(messages, fn msg ->
            %{"role" => to_string(msg.role), "content" => msg.content}
          end)
      }

      case Req.post(provider.req, url: "/messages", json: body) do
        {:ok, %Req.Response{status: 200, body: %{"content" => [%{"text" => text} | _]}}} ->
          {:ok, %ExAgent.Message{role: :assistant, content: text}}

        {:ok, %Req.Response{status: status, body: body}} ->
          {:error, {status, body}}

        {:error, reason} ->
          {:error, reason}
      end
    end
  end
end
# Use it like any other provider
provider = MyApp.Providers.Anthropic.new(api_key: "sk-ant-...")
{:ok, agent} = ExAgent.start_agent(provider: provider)
{:ok, response} = ExAgent.chat(agent, "Hello Claude!")
```
## Architecture
### Supervision Tree
```
Application (ex_agent)
  |
  ExAgent.AgentSupervisor (:one_for_one)
    |
    +-- ExAgent.AgentDynamicSupervisor (:one_for_one)
    |     |
    |     +-- ExAgent.Agent (id: "orchestrator")
    |     +-- ExAgent.Agent (id: "coder")
    |     +-- ExAgent.Agent (id: "reviewer")
    |     +-- ... (any runtime agents)
    |
    +-- ExAgent.TaskSupervisor
          |
          +-- Task (async chat calls)
          +-- Task (parallel subagent invocations)
          +-- Task (router parallel dispatch)
```
### Design Decisions
- **Protocol dispatch** — Provider structs implement `ExAgent.LlmProvider`, so each call is dispatched polymorphically on the provider's struct type
- **Thin protocol impls** — Protocol implementations delegate to service modules under `services/`, keeping HTTP logic separate
- **Tool loop in GenServer** — The `handle_call({:chat, ...})` contains the tool execution loop, processing one turn at a time to prevent race conditions on context
- **Subagents bypass GenServer** — Ephemeral stateless calls use `LlmProvider.chat/3` directly in supervised Tasks
- **Handoff returns to caller** — Keeps agents decoupled; the caller decides routing after a handoff
- **Router is a plain module** — Stateless classify-dispatch-synthesize flow needs no GenServer
- **All patterns share one Agent GenServer** — Patterns augment behavior through state and tools, not separate process types
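
The protocol-dispatch decision can be illustrated with a self-contained example. `ProviderSketch`, `EchoProvider`, and `ShoutProvider` are hypothetical names invented for this sketch; only the `chat/3` arity follows the `LlmProvider.chat/3` call mentioned above.

```elixir
defprotocol ProviderSketch do
  @doc "Hypothetical stand-in for ExAgent.LlmProvider."
  def chat(provider, messages, opts)
end

defmodule EchoProvider do
  defstruct prefix: "echo"
end

defmodule ShoutProvider do
  defstruct []
end

defimpl ProviderSketch, for: EchoProvider do
  # Replies with the last message, prefixed.
  def chat(%{prefix: prefix}, messages, _opts) do
    {:ok, "#{prefix}: #{List.last(messages)}"}
  end
end

defimpl ProviderSketch, for: ShoutProvider do
  # Replies with the last message in upper case.
  def chat(_provider, messages, _opts) do
    {:ok, String.upcase(List.last(messages))}
  end
end

# The calling code is identical; the struct type selects the implementation.
for provider <- [%EchoProvider{}, %ShoutProvider{}] do
  IO.inspect(ProviderSketch.chat(provider, ["hello"], []))
end
```

Swapping providers then amounts to passing a different struct, with no change to the code that calls `chat/3`.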
### Project Structure
```
lib/
  ex_agent.ex                     # Public API facade
  ex_agent/
    llm_provider.ex               # LlmProvider protocol
    file_uploader.ex              # FileUploader protocol
    file_ref.ex                   # %FileRef{} struct (uploaded file reference)
    message.ex                    # %Message{} struct
    tool.ex                       # %Tool{} struct
    context.ex                    # %Context{} struct
    skill.ex                      # %Skill{} struct
    agent.ex                      # Agent GenServer
    supervisor.ex                 # AgentSupervisor
    dynamic_supervisor.ex         # AgentDynamicSupervisor
    providers/
      openai.ex                   # OpenAI provider + LlmProvider + FileUploader
      gemini.ex                   # Gemini provider + LlmProvider + FileUploader
      deep_seek.ex                # DeepSeek provider + LlmProvider
    services/
      openai_service.ex           # OpenAI chat HTTP calls via Req
      openai_upload_service.ex    # OpenAI file upload (POST /v1/files)
      gemini_service.ex           # Gemini chat HTTP calls via Req
      gemini_upload_service.ex    # Gemini file upload (Files API)
      deep_seek_service.ex        # DeepSeek HTTP calls via Req
    patterns/
      subagents.ex                # Centralized orchestration
      skills.ex                   # Progressive disclosure
      handoff.ex                  # State-driven transitions
      router.ex                   # Parallel dispatch & synthesis
```
## License
Apache-2.0