# Candil
LLM inference and model management for Elixir. Run local models via llama.cpp or remote models via OpenAI-compatible APIs.
## Installation
```elixir
def deps do
[
{:candil, "~> 1.0"}
]
end
```
## Dependencies
Candil requires:
- `:apero` - System utilities (included automatically via path in dev)
- `:arrea` - Parallel execution (included automatically via path in dev)
- `:jason` - JSON encoding/decoding
- `:req` - HTTP client
## Configuration
### Engines
An engine represents a local `llama-server` binary that serves one model at a time.
```elixir
# Precompiled binary (auto-downloaded)
engine = %Candil.Engine{
alias: :llama_server,
use_precompiled: true,
precompiled_version: :latest,
host: "127.0.0.1",
port: 8080,
start_args: ["--n-gpu-layers", "35"]
}
# Custom binary path
engine = %Candil.Engine{
alias: :llama_server,
binary_dir: "/usr/local/bin",
use_precompiled: false,
host: "127.0.0.1",
port: 8080
}
Candil.Config.register_engine(engine)
```
### Models
#### Local Model
```elixir
model = %Candil.Model{
alias: :llama3,
type: :local,
model_dir: "/models",
filename: "llama-3-8b-q4_k_m.gguf",
download_url: "https://huggingface.co/.../llama-3-8b-q4_k_m.gguf",
context_size: 8192,
engine: :llama_server,
usage: [:chat, :completion],
model_args: ["--n-gpu-layers", "35"]
}
Candil.Config.register_model(model)
```
#### Remote Model
```elixir
model = %Candil.Model{
alias: :gpt4o,
type: :remote,
name: "gpt-4o",
context_size: 128_000,
provider: :openai,
usage: [:chat, :completion, :embeddings]
}
Candil.Config.register_model(model)
```
### Providers
#### OpenAI
```elixir
provider = %Candil.Provider{
alias: :openai,
type: :openai,
base_url: "https://api.openai.com/v1",
api_key: System.get_env("OPENAI_API_KEY")
}
Candil.Config.register_provider(provider)
```
#### Anthropic
```elixir
provider = %Candil.Provider{
alias: :anthropic,
type: :anthropic,
base_url: "https://api.anthropic.com",
api_key: System.get_env("ANTHROPIC_API_KEY")
}
Candil.Config.register_provider(provider)
```
#### Ollama
```elixir
provider = %Candil.Provider{
alias: :ollama,
type: :ollama,
base_url: "http://localhost:11434"
}
Candil.Config.register_provider(provider)
```
#### OpenAI-Compatible (Groq, LM Studio, etc.)
```elixir
provider = %Candil.Provider{
alias: :groq,
type: :openai_compatible,
base_url: "https://api.groq.com/openai",
api_key: System.get_env("GROQ_API_KEY")
}
Candil.Config.register_provider(provider)
```
## Usage
### Local Model (llama.cpp)
```elixir
# Download binary (automatic if use_precompiled: true)
:ok = Candil.download_engine(engine)
# Download model
{:ok, _path} = Candil.download_model(model)
# Start engine
{:ok, pid} = Candil.start_engine(engine, model)
# Run inference
{:ok, response} = Candil.chat(:llama3, [
%{role: "user", content: "Hello!"}
])
IO.puts(response.content)
# Stop when done
:ok = Candil.stop_engine(:llama3)
```
### Remote Model (OpenAI)
```elixir
# Run inference directly
{:ok, response} = Candil.chat(model, provider, [
%{role: "user", content: "Hello!"}
])
IO.puts(response.content)
```
### Streaming Responses
```elixir
# Local streaming
Candil.stream(:llama3, [
%{role: "user", content: "Write a story"}
], fn chunk ->
IO.write(chunk.content)
end)
# Remote streaming
Candil.stream(model, provider, [
%{role: "user", content: "Write a story"}
], fn chunk ->
IO.write(chunk.content)
end)
```
### Embeddings
```elixir
# Local embeddings (engine must be running and model must support :embeddings)
{:ok, embeddings} = Candil.embed(:llama3, ["Hello world", "How are you?"])
# Remote embeddings
{:ok, embeddings} = Candil.embed(model, provider, ["Hello world", "How are you?"])
```
### Conversation Management
```elixir
conv = Candil.Conversation.new(
model: :llama3,
system: "You are a helpful assistant.",
max_context_tokens: 4096
)
{:ok, conv, response} = Candil.Conversation.chat(conv, "What is Elixir?")
IO.puts(response.content)
{:ok, conv, response} = Candil.Conversation.chat(conv, "Give me a code example.")
IO.puts(response.content)
```
## Architecture
- **Candil.Llm** - Main entry point for all LLM operations
- **Candil.Engine** - Manages local llama-server processes
- **Candil.Engine.Server** - GenServer wrapping the llama-server OS process
- **Candil.Inference** - Handles chat completions and embeddings
- **Candil.Stream** - SSE streaming support
- **Candil.Provider** - Remote API provider abstraction (OpenAI, Anthropic, Ollama)
- **Candil.Model** - Model definitions (local or remote)
- **Candil.Config** - ETS-based registry for engines, models, and providers
- **Candil.Detector** - OS/GPU detection for binary selection
- **Candil.Installer** - Download and extraction utilities
- **Candil.Conversation** - Conversation history with context window management
- **Candil.RequestBuilder** - Request body builders for all provider APIs
## License
MIT