# Braintrust
An unofficial Elixir client for the [Braintrust](https://braintrust.dev) AI evaluation and observability platform.
Braintrust is an end-to-end platform for evaluating, monitoring, and improving AI applications. This Hex package provides Elixir/Phoenix applications with access to Braintrust's REST API for managing projects, experiments, datasets, logs, and prompts.
## Installation
Add `braintrust` to your list of dependencies in `mix.exs`:
```elixir
def deps do
[
{:braintrust, "~> 0.2"}
]
end
```
## Configuration
Set your API key via environment variable:
```bash
export BRAINTRUST_API_KEY="sk-your-api-key"
```
Or configure in your application:
```elixir
# config/config.exs
config :braintrust, api_key: System.get_env("BRAINTRUST_API_KEY")
# Or at runtime
Braintrust.configure(api_key: "sk-xxx")
```
API keys can be created at [braintrust.dev/app/settings](https://www.braintrust.dev/app/settings?subroute=api-keys).
## Usage
### Projects
```elixir
# List all projects
{:ok, projects} = Braintrust.Project.list()
# Create a project
{:ok, project} = Braintrust.Project.create(%{name: "my-project"})
# Get a project by ID
{:ok, project} = Braintrust.Project.get(project_id)
# Update a project
{:ok, project} = Braintrust.Project.update(project_id, %{name: "updated-name"})
# Delete a project (soft delete)
{:ok, project} = Braintrust.Project.delete(project_id)
# Stream through projects lazily (memory efficient)
Braintrust.Project.stream(limit: 50)
|> Stream.take(100)
|> Enum.to_list()
```
### Logging Traces
Log production traces for observability:
```elixir
# Log with raw maps
{:ok, _} = Braintrust.Log.insert(project_id, [
%{
input: %{messages: [%{role: "user", content: "Hello"}]},
output: "Hi there!",
scores: %{quality: 0.9},
metadata: %{model: "gpt-4", environment: "production"},
metrics: %{latency_ms: 250, input_tokens: 50, output_tokens: 25}
}
])
# Or use Span structs for better type safety
spans = [
%Braintrust.Span{
input: %{messages: [%{role: "user", content: "Hello"}]},
output: "Hi there!",
scores: %{quality: 0.9},
metadata: %{model: "gpt-4"},
metrics: %{latency_ms: 250}
}
]
{:ok, _} = Braintrust.Log.insert(project_id, spans)
```
### Experiments
Run evaluations and track results:
```elixir
# Create an experiment
{:ok, experiment} = Braintrust.Experiment.create(%{
project_id: "proj_123",
name: "gpt4-baseline"
})
# Insert evaluation events
{:ok, _} = Braintrust.Experiment.insert(experiment.id, [
%{
input: %{messages: [%{role: "user", content: "What is 2+2?"}]},
output: "4",
expected: "4",
scores: %{accuracy: 1.0},
metadata: %{model: "gpt-4"}
}
])
# Get experiment summary
{:ok, summary} = Braintrust.Experiment.summarize(experiment.id)
# Stream through all events
Braintrust.Experiment.fetch_stream(experiment.id)
|> Stream.each(&process_event/1)
|> Stream.run()
# Add feedback to events
{:ok, _} = Braintrust.Experiment.feedback(experiment.id, [
%{id: "event_123", scores: %{human_rating: 0.9}, comment: "Good response"}
])
```
### Datasets
Manage test data for evaluations:
```elixir
# Create a dataset
{:ok, dataset} = Braintrust.Dataset.create(%{
project_id: "proj_123",
name: "test-cases",
description: "Q&A evaluation test cases"
})
# Insert test records
{:ok, _} = Braintrust.Dataset.insert(dataset.id, [
%{input: %{question: "What is 2+2?"}, expected: "4"},
%{input: %{question: "What is 3+3?"}, expected: "6", metadata: %{category: "math"}}
])
# Fetch dataset records
{:ok, result} = Braintrust.Dataset.fetch(dataset.id, limit: 100)
# Stream through all records
Braintrust.Dataset.fetch_stream(dataset.id)
|> Stream.each(&process_record/1)
|> Stream.run()
# Add feedback to records
{:ok, _} = Braintrust.Dataset.feedback(dataset.id, [
%{id: "record_123", scores: %{quality: 0.95}, comment: "Excellent test case"}
])
# Get dataset summary
{:ok, summary} = Braintrust.Dataset.summarize(dataset.id)
```
### Prompts
Version-controlled prompt management with template variables:
```elixir
# Create a prompt
{:ok, prompt} = Braintrust.Prompt.create(%{
project_id: "proj_123",
name: "customer-support",
slug: "customer-support-v1",
model: "gpt-4",
messages: [
%{role: "system", content: "You are a helpful customer support agent."},
%{role: "user", content: "{{user_input}}"}
]
})
# List prompts
{:ok, prompts} = Braintrust.Prompt.list(project_id: "proj_123")
# Get a prompt by ID
{:ok, prompt} = Braintrust.Prompt.get(prompt_id)
# Get a specific version
{:ok, prompt} = Braintrust.Prompt.get(prompt_id, version: "v2")
# Update a prompt (creates new version)
{:ok, prompt} = Braintrust.Prompt.update(prompt_id, %{
messages: [
%{role: "system", content: "Updated system prompt."},
%{role: "user", content: "{{user_input}}"}
]
})
# Stream through prompts lazily
Braintrust.Prompt.stream(project_id: "proj_123")
|> Stream.take(50)
|> Enum.to_list()
```
### Functions
Manage tools, scorers, and callable functions:
```elixir
# List all functions
{:ok, functions} = Braintrust.Function.list()
# List scorers for a specific project
{:ok, scorers} = Braintrust.Function.list(
project_id: "proj_123",
function_type: "scorer"
)
# Create a code-based scorer
{:ok, scorer} = Braintrust.Function.create(%{
project_id: "proj_123",
name: "relevance-scorer",
slug: "relevance-scorer-v1",
function_type: "scorer",
function_data: %{
type: "code",
data: %{
runtime: "node",
code: "export default async function({ input, output, expected }) {
// Scoring logic here
return { score: 0.9 };
}"
}
}
})
# Get a function by ID
{:ok, func} = Braintrust.Function.get(function_id)
# Get a specific version
{:ok, func} = Braintrust.Function.get(function_id, version: "v2")
# Update a function
{:ok, func} = Braintrust.Function.update(function_id, %{
description: "Updated relevance scorer with better accuracy"
})
# Stream through functions
Braintrust.Function.stream(function_type: "tool")
|> Stream.take(50)
|> Enum.to_list()
```
## LangChain Integration
If you're using [LangChain Elixir](https://github.com/brainlid/langchain), you can automatically log all LLM interactions to Braintrust:
```elixir
alias LangChain.Chains.LLMChain
alias LangChain.ChatModels.ChatOpenAI
alias LangChain.Message
alias Braintrust.LangChainCallbacks
{:ok, chain} =
%{llm: ChatOpenAI.new!(%{model: "gpt-4"})}
|> LLMChain.new!()
|> LLMChain.add_callback(LangChainCallbacks.handler(
project_id: "your-project-id",
metadata: %{"environment" => "production"}
))
|> LLMChain.add_message(Message.new_user!("Hello!"))
|> LLMChain.run()
```
For streaming with time-to-first-token metrics:
```elixir
|> LLMChain.add_callback(LangChainCallbacks.streaming_handler(
project_id: "your-project-id"
))
```
See `Braintrust.LangChainCallbacks` for full documentation.
### Error Handling
All API functions return `{:ok, result}` or `{:error, %Braintrust.Error{}}`:
```elixir
case Braintrust.Project.get(project_id) do
{:ok, project} ->
handle_project(project)
{:error, %Braintrust.Error{type: :not_found}} ->
handle_not_found()
{:error, %Braintrust.Error{type: :rate_limit, retry_after: ms}} ->
Process.sleep(ms)
retry()
{:error, %Braintrust.Error{type: :authentication}} ->
handle_auth_error()
{:error, %Braintrust.Error{} = error} ->
Logger.error("API error: #{error.message}")
handle_error(error)
end
```
## Features
- **Projects** - Manage AI projects containing experiments, datasets, and logs
- **Experiments** - Run evaluations and compare results across runs
- **Datasets** - Version-controlled test data with support for pinning evaluations to specific versions
- **Logging/Tracing** - Production observability with span-based tracing
- **Prompts** - Version-controlled prompt management with template variables and versioning
- **Functions** - Access to tools, scorers, and callable functions
- **Automatic Retry** - Exponential backoff for rate limits and transient errors
- **Pagination Streams** - Lazy iteration over paginated results
## API Coverage
| Resource | Endpoint | Status |
|----------|----------|--------|
| Projects | `/v1/project` | ✅ Implemented |
| Experiments | `/v1/experiment` | ✅ Implemented |
| Datasets | `/v1/dataset` | ✅ Implemented |
| Logs | `/v1/project_logs` | ✅ Implemented |
| Prompts | `/v1/prompt` | ✅ Implemented |
| Functions | `/v1/function` | ✅ Implemented |
| BTQL | `/btql` | 🚧 Planned |
## Resources
- [Braintrust Documentation](https://www.braintrust.dev/docs)
- [API Reference](https://www.braintrust.dev/docs/api-reference/introduction)
- [OpenAPI Specification](https://github.com/braintrustdata/braintrust-openapi)
## License
MIT