# tripswitch-ex
Official Elixir SDK for [Tripswitch](https://tripswitch.dev) — a circuit breaker management service.
This SDK conforms to the [Tripswitch SDK Contract v0.2](https://tripswitch.dev/docs/sdk-contract).
## Features
- **Real-time state sync** via Server-Sent Events (SSE)
- **Automatic sample reporting** with buffered, batched uploads
- **Fail-open by default** — your app stays available even if Tripswitch is unreachable
- **OTP-native** — supervised GenServer tree, safe for concurrent use
- **Graceful shutdown** — `terminate/2` flushes buffered samples before the process exits
## Installation
Add `:tripswitch_ex` to your dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:tripswitch_ex, "~> 0.1"}
  ]
end
```
**Requires Elixir ~> 1.17 and Erlang/OTP ~> 26.**
## Authentication
Tripswitch uses a two-tier authentication model.
### Runtime credentials (SDK)
For SDK initialization, you need two credentials from **Project Settings → SDK Keys**:
| Credential | Prefix | Purpose |
|------------|--------|---------|
| **Project Key** | `eb_pk_` | SSE connection and state reads |
| **Ingest Secret** | 64-char hex | HMAC-signed sample ingestion |
### Admin credentials (management API)
For management and automation tasks, use an **Admin Key** from **Organization Settings → Admin Keys**:
| Credential | Prefix | Purpose |
|------------|--------|---------|
| **Admin Key** | `eb_admin_` | Org-scoped management operations |
Admin keys are used with the [Admin client](#admin-client) only — not for runtime SDK usage.
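A startup-time check on the credential shapes listed in the tables above can catch a swapped or truncated key early. The sketch below is illustrative only: `CredentialCheck` and its functions are hypothetical and not part of the SDK, and accepting both upper- and lowercase hex for the ingest secret is an assumption.

```elixir
defmodule CredentialCheck do
  @moduledoc "Hypothetical helper; not part of tripswitch_ex."

  # Project keys carry the eb_pk_ prefix.
  def project_key?(key) when is_binary(key), do: String.starts_with?(key, "eb_pk_")
  def project_key?(_), do: false

  # Admin keys carry the eb_admin_ prefix and are not meant for the runtime client.
  def admin_key?(key) when is_binary(key), do: String.starts_with?(key, "eb_admin_")
  def admin_key?(_), do: false

  # Ingest secrets are documented as 64 hex characters (letter case is an assumption).
  def ingest_secret?(secret) when is_binary(secret),
    do: Regex.match?(~r/\A[0-9a-fA-F]{64}\z/, secret)

  def ingest_secret?(_), do: false
end
```

A check like this only validates format; it cannot tell whether a key is active or belongs to the right project.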
## Quick Start
Start the client under your application's supervision tree:
```elixir
# lib/my_app/application.ex
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      {Tripswitch.Client,
       project_id: System.fetch_env!("TRIPSWITCH_PROJECT_ID"),
       api_key: System.fetch_env!("TRIPSWITCH_API_KEY"),
       ingest_secret: System.fetch_env!("TRIPSWITCH_INGEST_SECRET"),
       name: MyApp.Tripswitch,
       on_state_change: &MyApp.Breakers.on_change/3}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```
Then wrap operations with `Tripswitch.execute/3`:
```elixir
result =
  Tripswitch.execute(MyApp.Tripswitch, fn -> call_payment_api() end,
    breakers: ["payment-service"],
    router: "checkout-router",
    metrics: %{"latency_ms" => :latency},
    tags: %{"region" => "us-east-1"}
  )

case result do
  {:error, :breaker_open} ->
    # Circuit is open: return a cached or degraded response
    {:ok, cached_response()}

  {:error, reason} ->
    # The task itself returned an error
    {:error, reason}

  response ->
    # Task succeeded
    {:ok, response}
end
```
## Client Options
Pass these as keyword arguments to `Tripswitch.Client.start_link/1` (or inline in a supervision spec):
| Option | Description | Default |
|--------|-------------|---------|
| `:project_id` | Project ID — **required** | — |
| `:api_key` | Project key (`eb_pk_`) for SSE authentication | `nil` |
| `:ingest_secret` | 64-char hex secret for HMAC-signed sample reporting | `nil` |
| `:name` | Atom used to identify this client instance | `Tripswitch.Client` |
| `:fail_open` | Allow traffic when Tripswitch is unreachable | `true` |
| `:base_url` | Override the API endpoint | `https://api.tripswitch.dev` |
| `:on_state_change` | `(name, from_state, to_state -> any)` callback on breaker transitions | `nil` |
| `:global_tags` | `%{String.t() => String.t()}` merged into every sample | `%{}` |
| `:meta_sync_ms` | Metadata refresh interval in milliseconds. Set to `0` to disable. | `30_000` |
### State change callback
```elixir
# Logger macros need `require Logger` in the calling module or script
require Logger

Tripswitch.Client.start_link(
  project_id: "proj_...",
  api_key: "eb_pk_...",
  name: MyApp.Tripswitch,
  on_state_change: fn name, from, to ->
    Logger.warning("breaker #{name}: #{from} → #{to}")
    MyApp.Metrics.increment("tripswitch.transition", tags: [breaker: name, to: to])
  end
)
```
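The Quick Start passes `&MyApp.Breakers.on_change/3` as the callback without showing the module. One possible shape is sketched below. The body is an assumption, as is the idea that states arrive as the strings listed under Circuit breaker states; the `level/1` helper is hypothetical.

```elixir
defmodule MyApp.Breakers do
  require Logger

  # Hypothetical callback matching the (name, from_state, to_state) arity.
  def on_change(name, from, to) do
    Logger.log(level(to), "breaker #{name}: #{from} -> #{to}")
    :ok
  end

  # Assumed mapping: a breaker opening is worth a warning, anything else is informational.
  def level("open"), do: :warning
  def level(_other), do: :info
end
```

Using a named function rather than an anonymous one keeps the supervision spec short and lets the callback be unit-tested on its own.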
## `execute/3`
```elixir
Tripswitch.execute(client, task, opts \\ [])
```
Checks breaker state, runs `task`, and reports samples — all in one call. Returns the task's return value directly, or `{:error, :breaker_open}` if gated.
Exceptions inside `task` are caught, counted as failures, and reraised after samples are flushed.
### Options
| Option | Description |
|--------|-------------|
| `:breakers` | List of breaker names to check before executing |
| `:breaker_selector` | `([breaker_meta] -> [name])` — dynamically select breakers from cached metadata |
| `:router` | Router ID for sample reporting |
| `:router_selector` | `([router_meta] -> router_id)` — dynamically select a router from cached metadata |
| `:metrics` | `%{"name" => :latency \| number \| (result -> number)}` |
| `:deferred_metrics` | `(result -> %{"name" => number})` — extract metrics from the task result |
| `:error_evaluator` | `(result -> boolean)` — return `true` if the result is a failure. Defaults to matching `{:error, _}` tuples. |
| `:trace_id` | String propagated on every sample |
| `:tags` | `%{String.t() => String.t()}` merged into every sample for this call |
### Error classification
Every sample includes an `ok` field indicating success or failure. By default, `{:error, _}` tuples are failures and everything else is success. Exceptions are always failures.
Override with `:error_evaluator`:
```elixir
# Only count HTTP 5xx as failures; 4xx are expected
Tripswitch.execute(client, fn -> api_call() end,
  error_evaluator: fn
    {:error, %{status: s}} when s >= 500 -> true
    {:error, _} -> false
    _ -> false
  end
)
```
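As a rough model of the classification described above, the `ok` field is the negation of whatever the evaluator returns. The module below is a self-contained sketch of that rule, not the SDK's internals; `OkFieldSketch` and its function names are made up for illustration.

```elixir
defmodule OkFieldSketch do
  # Default rule from the text: {:error, _} tuples are failures, everything else succeeds.
  def default_failure?({:error, _}), do: true
  def default_failure?(_), do: false

  # Returns the boolean that would go in a sample's `ok` field.
  # The evaluator returns true for a failure, so `ok` is its negation.
  def ok?(result, evaluator \\ &default_failure?/1), do: not evaluator.(result)
end
```

With the 5xx-only evaluator from the example, `{:error, %{status: 404}}` yields `ok: true`, while the default rule would mark it as a failure.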
### Metrics
```elixir
Tripswitch.execute(client, fn -> downstream() end,
  router: "my-router",
  metrics: %{
    # Auto-computed task duration in milliseconds
    "latency_ms" => :latency,
    # Static numeric value
    "batch_size" => 128,
    # Computed from the task result
    "response_bytes" => fn {:ok, body} -> byte_size(body) end
  }
)
```
Use `:deferred_metrics` when the interesting values come from the result — for example, token counts from an LLM response:
```elixir
Tripswitch.execute(client, fn -> Anthropic.complete(prompt) end,
  breakers: ["anthropic-spend"],
  router: "llm-router",
  metrics: %{"latency_ms" => :latency},
  deferred_metrics: fn {:ok, response} ->
    %{
      "prompt_tokens" => response.usage.prompt_tokens,
      "completion_tokens" => response.usage.completion_tokens
    }
  end
)
```
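To make the three `:metrics` value forms and `:deferred_metrics` concrete, here is a sketch of how they could be resolved into plain numbers once the task has returned. This is a model of the documented behavior rather than the SDK's code: `MetricsSketch` is hypothetical, and letting deferred values override same-named entries is an assumption.

```elixir
defmodule MetricsSketch do
  # Resolves a :metrics map into plain numbers for one call.
  # `latency_ms` is the measured task duration; `result` is the task's return value.
  def resolve(metrics, result, latency_ms) do
    Map.new(metrics, fn
      {name, :latency} -> {name, latency_ms}
      {name, value} when is_number(value) -> {name, value}
      {name, fun} when is_function(fun, 1) -> {name, fun.(result)}
    end)
  end

  # Merges deferred metrics (a function of the result) on top of the resolved map.
  def with_deferred(resolved, nil, _result), do: resolved

  def with_deferred(resolved, fun, result) when is_function(fun, 1),
    do: Map.merge(resolved, fun.(result))
end
```

Note that a metric function written as `fn {:ok, body} -> ... end` has no clause for `{:error, _}` results, so adding a fallback clause is worth considering when the task can fail.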
### Dynamic selection
Use `:breaker_selector` and `:router_selector` to choose at runtime based on cached metadata:
```elixir
# Gate on breakers matching a metadata property
Tripswitch.execute(client, fn -> task() end,
  breaker_selector: fn breakers ->
    breakers
    |> Enum.filter(&(&1["metadata"]["region"] == "us-east-1"))
    |> Enum.map(& &1["name"])
  end
)

# Route to a router matching a metadata property
Tripswitch.execute(client, fn -> task() end,
  router_selector: fn routers ->
    case Enum.find(routers, &(&1["metadata"]["env"] == "production")) do
      nil -> nil
      router -> router["id"]
    end
  end,
  metrics: %{"latency_ms" => :latency}
)
```
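Selectors are ordinary functions, so they can be exercised against hand-written metadata before being wired into `execute/3`. The sample entries below are invented, and the string-keyed shape is inferred from the selectors above; the real cached metadata may carry additional fields.

```elixir
defmodule SelectorSketch do
  # Invented sample metadata for local testing.
  def sample_breakers do
    [
      %{"name" => "payments-east", "metadata" => %{"region" => "us-east-1"}},
      %{"name" => "payments-west", "metadata" => %{"region" => "us-west-2"}}
    ]
  end

  def sample_routers do
    [
      %{"id" => "router_staging", "metadata" => %{"env" => "staging"}},
      %{"id" => "router_prod", "metadata" => %{"env" => "production"}}
    ]
  end

  # Same logic as the :breaker_selector example: names of breakers in us-east-1.
  def east_breakers(breakers) do
    breakers
    |> Enum.filter(&(&1["metadata"]["region"] == "us-east-1"))
    |> Enum.map(& &1["name"])
  end

  # Same logic as the :router_selector example: ID of the production router, or nil.
  def production_router(routers) do
    case Enum.find(routers, &(&1["metadata"]["env"] == "production")) do
      nil -> nil
      router -> router["id"]
    end
  end
end
```

Passing `&SelectorSketch.east_breakers/1` as `:breaker_selector` would then behave the same as the inline anonymous function.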
## Other runtime functions
```elixir
# Send a sample directly (for async workflows or fire-and-forget reporting)
Tripswitch.report(client, %{
  router_id: "my-router",
  metric: "queue_depth",
  value: 42.0,
  ok: true,
  tags: %{"worker" => "processor-1"}
})

# Inspect a single breaker's current state.
# get_state/2 returns nil if the breaker is not yet known, so match on both cases.
case Tripswitch.get_state(client, "payment-service") do
  nil -> :unknown
  %{name: _name, state: state, allow_rate: _rate} -> state
end

# All known states as a map keyed by breaker name
all = Tripswitch.get_all_states(client)

# Convenience predicate (true only for fully open, not half-open)
if Tripswitch.breaker_open?(client, "payment-service") do
  serve_cached_response()
end

# SDK health diagnostics
stats = Tripswitch.stats(client)
# %{
#   sse_connected: true,
#   sse_reconnects: 0,
#   last_sse_event: ~U[2024-01-15 12:00:00Z],
#   cached_breakers: 4,
#   buffer_size: 0,
#   dropped_samples: 0,
#   flush_failures: 0,
#   last_successful_flush: ~U[2024-01-15 12:00:00Z]
# }
```
## Circuit breaker states
| State | Behavior |
|-------|----------|
| `"closed"` | All requests allowed, results reported |
| `"open"` | All requests return `{:error, :breaker_open}` immediately |
| `"half_open"` | Requests probabilistically allowed based on `allow_rate` (e.g., `0.2` = 20% allowed) |
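The table can be read as a small decision function. The sketch below models it for a single breaker, with the random draw passed in so the half-open case is testable. `GateSketch` is hypothetical, and treating an unknown (`nil`) state as allowed mirrors the fail-open default rather than any documented internal.

```elixir
defmodule GateSketch do
  # `draw` is a float in [0.0, 1.0); it is injectable so the logic can be tested.
  def allow?(state, allow_rate, draw \\ :rand.uniform())

  def allow?("closed", _allow_rate, _draw), do: true
  def allow?("open", _allow_rate, _draw), do: false
  # With allow_rate 0.2, roughly 20% of draws fall below the threshold.
  def allow?("half_open", allow_rate, draw), do: draw < allow_rate
  # Unknown state: assumed to follow fail-open, where an unknown breaker counts as closed.
  def allow?(nil, _allow_rate, _draw), do: true
end
```

Because the draw is local, a half-open breaker costs no network round trip per request.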
## How it works
1. **State sync** — The client opens an SSE connection to Tripswitch and keeps a local cache of breaker states, updated in real time. No network call is made on each `execute/3`.
2. **Gate check** — Before running `task`, the SDK checks the local cache. Open breakers block immediately; half-open breakers use a local random draw against `allow_rate`.
3. **Sample reporting** — Results are buffered and flushed in batches of up to 500 samples (or every 15 seconds). Batches are gzip-compressed and HMAC-signed.
4. **Graceful degradation** — If Tripswitch is unreachable, the client fails open by default (unknown breaker state = closed). Samples are retried up to 3 times with backoff before being dropped.
5. **Clean shutdown** — When the OTP supervisor stops the client, `terminate/2` flushes any buffered samples synchronously before the process exits.
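The sample-reporting step above (batch, compress, sign) can be sketched with the Erlang standard library. Several details here are assumptions, since the wire format is not described in this README: `term_to_binary` stands in for the real serialization, SHA-256 is assumed as the HMAC hash, and signing the compressed body (rather than the raw payload) is a guess. `FlushSketch` is a hypothetical name.

```elixir
defmodule FlushSketch do
  @batch_size 500

  # Splits buffered samples into batches of at most 500.
  def batches(samples), do: Enum.chunk_every(samples, @batch_size)

  # Serializes, gzip-compresses, and HMAC-signs one batch.
  # term_to_binary is a placeholder for the real wire format.
  def encode(batch, secret) do
    body = batch |> :erlang.term_to_binary() |> :zlib.gzip()

    signature =
      :crypto.mac(:hmac, :sha256, secret, body)
      |> Base.encode16(case: :lower)

    %{body: body, signature: signature}
  end
end
```

The timer-based flush and the retry-with-backoff behavior are left out; they would sit around `encode/2` in a GenServer loop.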
## Admin client
`Tripswitch.Admin` is a stateless client for management and automation. It does not require a running supervision tree.
```elixir
client = Tripswitch.Admin.new(api_key: "eb_admin_...")

# Projects
{:ok, projects} = Tripswitch.Admin.list_projects(client)
{:ok, project} = Tripswitch.Admin.get_project(client, "proj_abc123")
{:ok, project} = Tripswitch.Admin.create_project(client, %{workspace_id: "ws_...", name: "prod-payments"})
{:ok, project} = Tripswitch.Admin.update_project(client, "proj_abc123", %{name: "prod-payments-v2"})

# delete_project requires confirm_name: to prevent accidental deletion
:ok = Tripswitch.Admin.delete_project(client, "proj_abc123", confirm_name: "prod-payments")

# Breakers
{:ok, breakers} = Tripswitch.Admin.list_breakers(client, "proj_abc123")

{:ok, breaker} =
  Tripswitch.Admin.create_breaker(client, "proj_abc123", %{
    name: "api-latency",
    metric: "latency_ms",
    kind: "p95",
    op: "gt",
    threshold: 500.0
  })

{:ok, states} =
  Tripswitch.Admin.batch_get_breaker_states(client, "proj_abc123", %{
    router_id: "router_..."
  })

# Routers
{:ok, routers} = Tripswitch.Admin.list_routers(client, "proj_abc123")
:ok = Tripswitch.Admin.link_breaker(client, "proj_abc123", "router_...", "breaker_...")
:ok = Tripswitch.Admin.unlink_breaker(client, "proj_abc123", "router_...", "breaker_...")

# Events
{:ok, %{events: events, next_cursor: cursor}} =
  Tripswitch.Admin.list_events(client, "proj_abc123", limit: 50)

# Project keys
{:ok, result} = Tripswitch.Admin.create_project_key(client, "proj_abc123", %{name: "ci-key"})
# result["key"] contains the full eb_pk_... value. Store it; it won't be shown again.
```
### Error handling
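On success, admin functions return `{:ok, result}`, or a bare `:ok` for operations with no result to return (such as `delete_project/3` and `link_breaker/4` above). API errors return `{:error, %Tripswitch.Admin.Error{}}`, and transport failures return `{:error, exception}`.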
```elixir
case Tripswitch.Admin.get_breaker(client, project_id, breaker_id) do
  {:ok, breaker} ->
    breaker

  {:error, %Tripswitch.Admin.Error{} = err} ->
    if Tripswitch.Admin.Error.not_found?(err) do
      nil
    else
      raise "unexpected error: #{err.message} (status #{err.status})"
    end

  {:error, reason} ->
    raise "transport error: #{inspect(reason)}"
end
```
Available predicates on `Tripswitch.Admin.Error`:
| Function | Status |
|----------|--------|
| `not_found?/1` | 404 |
| `unauthorized?/1` | 401 |
| `forbidden?/1` | 403 |
| `unprocessable?/1` | 422 |
| `rate_limited?/1` | 429 |
| `server_error?/1` | 5xx |
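The predicate table amounts to a grouping of HTTP status codes, which is handy when deciding whether a failed admin call is worth retrying. The sketch below works on the integer from `err.status` and does not depend on the SDK. `AdminRetrySketch` is hypothetical, and the retry policy (only 429 and 5xx) is an assumption, not documented SDK behavior.

```elixir
defmodule AdminRetrySketch do
  # Groups an HTTP status the same way the predicate table does.
  def classify(404), do: :not_found
  def classify(401), do: :unauthorized
  def classify(403), do: :forbidden
  def classify(422), do: :unprocessable
  def classify(429), do: :rate_limited
  def classify(status) when status in 500..599, do: :server_error
  def classify(_status), do: :other

  # Assumed policy: only rate limits and server errors are worth retrying.
  def retryable?(status), do: classify(status) in [:rate_limited, :server_error]
end
```

In real code the predicates on `Tripswitch.Admin.Error` would replace `classify/1`; the sketch only makes the grouping explicit.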
## Contributing
Contributions are welcome — please open an issue or pull request on [GitHub](https://github.com/tripswitch-dev/tripswitch-ex).
## License
[Apache License 2.0](LICENSE)