# Usage & Billing
## Overview
ReqLLM provides normalized usage tracking and best-effort cost calculation for API requests. Every response includes usage data that works consistently across providers, with detailed breakdowns for tokens, tools, and images when the provider exposes enough information.
## Pricing Policy
ReqLLM currently targets **"some assistance, no guarantees"** for pricing.
In practice, that means:
- `response.usage` is intended to be useful for product analytics, tenant attribution, dashboards, and rough billing estimates
- token, tool, image, and caching costs are calculated from provider usage data plus model pricing metadata when those inputs exist
- the resulting USD totals are not guaranteed to match provider invoices exactly
When exact billing matters, treat ReqLLM usage as a helpful estimate and reconcile against provider-side reporting. For the full contract, known gaps, and production guidance, see the [Pricing Policy](pricing-policy.md) guide.
## The Usage Structure
Every `ReqLLM.Response` includes a `usage` map with normalized metrics:
```elixir
{:ok, response} = ReqLLM.generate_text("anthropic:claude-haiku-4-5", "Hello")
response.usage
#=> %{
#     # Token counts
#     input_tokens: 8,
#     output_tokens: 12,
#     total_tokens: 20,
#
#     # Cost summary (USD)
#     input_cost: 0.00024,
#     output_cost: 0.00036,
#     total_cost: 0.0006,
#
#     # Detailed cost breakdown
#     cost: %{
#       tokens: 0.0006,
#       tools: 0.0,
#       images: 0.0,
#       total: 0.0006
#     }
#   }
```
## Token Usage
### Standard Tokens
All providers report basic token counts:
| Field | Description |
|-------|-------------|
| `input_tokens` | Tokens in the request (prompt, context, tools) |
| `output_tokens` | Tokens generated by the model |
| `total_tokens` | Sum of input and output tokens |
### Reasoning Tokens
For reasoning models (OpenAI o1/o3/gpt-5, Anthropic extended thinking, Google thinking):
```elixir
{:ok, response} = ReqLLM.generate_text("openai:o3-mini", prompt)
response.usage.reasoning_tokens
#=> 1250 # Tokens used for internal reasoning
```
The `reasoning_tokens` field tracks tokens used for chain-of-thought reasoning. These may be billed differently from standard output tokens depending on the provider.
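Because `reasoning_tokens` only appears for reasoning models, a defensive read avoids a `KeyError` on the usage map. A small sketch, assuming the usage shape shown above; the "share" heuristic is our own interpretation, not a billing figure:

```elixir
# reasoning_tokens may be absent for non-reasoning models, so read it defensively
reasoning = Map.get(response.usage, :reasoning_tokens, 0)

# Rough fraction of generated tokens spent on reasoning (heuristic only;
# providers differ in whether reasoning tokens are counted inside output_tokens)
share =
  case response.usage.output_tokens do
    0 -> 0.0
    out -> reasoning / out
  end
```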
### Cached Tokens
For providers that support prompt caching (Anthropic, OpenAI):
```elixir
response.usage.cached_tokens
#=> 500 # Input tokens served from cache
response.usage.cache_creation_tokens
#=> 0 # Tokens used to create new cache entries
```
Cached tokens are typically billed at a reduced rate. See [Anthropic Prompt Caching](anthropic.md#anthropic_prompt_cache) for details.
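A rough cache-savings estimate can be derived from `cached_tokens` if you supply your own numbers; both `input_price_per_token` and the default `cache_discount` below are hypothetical placeholders, not values ReqLLM provides:

```elixir
defmodule CacheSavings do
  # Sketch: estimated USD saved by cache hits, under an assumed discount rate.
  # input_price_per_token and cache_discount are your own assumptions.
  def estimate(usage, input_price_per_token, cache_discount \\ 0.9) do
    Map.get(usage, :cached_tokens, 0) * input_price_per_token * cache_discount
  end
end
```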
## Tool Usage
When using tools like web search, usage is tracked in `tool_usage`:
```elixir
response.usage.tool_usage
#=> %{
#     web_search: %{count: 2, unit: "call"}
#   }
```
### Web Search
Each provider has slightly different web search tracking:
| Provider | Unit | Notes |
|----------|------|-------|
| Anthropic | `"call"` | $10 per 1,000 searches |
| OpenAI | `"call"` | Responses API models only |
| xAI | `"call"` or `"source"` | Varies by response format |
| Google | `"query"` | Grounding queries |
**Anthropic Example:**
```elixir
{:ok, response} = ReqLLM.generate_text(
  "anthropic:claude-sonnet-4-5",
  "What's happening in AI today?",
  provider_options: [web_search: %{max_uses: 5}]
)

response.usage.tool_usage.web_search
#=> %{count: 3, unit: "call"}
```
**xAI Example:**
```elixir
{:ok, response} = ReqLLM.generate_text(
  "xai:grok-4-1-fast-reasoning",
  "Latest tech news",
  xai_tools: [%{type: "web_search"}]
)

response.usage.tool_usage.web_search
#=> %{count: 5, unit: "call"}
```
**Google Grounding Example:**
```elixir
{:ok, response} = ReqLLM.generate_text(
  "google:gemini-3-flash-preview",
  "Current stock market trends",
  provider_options: [google_grounding: %{enable: true}]
)

response.usage.tool_usage.web_search
#=> %{count: 2, unit: "query"}
```
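Because the unit name differs by provider (`"call"`, `"source"`, `"query"`), code that only needs a search count can read `count` defensively; a minimal sketch assuming the `tool_usage` shape shown above:

```elixir
# Count web searches regardless of the provider-specific unit
searches =
  response.usage
  |> Map.get(:tool_usage, %{})
  |> Map.get(:web_search, %{})
  |> Map.get(:count, 0)
```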
## Image Usage
For image generation, usage is tracked in `image_usage`:
```elixir
{:ok, response} = ReqLLM.generate_image("openai:gpt-image-1", prompt)
response.usage.image_usage
#=> %{
#     generated: %{count: 1, size_class: "1024x1024"}
#   }
```
### Size Classes
Image costs vary by resolution:
| Provider | Size Classes |
|----------|-------------|
| OpenAI GPT Image | `"1024x1024"`, `"1536x1024"`, `"1024x1536"`, `"auto"` |
| OpenAI DALL-E 3 | `"1024x1024"`, `"1792x1024"`, `"1024x1792"` |
| Google | Based on aspect ratio |
### Multiple Images
```elixir
{:ok, response} = ReqLLM.generate_image("openai:dall-e-2", prompt, n: 3)
response.usage.image_usage.generated
#=> %{count: 3, size_class: "1024x1024"}
```
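When aggregating image spend across many responses, a tally keyed by size class keeps the resolution-dependent pricing visible. A sketch assuming the `image_usage` shape shown above; the module name is illustrative:

```elixir
defmodule ImageTally do
  # Tally generated images by size class across a batch of responses
  def by_size_class(responses) do
    Enum.reduce(responses, %{}, fn response, acc ->
      case get_in(response.usage, [:image_usage, :generated]) do
        %{count: count, size_class: size} -> Map.update(acc, size, count, &(&1 + count))
        _ -> acc
      end
    end)
  end
end
```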
## Cost Breakdown
The `cost` map provides a detailed breakdown by category:
```elixir
response.usage.cost
#=> %{
#     tokens: 0.001,      # Token-based costs (input + output)
#     tools: 0.02,        # Web search and tool costs
#     images: 0.04,       # Image generation costs
#     total: 0.061,       # Sum of all costs
#     line_items: [...]   # Per-component details
#   }
```
### Line Items
For detailed billing analysis, `line_items` provides per-component costs:
```elixir
response.usage.cost.line_items
#=> [
#     %{component: "token.input", cost: 0.0003, quantity: 100},
#     %{component: "token.output", cost: 0.0007, quantity: 50},
#     %{component: "tool.web_search", cost: 0.02, quantity: 2}
#   ]
```
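Line items can be folded into a per-component report; a small sketch assuming the fields shown above:

```elixir
# Sum line-item costs per component (e.g. "token.input", "tool.web_search")
response.usage.cost.line_items
|> Enum.group_by(& &1.component)
|> Map.new(fn {component, items} ->
  {component, items |> Enum.map(& &1.cost) |> Enum.sum()}
end)
```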
## Provider-Specific Notes
### Anthropic
- **Web search**: $10 per 1,000 searches
- **Prompt caching**: Reduced rates for cached tokens
- **Extended thinking**: Reasoning tokens tracked separately
### OpenAI
- **Responses API**: Web search available for o1, o3, gpt-5 models
- **Chat Completions API**: No built-in web search
- **Image generation**: Costs vary by model and size
### xAI
- **Web search**: Via `xai_tools` option
- **Deprecated**: `live_search` is no longer supported
- **Units**: May report as `"call"` or `"source"`
### Google
- **Grounding**: Search via `google_grounding` option
- **Units**: Reports as `"query"`
- **Image generation**: Gemini image models supported
## Known Limits
ReqLLM does not currently guarantee support for every provider billing surface. In particular:
- realtime audio/text billing is not modeled yet
- video generation billing is not modeled yet
- account-specific discounts, credits, taxes, and regional pricing are outside the public contract
## Telemetry
ReqLLM emits three telemetry event families:
- `[:req_llm, :request, :start | :stop | :exception]` for lifecycle timing, request and response summaries, usage, and standardized reasoning metadata
- `[:req_llm, :reasoning, :start | :update | :stop]` for provider-neutral thinking and reasoning milestones
- `[:req_llm, :token_usage]` for backwards-compatible token and cost tracking
For billing and tenant attribution, use `[:req_llm, :request, :stop]` as the source of truth. It includes duration in measurements plus `request_id`, `usage`, `finish_reason`, and normalized `reasoning` metadata in the event metadata. The token usage event remains useful if you only want token and cost totals.
When you audit reasoning-heavy workloads, prefer the normalized `reasoning` snapshot on the request lifecycle events over raw provider payloads. It captures both the originally requested reasoning settings and the effective translated request, so you can see when a provider rewrites or disables a reasoning configuration before you attribute cost or behavior to a tenant.
```elixir
:telemetry.attach_many(
  "my-req-llm-billing",
  [
    [:req_llm, :request, :stop],
    [:req_llm, :request, :exception],
    [:req_llm, :token_usage]
  ],
  fn event, measurements, metadata, _config ->
    case event do
      [:req_llm, :request, :stop] ->
        duration_ms = System.convert_time_unit(measurements.duration, :native, :millisecond)

        IO.inspect(
          %{
            request_id: metadata.request_id,
            duration_ms: duration_ms,
            finish_reason: metadata.finish_reason,
            usage: metadata.usage,
            reasoning: metadata.reasoning
          },
          label: "Request"
        )

      [:req_llm, :request, :exception] ->
        IO.inspect(metadata, label: "Failed request")

      [:req_llm, :token_usage] ->
        IO.inspect(%{measurements: measurements, metadata: metadata}, label: "Usage")
    end
  end,
  nil
)
```
`[:req_llm, :token_usage]` remains available on every request, including streaming:
```elixir
:telemetry.attach(
  "my-usage-handler",
  [:req_llm, :token_usage],
  fn _event, measurements, metadata, _config ->
    IO.inspect(measurements, label: "Usage")
    IO.inspect(metadata, label: "Metadata")
  end,
  nil
)
```
Event measurements include:
- `input_tokens`, `output_tokens`, `total_tokens`
- `input_cost`, `output_cost`, `total_cost`
- `reasoning_tokens` (when applicable)
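For long-running accumulation (per node, not per tenant), the token usage event can feed an ETS counter. A minimal sketch assuming the measurement keys listed above; the module and table names are illustrative:

```elixir
defmodule TokenTotals do
  # ETS-backed running total of tokens observed on this node (sketch)
  def start do
    :ets.new(:req_llm_totals, [:named_table, :public])
    :ets.insert(:req_llm_totals, {:total_tokens, 0})
    :telemetry.attach("token-totals", [:req_llm, :token_usage], &__MODULE__.handle_event/4, nil)
  end

  def handle_event(_event, measurements, _metadata, _config) do
    tokens = Map.get(measurements, :total_tokens, 0)
    :ets.update_counter(:req_llm_totals, :total_tokens, tokens)
  end

  def total, do: :ets.lookup_element(:req_llm_totals, :total_tokens, 2)
end
```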
See the [Telemetry Guide](telemetry.md) for the full event contract, reasoning lifecycle, milestone semantics, and payload capture options.
## Example: Complete Usage Tracking
```elixir
defmodule UsageTracker do
  def track_request(model, prompt, opts \\ []) do
    {duration_us, result} =
      :timer.tc(fn ->
        ReqLLM.generate_text(model, prompt, opts)
      end)

    case result do
      {:ok, response} ->
        usage = response.usage

        # Optional keys (reasoning_tokens, costs, tool/image usage) are read
        # with [] access so missing keys return nil instead of raising KeyError.
        IO.puts("""
        Request completed in #{duration_us / 1000}ms
        Tokens:
          Input: #{usage.input_tokens}
          Output: #{usage.output_tokens}
          Total: #{usage.total_tokens}
        #{if usage[:reasoning_tokens], do: "  Reasoning: #{usage[:reasoning_tokens]}", else: ""}
        Cost:
          Input: $#{format_cost(usage[:input_cost])}
          Output: $#{format_cost(usage[:output_cost])}
          Total: $#{format_cost(usage[:total_cost])}
        #{format_tool_usage(usage[:tool_usage])}
        #{format_image_usage(usage[:image_usage])}
        """)

        {:ok, response}

      error ->
        error
    end
  end

  defp format_cost(nil), do: "n/a"
  defp format_cost(cost), do: :erlang.float_to_binary(cost, decimals: 6)

  defp format_tool_usage(nil), do: ""

  defp format_tool_usage(tool_usage) do
    Enum.map_join(tool_usage, "\n", fn {tool, %{count: count, unit: unit}} ->
      "Tool Usage: #{tool} = #{count} #{unit}(s)"
    end)
  end

  defp format_image_usage(nil), do: ""

  defp format_image_usage(%{generated: %{count: count, size_class: size}}) do
    "Image Usage: #{count} image(s) at #{size}"
  end

  defp format_image_usage(_), do: ""
end
```
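Calling it looks like this (the model string and prompt are illustrative):

```elixir
UsageTracker.track_request("anthropic:claude-haiku-4-5", "Summarize usage tracking in one sentence")
```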
## See Also
- [Data Structures](data-structures.md) - Response structure details
- [Anthropic Guide](anthropic.md) - Web search and prompt caching
- [OpenAI Guide](openai.md) - Responses API and image generation
- [xAI Guide](xai.md) - Grok web search
- [Google Guide](google.md) - Grounding and search
- [Image Generation Guide](image-generation.md) - Image costs