guides/usage-and-billing.md

# Usage & Billing

## Overview

ReqLLM provides comprehensive usage tracking and cost calculation for all API requests. Every response includes normalized usage data that works consistently across providers, with detailed breakdowns for tokens, tools, and images.

## The Usage Structure

Every `ReqLLM.Response` includes a `usage` map with normalized metrics:

```elixir
{:ok, response} = ReqLLM.generate_text("anthropic:claude-haiku-4-5", "Hello")

response.usage
#=> %{
#     # Token counts
#     input_tokens: 8,
#     output_tokens: 12,
#     total_tokens: 20,
#
#     # Cost summary (USD)
#     input_cost: 0.00024,
#     output_cost: 0.00036,
#     total_cost: 0.0006,
#
#     # Detailed cost breakdown
#     cost: %{
#       tokens: 0.0006,
#       tools: 0.0,
#       images: 0.0,
#       total: 0.0006
#     }
#   }
```

## Token Usage

### Standard Tokens

All providers report basic token counts:

| Field | Description |
|-------|-------------|
| `input_tokens` | Tokens in the request (prompt, context, tools) |
| `output_tokens` | Tokens generated by the model |
| `total_tokens` | Sum of input and output tokens |
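
Since `usage` is a plain map, these fields can be pattern-matched directly, and `total_tokens` is always the sum of the other two:

```elixir
%{input_tokens: input, output_tokens: output, total_tokens: total} = response.usage

total == input + output
#=> true
```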

### Reasoning Tokens

For reasoning models (OpenAI o1/o3/gpt-5, Anthropic extended thinking, Google thinking):

```elixir
{:ok, response} = ReqLLM.generate_text("openai:o3-mini", prompt)

response.usage.reasoning_tokens
#=> 1250  # Tokens used for internal reasoning
```

The `reasoning_tokens` field tracks tokens used for chain-of-thought reasoning. These may be billed differently from standard tokens, depending on the provider.
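
If you handle responses from both reasoning and non-reasoning models, read the field defensively; it may be absent from the usage map (it does not appear in the base map shown above):

```elixir
# Defaults to 0 when the model reports no reasoning tokens
reasoning = Map.get(response.usage, :reasoning_tokens, 0)
```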

### Cached Tokens

For providers that support prompt caching (Anthropic, OpenAI):

```elixir
response.usage.cached_tokens
#=> 500  # Input tokens served from cache

response.usage.cache_creation_tokens
#=> 0    # Tokens used to create new cache entries
```

Cached tokens are typically billed at a reduced rate. See [Anthropic Prompt Caching](anthropic.md#anthropic_prompt_cache) for details.
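
One practical use of these fields is monitoring your cache hit rate across requests. A minimal sketch, assuming `cached_tokens` is absent when caching is not in play:

```elixir
# Fraction of input tokens served from cache
cached = Map.get(response.usage, :cached_tokens, 0)
hit_rate = cached / max(response.usage.input_tokens, 1)
```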

## Tool Usage

When using tools like web search, usage is tracked in `tool_usage`:

```elixir
response.usage.tool_usage
#=> %{
#     web_search: %{count: 2, unit: "call"}
#   }
```

### Web Search

Each provider has slightly different web search tracking:

| Provider | Unit | Notes |
|----------|------|-------|
| Anthropic | `"call"` | $10 per 1,000 searches |
| OpenAI | `"call"` | Responses API models only |
| xAI | `"call"` or `"source"` | Varies by response format |
| Google | `"query"` | Grounding queries |

**Anthropic Example:**
```elixir
{:ok, response} = ReqLLM.generate_text(
  "anthropic:claude-sonnet-4-5",
  "What's happening in AI today?",
  provider_options: [web_search: %{max_uses: 5}]
)

response.usage.tool_usage.web_search
#=> %{count: 3, unit: "call"}
```

**xAI Example:**
```elixir
{:ok, response} = ReqLLM.generate_text(
  "xai:grok-4-1-fast-reasoning",
  "Latest tech news",
  xai_tools: [%{type: "web_search"}]
)

response.usage.tool_usage.web_search
#=> %{count: 5, unit: "call"}
```

**Google Grounding Example:**
```elixir
{:ok, response} = ReqLLM.generate_text(
  "google:gemini-3-flash-preview",
  "Current stock market trends",
  provider_options: [google_grounding: %{enable: true}]
)

response.usage.tool_usage.web_search
#=> %{count: 2, unit: "query"}
```

## Image Usage

For image generation, usage is tracked in `image_usage`:

```elixir
{:ok, response} = ReqLLM.generate_image("openai:gpt-image-1", prompt)

response.usage.image_usage
#=> %{
#     generated: %{count: 1, size_class: "1024x1024"}
#   }
```

### Size Classes

Image costs vary by resolution:

| Provider | Size Classes |
|----------|-------------|
| OpenAI GPT Image | `"1024x1024"`, `"1536x1024"`, `"1024x1536"`, `"auto"` |
| OpenAI DALL-E 3 | `"1024x1024"`, `"1792x1024"`, `"1024x1792"` |
| Google | Based on aspect ratio |

### Multiple Images

```elixir
{:ok, response} = ReqLLM.generate_image("openai:dall-e-2", prompt, n: 3)

response.usage.image_usage.generated
#=> %{count: 3, size_class: "1024x1024"}
```

## Cost Breakdown

The `cost` map provides a detailed breakdown by category:

```elixir
response.usage.cost
#=> %{
#     tokens: 0.001,    # Token-based costs (input + output)
#     tools: 0.02,      # Web search and tool costs
#     images: 0.04,     # Image generation costs
#     total: 0.061,     # Sum of all costs
#     line_items: [...]  # Per-component details
#   }
```

### Line Items

For detailed billing analysis, `line_items` provides per-component costs:

```elixir
response.usage.cost.line_items
#=> [
#     %{component: "token.input", cost: 0.0003, quantity: 100},
#     %{component: "token.output", cost: 0.0007, quantity: 50},
#     %{component: "tool.web_search", cost: 0.02, quantity: 2}
#   ]
```
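
For example, you can roll line items up by component prefix to see where spend is going. A sketch using only the fields shown above:

```elixir
response.usage.cost.line_items
|> Enum.group_by(fn %{component: component} ->
  # "token.input" -> "token", "tool.web_search" -> "tool"
  component |> String.split(".") |> hd()
end)
|> Map.new(fn {category, items} ->
  {category, items |> Enum.map(& &1.cost) |> Enum.sum()}
end)
#=> %{"token" => 0.001, "tool" => 0.02}
```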

## Provider-Specific Notes

### Anthropic

- **Web search**: $10 per 1,000 searches
- **Prompt caching**: Reduced rates for cached tokens
- **Extended thinking**: Reasoning tokens tracked separately

### OpenAI

- **Responses API**: Web search available for o1, o3, gpt-5 models
- **Chat Completions API**: No built-in web search
- **Image generation**: Costs vary by model and size

### xAI

- **Web search**: Via `xai_tools` option
- **Deprecated**: `live_search` is no longer supported
- **Units**: May report as `"call"` or `"source"`

### Google

- **Grounding**: Search via `google_grounding` option
- **Units**: Reports as `"query"`
- **Image generation**: Gemini image models supported
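
Because the web search option names differ per provider (see the examples above), multi-provider code often centralizes the mapping. A sketch; the `SearchOpts` module name is ours, and the option shapes are copied from the examples earlier in this guide:

```elixir
defmodule SearchOpts do
  # Returns the web-search options appropriate for the given model spec.
  def for_model("anthropic:" <> _), do: [provider_options: [web_search: %{max_uses: 5}]]
  def for_model("xai:" <> _), do: [xai_tools: [%{type: "web_search"}]]
  def for_model("google:" <> _), do: [provider_options: [google_grounding: %{enable: true}]]
  def for_model(_), do: []
end

ReqLLM.generate_text(model, prompt, SearchOpts.for_model(model))
```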

## Telemetry

A telemetry event is published on every request:

```elixir
:telemetry.attach(
  "my-usage-handler",
  [:req_llm, :token_usage],
  fn _event, measurements, metadata, _config ->
    IO.inspect(measurements, label: "Usage")
    IO.inspect(metadata, label: "Metadata")
  end,
  nil
)
```

Event measurements include:
- `input_tokens`, `output_tokens`, `total_tokens`
- `input_cost`, `output_cost`, `total_cost`
- `reasoning_tokens` (when applicable)
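
Building on that event shape, here is a minimal sketch of a handler that accumulates total spend in an `Agent`. The module name is ours, and we assume `total_cost` can be missing when pricing data is unavailable:

```elixir
defmodule CostAccumulator do
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> 0.0 end, name: __MODULE__)

  def attach do
    :telemetry.attach("cost-accumulator", [:req_llm, :token_usage], &__MODULE__.handle_event/4, nil)
  end

  def handle_event(_event, measurements, _metadata, _config) do
    # Guard against requests where cost could not be computed
    cost = Map.get(measurements, :total_cost) || 0.0
    Agent.update(__MODULE__, &(&1 + cost))
  end

  def total_spend, do: Agent.get(__MODULE__, & &1)
end

{:ok, _pid} = CostAccumulator.start_link([])
:ok = CostAccumulator.attach()
```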

## Example: Complete Usage Tracking

```elixir
defmodule UsageTracker do
  def track_request(model, prompt, opts \\ []) do
    {duration_us, result} = :timer.tc(fn ->
      ReqLLM.generate_text(model, prompt, opts)
    end)

    case result do
      {:ok, response} ->
        usage = response.usage

        # Optional keys are only present when applicable, so read them
        # with Map.get/2 instead of dot access (which raises when missing).
        reasoning_tokens = Map.get(usage, :reasoning_tokens)

        IO.puts("""
        Request completed in #{div(duration_us, 1000)}ms

        Tokens:
          Input: #{usage.input_tokens}
          Output: #{usage.output_tokens}
          Total: #{usage.total_tokens}
          #{if reasoning_tokens, do: "Reasoning: #{reasoning_tokens}", else: ""}

        Cost:
          Input: $#{format_cost(usage.input_cost)}
          Output: $#{format_cost(usage.output_cost)}
          Total: $#{format_cost(usage.total_cost)}

        #{format_tool_usage(Map.get(usage, :tool_usage))}
        #{format_image_usage(Map.get(usage, :image_usage))}
        """)

        {:ok, response}

      error ->
        error
    end
  end

  defp format_cost(nil), do: "n/a"
  defp format_cost(cost), do: :erlang.float_to_binary(cost, decimals: 6)

  defp format_tool_usage(nil), do: ""
  defp format_tool_usage(tool_usage) do
    Enum.map_join(tool_usage, "\n", fn {tool, %{count: count, unit: unit}} ->
      "Tool Usage: #{tool} = #{count} #{unit}(s)"
    end)
  end

  defp format_image_usage(nil), do: ""
  defp format_image_usage(%{generated: %{count: count, size_class: size}}) do
    "Image Usage: #{count} image(s) at #{size}"
  end
  defp format_image_usage(_), do: ""
end
```
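
Calling it looks like:

```elixir
UsageTracker.track_request(
  "anthropic:claude-haiku-4-5",
  "Summarize Elixir's actor model in one sentence"
)
```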

## See Also

- [Data Structures](data-structures.md) - Response structure details
- [Anthropic Guide](anthropic.md) - Web search and prompt caching
- [OpenAI Guide](openai.md) - Responses API and image generation
- [xAI Guide](xai.md) - Grok web search
- [Google Guide](google.md) - Grounding and search
- [Image Generation Guide](image-generation.md) - Image costs