docs/guides/metrics.md

# Metrics

Tinkex includes a lightweight metrics system for tracking request performance, custom counters, gauges, and histograms. The `Tinkex.Metrics` server automatically collects HTTP request telemetry and provides helpers for recording custom metrics in experiments and benchmarks.

## Overview

The Metrics system is built on GenServer and Telemetry, providing:

- **Automatic HTTP request tracking**: counters for success/failure and latency histograms
- **Custom counters**: increment-based metrics for tracking events
- **Gauges**: point-in-time measurements that can be set directly
- **Histograms**: distribution tracking with percentile calculations (p50, p95, p99)
- **Zero-overhead when disabled**: metrics can be toggled off via configuration
- **Thread-safe**: all updates via GenServer casts/calls

The server starts automatically with the Tinkex application and subscribes to `[:tinkex, :http, :request, :stop]` telemetry events.

## Built-in HTTP metrics

When enabled, Tinkex automatically tracks:

### Request counters

- `:tinkex_requests_total` — total number of HTTP requests
- `:tinkex_requests_success` — requests that returned `:ok`
- `:tinkex_requests_failure` — requests that returned an error

### Request latency histogram

- `:tinkex_request_duration_ms` — end-to-end request duration in milliseconds

This histogram includes:
- **Count**: total number of requests
- **Mean**: average latency
- **Min/Max**: fastest and slowest requests
- **Percentiles**: p50 (median), p95, p99

## Custom counters

Use `Metrics.increment/2` to count events in your application:

```elixir
# Increment by 1 (default)
Tinkex.Metrics.increment(:my_custom_counter)

# Increment by a specific amount
Tinkex.Metrics.increment(:tokens_generated, 150)
Tinkex.Metrics.increment(:cache_hits, 1)
Tinkex.Metrics.increment(:errors, 1)
```

**Common use cases:**
- Track cache hits/misses
- Count successful vs failed generations
- Track tokens consumed across multiple requests
- Count specific error types

## Gauges

Gauges represent instantaneous values that can go up or down. Use `Metrics.set_gauge/2` to record the current state:

```elixir
# Track queue depth
Tinkex.Metrics.set_gauge(:queue_depth, 42)

# Track active connections
Tinkex.Metrics.set_gauge(:active_connections, 8)

# Track memory usage
{:ok, memory} = :erlang.memory(:total)
Tinkex.Metrics.set_gauge(:memory_bytes, memory)

# Track temperature parameter
Tinkex.Metrics.set_gauge(:current_temperature, 0.7)
```

**Common use cases:**
- Monitor queue depths or buffer sizes
- Track active connections or worker pools
- Record configuration values during experiments
- Monitor resource usage (memory, CPU)

Unlike counters, gauges are always set to a specific value rather than incremented.

## Histograms

Histograms track distributions of values over time. Use `Metrics.record_histogram/2` to record samples (values should be in milliseconds):

```elixir
# Record a custom latency measurement
start = System.monotonic_time(:millisecond)
result = do_some_work()
duration_ms = System.monotonic_time(:millisecond) - start
Tinkex.Metrics.record_histogram(:custom_operation_duration, duration_ms)

# Track token generation time
Tinkex.Metrics.record_histogram(:token_generation_ms, 125.5)

# Track decode latency
Tinkex.Metrics.record_histogram(:decode_latency_ms, 3.2)
```

**Histogram features:**
- Automatic bucket assignment based on configured latency buckets
- Stores up to `max_samples` individual values for percentile calculation
- Computes min, max, mean, p50, p95, p99
- Memory-bounded (older samples dropped when limit reached)

**Common use cases:**
- Track end-to-end operation latencies
- Measure token generation speed
- Monitor decode/encode times
- Track database query performance

## Getting snapshots

Call `Metrics.snapshot/0` to retrieve current metrics state:

```elixir
snapshot = Tinkex.Metrics.snapshot()

# Snapshot structure:
%{
  counters: %{
    tinkex_requests_total: 150,
    tinkex_requests_success: 145,
    tinkex_requests_failure: 5,
    my_custom_counter: 42
  },
  gauges: %{
    queue_depth: 8,
    active_connections: 4
  },
  histograms: %{
    tinkex_request_duration_ms: %{
      count: 150,
      mean: 245.3,
      min: 89.2,
      max: 1205.7,
      p50: 220.1,
      p95: 458.2,
      p99: 892.5
    }
  }
}
```

**Access specific metrics:**

```elixir
snapshot = Tinkex.Metrics.snapshot()

# Check total requests
total = snapshot.counters[:tinkex_requests_total] || 0

# Check success rate
success = snapshot.counters[:tinkex_requests_success] || 0
failure = snapshot.counters[:tinkex_requests_failure] || 0
success_rate = if total > 0, do: success / total * 100, else: 0

# Check p99 latency
latency_hist = snapshot.histograms[:tinkex_request_duration_ms]
p99_latency = latency_hist.p99
```

## Understanding latency percentiles

Percentiles tell you what percentage of requests completed faster than a given threshold:

- **p50 (median)**: 50% of requests were faster than this value
- **p95**: 95% of requests were faster than this value
- **p99**: 99% of requests were faster than this value

**Example interpretation:**

```elixir
%{
  p50: 220.1,   # Half of all requests completed in under 220ms
  p95: 458.2,   # 95% completed in under 458ms
  p99: 892.5    # 99% completed in under 892ms
}
```

High p99 values indicate "tail latency" — a small percentage of requests taking much longer than average. This is critical for understanding worst-case user experience.

## Configuration options

Configure metrics in `config/config.exs`:

```elixir
config :tinkex,
  # Enable or disable metrics collection
  metrics_enabled: true,

  # Histogram bucket boundaries in milliseconds
  # Default: [1, 2, 5, 10, 20, 50, 100, 200, 500, 1_000, 2_000, 5_000]
  metrics_latency_buckets: [10, 50, 100, 250, 500, 1_000, 2_500, 5_000],

  # Maximum individual samples to keep per histogram
  # Default: 1_000
  metrics_histogram_max_samples: 2_000
```

**Configuration guide:**

### Latency buckets

Buckets define histogram boundaries. Choose values appropriate for your workload:

```elixir
# For fast operations (sub-second)
metrics_latency_buckets: [1, 5, 10, 25, 50, 100, 250, 500]

# For slow operations (multi-second)
metrics_latency_buckets: [100, 500, 1_000, 2_000, 5_000, 10_000, 30_000]

# For mixed workloads (default)
metrics_latency_buckets: [1, 2, 5, 10, 20, 50, 100, 200, 500, 1_000, 2_000, 5_000]
```

More buckets = finer granularity but more memory usage.

### Max samples

The `max_samples` setting controls how many individual values are stored for percentile calculation:

```elixir
# Lower memory usage, less accurate percentiles
metrics_histogram_max_samples: 500

# Higher accuracy, more memory
metrics_histogram_max_samples: 5_000
```

When the limit is reached, new samples displace older ones. For production workloads with high volume, consider a lower value (500-1000). For detailed analysis, use higher values (5000-10000).

### Disabling metrics

To disable metrics entirely:

```elixir
config :tinkex, metrics_enabled: false
```

Or pass at startup:

```elixir
{:ok, _} = Tinkex.Metrics.start_link(enabled: false)
```

## Integration with experiments

Use metrics to track experiment progress and performance:

```elixir
defmodule MyExperiment do
  def run_benchmark(num_iterations) do
    # Reset metrics at start
    :ok = Tinkex.Metrics.reset()

    # Track experiment configuration
    Tinkex.Metrics.set_gauge(:experiment_iterations, num_iterations)
    Tinkex.Metrics.set_gauge(:experiment_temperature, 0.7)

    Enum.each(1..num_iterations, fn i ->
      start = System.monotonic_time(:millisecond)

      # Your experiment code
      {:ok, result} = run_single_trial(i)

      # Track custom metrics
      Tinkex.Metrics.increment(:trials_completed)
      if result.success?, do: Tinkex.Metrics.increment(:successful_trials)

      # Track trial duration
      duration = System.monotonic_time(:millisecond) - start
      Tinkex.Metrics.record_histogram(:trial_duration_ms, duration)

      # Track tokens generated
      Tinkex.Metrics.increment(:total_tokens, result.num_tokens)
    end)

    # Flush pending updates
    :ok = Tinkex.Metrics.flush()

    # Get final snapshot
    snapshot = Tinkex.Metrics.snapshot()

    # Compute experiment metrics
    total_trials = snapshot.counters[:trials_completed] || 0
    successful = snapshot.counters[:successful_trials] || 0
    success_rate = if total_trials > 0, do: successful / total_trials * 100, else: 0

    trial_stats = snapshot.histograms[:trial_duration_ms]

    IO.puts """
    Experiment complete:
      Trials: #{total_trials}
      Success rate: #{:erlang.float_to_binary(success_rate, decimals: 1)}%
      Trial duration:
        Mean: #{format_ms(trial_stats.mean)}
        p50:  #{format_ms(trial_stats.p50)}
        p95:  #{format_ms(trial_stats.p95)}
        p99:  #{format_ms(trial_stats.p99)}
      HTTP requests:
        Total: #{snapshot.counters[:tinkex_requests_total] || 0}
        Success: #{snapshot.counters[:tinkex_requests_success] || 0}
        Failure: #{snapshot.counters[:tinkex_requests_failure] || 0}
    """
  end

  defp format_ms(nil), do: "n/a"
  defp format_ms(value), do: "#{:erlang.float_to_binary(value, decimals: 2)}ms"
end
```

## Integration with benchmarks

Track comparative performance across different configurations:

```elixir
defmodule ModelComparison do
  def compare_models(models, prompt, num_runs) do
    results =
      Enum.map(models, fn model ->
        # Reset for each model
        :ok = Tinkex.Metrics.reset()

        Enum.each(1..num_runs, fn _ ->
          {:ok, _response} = sample_with_model(model, prompt)
        end)

        :ok = Tinkex.Metrics.flush()
        snapshot = Tinkex.Metrics.snapshot()

        latency = snapshot.histograms[:tinkex_request_duration_ms]

        {model, %{
          total_requests: snapshot.counters[:tinkex_requests_total] || 0,
          success_rate: calculate_success_rate(snapshot),
          mean_latency: latency.mean,
          p50_latency: latency.p50,
          p99_latency: latency.p99
        }}
      end)

    # Print comparison table
    print_comparison_table(results)
  end

  defp calculate_success_rate(snapshot) do
    total = snapshot.counters[:tinkex_requests_total] || 0
    success = snapshot.counters[:tinkex_requests_success] || 0
    if total > 0, do: success / total * 100, else: 0
  end
end
```

## Utility functions

### Reset metrics

Clear all counters, gauges, and histograms:

```elixir
:ok = Tinkex.Metrics.reset()
```

Use this between experiments or benchmark runs to start fresh.

### Flush pending updates

Block until all pending metric updates are processed:

```elixir
:ok = Tinkex.Metrics.flush()
```

This ensures all async casts have been handled before reading a snapshot. Useful for deterministic testing and experiment finalization.

## Example: end-to-end workflow

See `examples/metrics_live.exs` for a complete example:

```elixir
# Reset metrics
:ok = Tinkex.Metrics.reset()

# Run some requests (metrics collected automatically)
{:ok, service} = Tinkex.ServiceClient.start_link(config: config)
{:ok, sampler} = Tinkex.ServiceClient.create_sampling_client(service, base_model: model)
{:ok, task} = Tinkex.SamplingClient.sample(sampler, prompt, params, num_samples: 5)
{:ok, _response} = Task.await(task, 30_000)

# Ensure all metrics are recorded
:ok = Tinkex.Metrics.flush()

# Get snapshot
snapshot = Tinkex.Metrics.snapshot()

# Print results
IO.puts "\n=== Metrics Snapshot ==="
IO.puts "Counters:"
Enum.each(snapshot.counters, fn {name, value} ->
  IO.puts "  #{name}: #{value}"
end)

IO.puts "\nLatency (ms):"
latency = snapshot.histograms[:tinkex_request_duration_ms]
IO.puts "  count: #{latency.count}"
IO.puts "  mean:  #{:erlang.float_to_binary(latency.mean, decimals: 2)}"
IO.puts "  p50:   #{:erlang.float_to_binary(latency.p50, decimals: 2)}"
IO.puts "  p95:   #{:erlang.float_to_binary(latency.p95, decimals: 2)}"
IO.puts "  p99:   #{:erlang.float_to_binary(latency.p99, decimals: 2)}"
```

Run the example:

```bash
TINKER_API_KEY=your-key mix run examples/metrics_live.exs
```

## Best practices

1. **Reset between experiments**: Call `Metrics.reset/0` at the start of each independent run
2. **Flush before reading**: Call `Metrics.flush/0` before taking snapshots to ensure all updates are processed
3. **Choose appropriate buckets**: Match latency buckets to your expected request durations
4. **Monitor p99**: Don't just look at averages — p99 reveals tail latency issues
5. **Track custom metrics**: Use counters and histograms to track domain-specific events
6. **Use gauges for configuration**: Record experiment parameters as gauges for reproducibility
7. **Disable in production**: If metrics aren't needed, disable to reduce overhead

## What to read next

- Getting started with Tinkex: `docs/guides/getting_started.md`
- Troubleshooting common issues: `docs/guides/troubleshooting.md`
- Training loop integration: `docs/guides/training_loop.md`
- API reference: `docs/guides/api_reference.md`