# Coverage Testing Guide
> **Note:** This guide is largely superseded by the [Fixture Testing Guide](fixture-testing.md), which covers the modern `mix req_llm.model_compat` task and comprehensive testing system. This document remains for reference on legacy testing patterns.
This guide covers testing and verification workflows for ReqLLM, focusing on live API coverage tests with fixture support for local testing without API calls.
## Overview
ReqLLM's testing system is built around two core principles:
1. **Provider coverage testing** - Tests verify that provider implementations work correctly across different features
2. **Fixture-based testing** - Tests can run against live APIs or cached fixtures for fast local development
## Testing Modes
### Fixture Mode (Default)
By default, tests use cached fixtures for fast, reliable testing:
```bash
mix test # Uses fixtures
mix test --only provider:openai # Test specific provider with fixtures
```
### Live Mode
Set `REQ_LLM_FIXTURES_MODE=record` to test against real APIs and capture new fixtures:
```bash
REQ_LLM_FIXTURES_MODE=record mix test # Run all tests live
REQ_LLM_FIXTURES_MODE=record mix test --only provider:openai # Test specific provider live
REQ_LLM_FIXTURES_MODE=record mix test --only coverage # Run coverage tests live
```
**Live mode will:**
- Make real API calls to providers
- Capture responses as JSON fixtures
- Overwrite existing fixtures with new responses
- Require valid API keys for each provider
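Keys are loaded from `.env` via dotenvy at startup (see Environment Management below). For illustration only, a `.env` for live runs might look like the sketch below; `ANTHROPIC_API_KEY` is the only variable referenced later in this guide, and the other name simply follows the usual `<PROVIDER>_API_KEY` convention:
```bash
# Illustrative .env - variable names other than ANTHROPIC_API_KEY are assumptions
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
```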
## Quality & CI
CI runs `mix quality` alias before tests. Locally:
```bash
mix quality # or mix q - runs format, compile --warnings-as-errors, dialyzer, credo
```
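The alias uses the standard Mix alias mechanism; a sketch of how it is typically wired up in `mix.exs`, based on the tools listed above (the project's actual definition may differ):
```elixir
# In mix.exs (illustrative): referenced from project/0 via `aliases: aliases()`
defp aliases do
  [
    quality: ["format", "compile --warnings-as-errors", "dialyzer", "credo"],
    q: ["quality"]
  ]
end
```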
## Test Organization
### Directory Structure
```
test/
├── coverage/                      # Provider capability coverage tests
│   ├── anthropic/
│   │   ├── comprehensive_test.exs # All capabilities
│   │   └── fixtures/              # Cached API responses
│   └── openai/
│       ├── comprehensive_test.exs
│       └── fixtures/
├── support/
│   ├── live_fixture.ex            # Test fixture system
│   └── provider_test/             # Shared test macros
├── req_llm/
└── req_llm_test.exs               # Core library tests
```
### Test Tags
Tests use ExUnit tags for organization:
```elixir
@moduletag :coverage # Coverage test
@moduletag provider: "anthropic" # Provider-specific (string)
@tag scenario: :basic # Scenario-specific (atom)
@tag scenario: :streaming # Feature-specific
@tag scenario: :tool_multi # Capability-specific
```
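Put together, these tags typically sit at the top of a coverage module roughly like this (illustrative; real modules get most of this from the provider macros described below):
```elixir
defmodule ReqLLM.Coverage.Anthropic.ComprehensiveTest do
  use ExUnit.Case

  @moduletag :coverage
  @moduletag provider: "anthropic"

  @tag scenario: :basic
  test "basic completion" do
    # ...
  end
end
```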
Run specific test groups:
```bash
mix test --only coverage
mix test --only provider:openai
mix test --only scenario:streaming
```
## Writing Capability Tests
### Using Provider Test Macros
ReqLLM uses shared test macros to eliminate duplication while maintaining clear per-provider organization:
```elixir
defmodule ReqLLM.Coverage.MyProvider.ComprehensiveTest do
  use ReqLLM.ProviderTest.Comprehensive,
    provider: :my_provider,
    model: "my_provider:my-model"

  # Provider-specific tests can be added here
end
```
Available macros:
- `ReqLLM.ProviderTest.Comprehensive` - All capabilities (basic, streaming, tools, objects, reasoning)
- `ReqLLM.ProviderTest.Embedding` - Embedding generation
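The embedding macro is used the same way; a sketch, assuming it accepts the same `provider:` and `model:` options (the embedding model name here is a placeholder):
```elixir
defmodule ReqLLM.Coverage.MyProvider.EmbeddingTest do
  use ReqLLM.ProviderTest.Embedding,
    provider: :my_provider,
    model: "my_provider:my-embedding-model"
end
```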
### Capability-Driven Tests
Verify capabilities match metadata before testing:
```elixir
test "temperature parameter works as advertised" do
# Check if model advertises temperature support
supports_temp = ReqLLM.Capability.supports?(@model, :temperature)
if supports_temp do
result = use_fixture(:my_provider, "temperature_test", fn ->
ctx = ReqLLM.Context.new([ReqLLM.Context.user("Be creative")])
ReqLLM.generate_text(@model, ctx, temperature: 1.0, max_tokens: 50)
end)
{:ok, resp} = result
assert resp.id != nil
else
skip("Model does not advertise temperature support")
end
end
```
### Testing Tool Calling
Comprehensive tool calling tests:
```elixir
describe "tool calling capabilities" do
@weather_tool %{
name: "get_weather",
description: "Get weather for a location",
parameter_schema: %{
type: "object",
properties: %{
location: %{type: "string", description: "City name"}
},
required: ["location"]
}
}
test "basic tool calling", fixture: "tool_calling_basic" do
ctx = ReqLLM.Context.new([
ReqLLM.Context.user("What's the weather in Paris?")
])
{:ok, resp} = ReqLLM.generate_text(@model, ctx,
tools: [@weather_tool],
max_tokens: 200
)
assert resp.id != nil
end
test "tool choice control" do
if ReqLLM.Capability.supports?(@model, :tool_choice) do
result = use_fixture(:my_provider, "tool_choice_specific", fn ->
ctx = ReqLLM.Context.new([
ReqLLM.Context.user("Tell me about weather")
])
ReqLLM.generate_text(@model, ctx,
tools: [@weather_tool],
tool_choice: %{type: "tool", name: "get_weather"}
)
end)
{:ok, resp} = result
assert resp.id != nil
else
skip("Model does not support tool choice control")
end
end
test "tool result handling" do
result = use_fixture(:my_provider, "tool_with_result", fn ->
ctx = ReqLLM.Context.new([
ReqLLM.Context.user("What's the weather like?"),
ReqLLM.Context.assistant("", tool_calls: [
%{id: "call_1", name: "get_weather", arguments: %{"location" => "Paris"}}
]),
ReqLLM.Context.tool_result("call_1", %{"weather" => "sunny", "temp" => 22})
])
ReqLLM.generate_text(@model, ctx, tools: [@weather_tool])
end)
{:ok, resp} = result
assert resp.id != nil
end
end
```
### Testing Streaming
Gate streaming tests on the model's advertised streaming capability:
```elixir
test "streaming text generation", fixture: "streaming_test" do
if ReqLLM.Capability.supports?(@model, :streaming) do
ctx = ReqLLM.Context.new([ReqLLM.Context.user("Tell me a story")])
{:ok, resp} = ReqLLM.stream_text(@model, ctx, max_tokens: 100)
assert resp.id != nil
text = ReqLLM.Response.text(resp)
assert is_binary(text)
else
skip("Model does not support streaming")
end
end
```
### Testing Multimodal Capabilities
Test image and other modality support:
```elixir
test "image input processing" do
modalities = ReqLLM.Capability.modalities(@model)
input_modalities = get_in(modalities, [:input]) || []
if "image" in input_modalities do
result = use_fixture(:my_provider, "image_input", fn ->
# Base64 encoded test image
image_data = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNkYPhfDwAChwGA60e6kgAAAABJRU5ErkJggg=="
ctx = ReqLLM.Context.new([
ReqLLM.Context.user([
%{type: "text", text: "What do you see in this image?"},
%{type: "image", source: %{
type: "base64",
media_type: "image/png",
data: image_data
}}
])
])
ReqLLM.generate_text(@model, ctx, max_tokens: 100)
end)
{:ok, resp} = result
assert resp.id != nil
else
skip("Model does not support image input")
end
end
```
## Fixture Management
### Fixture Format
Fixtures are stored as JSON with metadata:
```json
{
  "captured_at": "2025-01-15T10:30:00Z",
  "model_spec": "openai:gpt-4o",
  "scenario": "basic",
  "result": {
    "ok": true,
    "response": {
      "id": "resp_123",
      "model": "gpt-4o",
      "message": {
        "role": "assistant",
        "content": [{"type": "text", "text": "Hello there!"}]
      },
      "usage": {"input_tokens": 5, "output_tokens": 3}
    }
  }
}
```
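Because fixtures are plain JSON, they are easy to inspect when debugging a failing test. A quick IEx sketch (assumes the `Jason` library is available; the path follows the layout in the next section):
```elixir
# Debugging sketch, not part of the test suite
fixture =
  "test/support/fixtures/openai/basic_completion.json"
  |> File.read!()
  |> Jason.decode!()

fixture["model_spec"]
#=> "openai:gpt-4o"

get_in(fixture, ["result", "response", "usage"])
#=> %{"input_tokens" => 5, "output_tokens" => 3}
```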
### Fixture Organization
Organize fixtures by provider and test name:
```
test/support/fixtures/
├── anthropic/
│   ├── basic_completion.json
│   ├── system_prompt_completion.json
│   ├── temperature_test.json
│   ├── streaming_test.json
│   ├── tool_calling_basic.json
│   ├── tool_choice_specific.json
│   └── tool_with_result.json
└── openai/
    ├── basic_completion.json
    └── tool_calling_basic.json
```
### LiveFixture API Changes (1.0.0-rc.1)
The LiveFixture API now requires the provider as the first argument:
```elixir
# Current API (1.0.0-rc.1)
use_fixture(:provider_atom, "fixture_name", fn ->
  # test code
end)

# Old API (deprecated)
use_fixture("fixture_name", [], fn ->
  # test code
end)
```
### Fixture Best Practices
1. **Descriptive naming** - Use clear fixture names that indicate what they test
2. **Minimal responses** - Use `max_tokens` to keep fixtures small
3. **Deterministic content** - Use low temperature for reproducible responses
4. **Regular updates** - Refresh fixtures when APIs change
```elixir
# Good fixture usage
use_fixture(:openai, "low_temperature", fn ->
  ReqLLM.generate_text(@model, ctx,
    temperature: 0.1, # Deterministic
    max_tokens: 20    # Minimal
  )
end)
```
## Provider Verification Workflows
### Adding a New Provider
1. **Create provider module** with DSL
2. **Add metadata file** in `priv/models_dev/`
3. **Create coverage tests** using provider macros
4. **Run live tests** to capture fixtures
5. **Validate capabilities** match implementation
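For step 5, the `ReqLLM.Capability` helpers used throughout the coverage tests double as a quick spot check from IEx (a sketch; the model spec is a placeholder):
```elixir
# Spot-check that advertised capabilities line up with what the tests exercise
model = "my_provider:my-model"

ReqLLM.Capability.supports?(model, :streaming)
ReqLLM.Capability.supports?(model, :tool_choice)
ReqLLM.Capability.modalities(model)
```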
```bash
# Create provider tests using the shared macros
# test/coverage/my_provider/comprehensive_test.exs
# test/coverage/my_provider/embedding_test.exs (if the provider supports embeddings)
# Run live tests to capture fixtures
REQ_LLM_FIXTURES_MODE=record mix test --only provider:my_provider
# Quality check
mix quality
```
### Ongoing Verification
Regular verification workflows:
```bash
# Daily: Validate all providers with fixtures
mix test --only coverage
# Weekly: Refresh critical fixtures
REQ_LLM_FIXTURES_MODE=record mix test test/coverage/*/comprehensive_test.exs
# Release: Full live test suite
REQ_LLM_FIXTURES_MODE=record mix test --only coverage
# API Changes: Update specific provider
REQ_LLM_FIXTURES_MODE=record mix test --only "provider:anthropic" --only coverage
```
## Best Practices
### Test Organization
1. **Use provider macros** - Leverage shared test patterns for consistency
2. **Group by capability** - Organize tests around features, not just providers
3. **Use descriptive names** - Test names should explain what capability is tested
4. **Tag appropriately** - Use ExUnit tags for selective test execution
### Fixture Management
1. **Keep fixtures small** - Use minimal token limits to reduce file size
2. **Use deterministic settings** - Low temperature for consistent responses
3. **Version control fixtures** - Commit fixtures to track API changes over time
4. **Update regularly** - Refresh fixtures when provider APIs change
### Error Handling
Test error conditions with proper fixture handling:
```elixir
test "handles invalid model gracefully" do
result = use_fixture(:anthropic, "invalid_model_error", fn ->
ReqLLM.generate_text("anthropic:invalid-model", "Hello")
end)
{:error, error} = result
assert %ReqLLM.Error.API{} = error
end
```
### Environment Management
Handle API keys and environment variables properly:
```elixir
# Skip tests if API key not available
# Keys are automatically loaded from .env via dotenvy at startup
setup do
  case ReqLLM.Keys.get(:anthropic_api_key) do
    {:ok, _key} -> :ok
    {:error, _reason} -> skip("ANTHROPIC_API_KEY not configured in .env")
  end
end
```
This coverage testing approach ensures that ReqLLM providers work correctly across all supported features and helps maintain compatibility as APIs evolve.