guides/providers/groq.md

# Groq Provider Guide

Groq provides ultra-fast LLM inference on its custom LPU hardware, delivering exceptional performance for real-time applications.

## Configuration

Set your Groq API key:

```bash
# Add to .env file (automatically loaded)
GROQ_API_KEY=gsk_...
```

Or store the key in memory at runtime:

```elixir
ReqLLM.put_key(:groq_api_key, "gsk_...")
```

## Supported Models

Popular Groq models include:

- `llama-3.3-70b-versatile` - Latest Llama 3.3
- `llama-3.1-8b-instant` - Fast, efficient
- `mixtral-8x7b-32768` - Large context window
- `gemma2-9b-it` - Google's Gemma 2

Sync the full model catalog with `mix req_llm.model_sync groq`.

## Basic Usage

```elixir
# Simple text generation
{:ok, response} = ReqLLM.generate_text(
  "groq:llama-3.3-70b-versatile",
  "Explain async programming"
)

# Streaming (ultra-fast with Groq hardware)
{:ok, stream_response} = ReqLLM.stream_text(
  "groq:llama-3.1-8b-instant",
  "Write a story"
)

ReqLLM.StreamResponse.tokens(stream_response)
|> Stream.each(&IO.write/1)
|> Stream.run()
```

## Provider-Specific Options

### Service Tier

Control performance tier for requests:

```elixir
{:ok, response} = ReqLLM.generate_text(
  "groq:llama-3.3-70b-versatile",
  "Hello",
  provider_options: [service_tier: "performance"]
)
```

Tiers:
- `"auto"` - Automatic selection (default)
- `"on_demand"` - Standard on-demand
- `"flex"` - Flexible pricing
- `"performance"` - Highest performance

### Reasoning Effort

Control reasoning level for compatible models:

```elixir
{:ok, response} = ReqLLM.generate_text(
  "groq:deepseek-r1-distill-llama-70b",
  "Complex problem",
  provider_options: [reasoning_effort: "high"]
)
```

Levels: `"none"`, `"default"`, `"low"`, `"medium"`, `"high"`

### Reasoning Format

Specify format for reasoning output:

```elixir
{:ok, response} = ReqLLM.generate_text(
  "groq:deepseek-r1-distill-llama-70b",
  "Problem to solve",
  provider_options: [reasoning_format: "parsed"]
)
```

Formats: `"parsed"` (reasoning in a separate field), `"raw"` (reasoning inline in think tags), `"hidden"` (final answer only)

### Web Search

Enable web search on models that support it (such as Groq's Compound systems):

```elixir
{:ok, response} = ReqLLM.generate_text(
  "groq:llama-3.3-70b-versatile",
  "Latest tech news",
  provider_options: [
    search_settings: %{
      include_domains: ["techcrunch.com", "arstechnica.com"],
      exclude_domains: ["spam.com"]
    }
  ]
)
```

### Compound Custom

Custom configuration for Compound systems:

```elixir
{:ok, response} = ReqLLM.generate_text(
  "groq:model",
  "Text",
  provider_options: [
    compound_custom: %{
      # Compound-specific settings
    }
  ]
)
```

## Complete Example

```elixir
alias ReqLLM.Context
import ReqLLM.Context

context = Context.new([
  system("You are a fast, helpful coding assistant"),
  user("Explain tail call optimization")
])

{:ok, response} = ReqLLM.generate_text(
  "groq:llama-3.3-70b-versatile",
  context,
  temperature: 0.7,
  max_tokens: 1000,
  provider_options: [
    service_tier: "performance",
    search_settings: %{
      include_domains: ["developer.mozilla.org", "stackoverflow.com"]
    }
  ]
)

text = ReqLLM.Response.text(response)
usage = response.usage

IO.puts(text)
IO.puts("Tokens: #{usage.total_tokens}, Cost: $#{usage.total_cost}")
```

## Tool Calling

Groq supports function calling on compatible models:

```elixir
weather_tool = ReqLLM.tool(
  name: "get_weather",
  description: "Get weather for a location",
  parameter_schema: [
    location: [type: :string, required: true]
  ],
  callback: {WeatherAPI, :fetch}
)

{:ok, response} = ReqLLM.generate_text(
  "groq:llama-3.3-70b-versatile",
  "What's the weather in Berlin?",
  tools: [weather_tool]
)
```
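
The `WeatherAPI` module above is not part of ReqLLM. A minimal sketch of what such a callback module might look like, assuming the callback receives the tool's argument map and returns an `{:ok, result}` tuple:

```elixir
defmodule WeatherAPI do
  # Hypothetical callback module for the tool defined above.
  # Receives the argument map and returns {:ok, result}, which
  # is fed back to the model as the tool result.
  def fetch(args) do
    location = args["location"] || args[:location]
    # Replace with a real weather API call.
    {:ok, "Sunny and 22°C in #{location}"}
  end
end
```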

## Structured Output

Groq supports structured output generation:

```elixir
schema = [
  name: [type: :string, required: true],
  age: [type: :integer, required: true],
  skills: [type: {:list, :string}]
]

{:ok, response} = ReqLLM.generate_object(
  "groq:llama-3.3-70b-versatile",
  "Generate a software engineer profile",
  schema
)

person = ReqLLM.Response.object(response)
```
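
The returned object is a plain map; assuming string keys (as decoded from JSON), you can read fields directly:

```elixir
# Assumes the decoded object uses string keys.
IO.puts("#{person["name"]}, age #{person["age"]}")
IO.inspect(person["skills"], label: "skills")
```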

## Performance Tips

1. **Use Streaming**: Groq's hardware excels at streaming - you'll see tokens instantly
2. **Choose Right Model**: Use `8b-instant` for speed, `70b` for quality
3. **Service Tier**: Use `"performance"` tier for lowest latency
4. **Batch Requests**: Groq handles concurrent requests efficiently (see the sketch below)
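
A minimal concurrency sketch using `Task.async_stream`; the prompts, model, and concurrency settings here are illustrative assumptions:

```elixir
prompts = ["Summarize TCP", "Summarize UDP", "Summarize QUIC"]

results =
  prompts
  |> Task.async_stream(
    fn prompt -> ReqLLM.generate_text("groq:llama-3.1-8b-instant", prompt) end,
    max_concurrency: 4,
    timeout: 30_000
  )
  # Each element is {:ok, value} where value is the {:ok, response}
  # or {:error, error} tuple returned by generate_text/2.
  |> Enum.map(fn {:ok, result} -> result end)
```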

## Streaming Performance

Groq's custom hardware provides exceptional streaming performance:

```elixir
{:ok, stream_response} = ReqLLM.stream_text(
  "groq:llama-3.1-8b-instant",
  "Count from 1 to 100"
)

# You'll see tokens appearing almost instantly
stream_response
|> ReqLLM.StreamResponse.tokens()
|> Stream.each(&IO.write/1)
|> Stream.run()
```
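
To see this for yourself, here is a small sketch that measures time to first token (the timing code is illustrative; the ReqLLM calls are the same as above):

```elixir
start = System.monotonic_time(:millisecond)

{:ok, stream_response} =
  ReqLLM.stream_text("groq:llama-3.1-8b-instant", "Say hello")

# Pull a single token off the stream, then report elapsed time.
stream_response
|> ReqLLM.StreamResponse.tokens()
|> Enum.take(1)

elapsed = System.monotonic_time(:millisecond) - start
IO.puts("First token after #{elapsed} ms")
```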

## Error Handling

```elixir
case ReqLLM.generate_text("groq:llama-3.3-70b-versatile", "Hello") do
  {:ok, response} -> 
    handle_success(response)
    
  {:error, error} -> 
    IO.puts("Error: #{error.message}")
end
```
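
For transient failures such as rate limits or network errors, you may want a simple retry wrapper. This is a minimal sketch, not a ReqLLM feature; the attempt count and backoff value are arbitrary assumptions:

```elixir
defmodule GroqRetry do
  # Retry a generation up to `attempts` times with a fixed backoff.
  def generate_with_retry(model, prompt, attempts \\ 3) do
    case ReqLLM.generate_text(model, prompt) do
      {:ok, response} ->
        {:ok, response}

      {:error, _error} when attempts > 1 ->
        Process.sleep(1_000)
        generate_with_retry(model, prompt, attempts - 1)

      {:error, error} ->
        {:error, error}
    end
  end
end
```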

## Key Advantages

1. **Speed**: Custom LPU hardware for ultra-fast inference
2. **Cost**: Competitive pricing for high performance
3. **Reliability**: Enterprise-grade infrastructure
4. **Compatibility**: OpenAI-compatible API

## Resources

- [Groq Documentation](https://console.groq.com/docs)
- [Model Playground](https://console.groq.com/playground)
- [Pricing](https://wow.groq.com/pricing/)