guides/meta.md

# Meta (Llama)

Generic provider for Meta Llama models using Meta's native prompt format.

## Important Usage Note

**Most deployments use OpenAI-compatible APIs** and should NOT use this provider directly:

- Azure AI Foundry → Use OpenAI-compatible API
- Google Cloud Vertex AI → Use OpenAI-compatible API
- vLLM (self-hosted) → Use OpenAI-compatible API
- Ollama (self-hosted) → Use OpenAI-compatible API
- llama.cpp (self-hosted) → Use OpenAI-compatible API

This provider is for services that use **Meta's native format**, with `prompt`, `max_gen_len`, and `generation` fields.

## Current Use Cases

- **AWS Bedrock**: Uses native Meta format via `ReqLLM.Providers.AmazonBedrock.Meta`

For AWS Bedrock, see the [Amazon Bedrock Provider Guide](amazon_bedrock.md).

## Configuration

No direct configuration is required; this provider is wrapped by cloud providers that use the native format (currently AWS Bedrock).

## Native Format Details

### Request Format
- `prompt` - Formatted text with Llama special tokens
- `max_gen_len` - Maximum tokens to generate
- `temperature` - Sampling temperature
- `top_p` - Nucleus sampling parameter
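
A minimal sketch of a native-format request body, assuming a JSON payload as used by hosting services such as AWS Bedrock; the values are illustrative, not defaults:

```json
{
  "prompt": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "max_gen_len": 512,
  "temperature": 0.7,
  "top_p": 0.9
}
```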

### Response Format
- `generation` - Generated text
- `prompt_token_count` - Input token count
- `generation_token_count` - Output token count
- `stop_reason` - Why generation stopped
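
A corresponding sketch of a response body; the values are illustrative, and the exact `stop_reason` vocabulary depends on the hosting service:

```json
{
  "generation": "Hello! How can I help you today?",
  "prompt_token_count": 12,
  "generation_token_count": 9,
  "stop_reason": "stop"
}
```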

### Llama Prompt Format

Llama 3+ uses a structured prompt format with special tokens:
- System: `<|start_header_id|>system<|end_header_id|>`
- User: `<|start_header_id|>user<|end_header_id|>`
- Assistant: `<|start_header_id|>assistant<|end_header_id|>`
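
For illustration, a chat exchange rendered in Meta's published Llama 3 template looks roughly like this (the template also uses the `<|begin_of_text|>` and `<|eot_id|>` tokens, which are not listed above):

```text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```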

## Provider Options

No custom provider options; standard ReqLLM options are translated to the native format.

## Resources

- [Meta Llama Documentation](https://llama.meta.com/)
- [Llama Model Cards](https://github.com/meta-llama/llama-models)