# Image Generation
> **Interactive Demo:** Try the [Image Generation Livebook](image-generation.livemd) to compare image generation across OpenAI, xAI, and Google in parallel.
## Overview
ReqLLM provides image generation through the `ReqLLM.generate_image/3` function, which works similarly to `ReqLLM.generate_text/3`. The key difference is that the response contains image data instead of text.
### Basic Usage
```elixir
{:ok, response} = ReqLLM.generate_image(
  "openai:gpt-image-1",
  "A serene Japanese garden with cherry blossoms"
)

# Extract the image binary data
image_data = ReqLLM.Response.image_data(response)

# Save to file
File.write!("garden.png", image_data)
```
### Response Structure
Image generation returns a canonical `ReqLLM.Response` struct where the assistant message contains `ReqLLM.Message.ContentPart` entries of type `:image` (binary data) or `:image_url` (URL reference).
```elixir
# Get the first image part
image_part = ReqLLM.Response.image(response)
# => #ContentPart<:image image/png (3469636 bytes)>
# Get all images (when n > 1)
all_images = ReqLLM.Response.images(response)
# Convenience helpers
binary_data = ReqLLM.Response.image_data(response) # First :image part's data
url = ReqLLM.Response.image_url(response) # First :image_url part's URL
```
## Common Options
These options are supported across providers (where the model allows):
| Option | Type | Description |
|--------|------|-------------|
| `n` | integer | Number of images to generate (provider-dependent; `gemini-2.5-flash-image` and `gemini-3-pro-image-preview` reject `n`) |
| `size` | string or tuple | Image dimensions, e.g., `"1024x1024"` or `{1024, 1024}` |
| `aspect_ratio` | string | Aspect ratio, e.g., `"16:9"` or `"1:1"` |
| `output_format` | atom | Image format: `:png`, `:jpeg`, or `:webp` |
| `response_format` | atom | Return type: `:binary` (default) or `:url` |
| `quality` | atom/string | Image quality (provider-dependent) |
| `seed` | integer | Random seed for reproducibility (provider-dependent) |
| `negative_prompt` | string | What to avoid in the image (provider-dependent) |
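As a quick sketch, here is how several of these options can be combined on one call (the model and values are illustrative; check the provider sections below for what each model actually accepts):

```elixir
# Illustrative combination of common options (dall-e-2 accepts `n`, small sizes, and URL responses)
{:ok, response} = ReqLLM.generate_image(
  "openai:dall-e-2",
  "A minimalist line drawing of a sailboat at dusk",
  n: 2,
  size: "512x512",
  response_format: :url
)

# With `response_format: :url`, use the URL helper instead of binary data
url = ReqLLM.Response.image_url(response)
```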
## Discovering Available Models
```elixir
# List all models that support image generation
ReqLLM.Images.supported_models()
# => ["openai:gpt-image-1", "openai:dall-e-3", "google:gemini-2.5-flash-image", ...]
# Validate a specific model
{:ok, model} = ReqLLM.Images.validate_model("openai:gpt-image-1")
```
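Because `supported_models/0` returns plain `"provider:model"` strings, you can narrow the list with standard `Enum`/`String` functions, for example to see only Google's image models (output abbreviated and illustrative):

```elixir
ReqLLM.Images.supported_models()
|> Enum.filter(&String.starts_with?(&1, "google:"))
# => ["google:gemini-2.5-flash-image", "google:imagen-4.0-generate-001", ...]
```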
---
## OpenAI
OpenAI offers several image generation models through the Images API.
### Supported Models
The GPT Image family provides superior instruction following, text rendering, detailed editing, and real-world knowledge. We recommend `gpt-image-1.5` for the best quality, or `gpt-image-1-mini` for cost-effective generation when image quality isn't the priority.
| Model | Notes |
|-------|-------|
| `gpt-image-1.5` | State-of-the-art, best overall quality |
| `gpt-image-1` | High fidelity with transparency support |
| `gpt-image-1-mini` | Cost-effective option for simpler use cases |
| `dall-e-3` | Higher quality than DALL-E 2, larger resolutions (deprecated May 2026) |
| `dall-e-2` | Lower cost, supports inpainting/variations (deprecated May 2026) |
### Current Limitations
ReqLLM currently supports **image generation only** via the Images API. The following OpenAI features are not yet supported:
- **Image editing** (editing with masks via the Images API)
- **Image variations** (DALL-E 2 only)
- **Responses API image generation tool** (generates images inline during chat)
### Prompt Format
OpenAI's image generation accepts only a **single text prompt**; it does not support multi-turn conversations or image editing via context. Be descriptive in your prompt to get the best results.
```elixir
# Good: Descriptive prompt
{:ok, response} = ReqLLM.generate_image(
  "openai:gpt-image-1",
  "A cozy coffee shop interior with warm lighting, exposed brick walls,
   vintage furniture, and steam rising from ceramic cups on wooden tables"
)
```
### Size Options
**GPT Image models** (gpt-image-1.5, gpt-image-1, gpt-image-1-mini):
- `"1024x1024"` (square, fastest)
- `"1536x1024"` (landscape)
- `"1024x1536"` (portrait)
- `"auto"` (default)
**dall-e-3:**
- `"1024x1024"`
- `"1792x1024"` (landscape)
- `"1024x1792"` (portrait)
**dall-e-2:**
- `"256x256"`, `"512x512"`, `"1024x1024"`
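Per the Common Options table, `size` can be given as a string or as a `{width, height}` tuple; a sketch of a landscape render with a GPT Image model:

```elixir
# String form
{:ok, response} = ReqLLM.generate_image(
  "openai:gpt-image-1",
  "A lighthouse on a rocky coast at dawn",
  size: "1536x1024"
)

# Tuple form (per the Common Options table)
{:ok, response} = ReqLLM.generate_image(
  "openai:gpt-image-1",
  "A lighthouse on a rocky coast at dawn",
  size: {1536, 1024}
)
```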
### OpenAI-Specific Options
```elixir
# gpt-image-1 with transparency
{:ok, response} = ReqLLM.generate_image(
  "openai:gpt-image-1",
  "A golden retriever puppy, isolated on transparent background",
  output_format: :png,
  provider_options: [background: "transparent"]
)

# dall-e-3 with style
{:ok, response} = ReqLLM.generate_image(
  "openai:dall-e-3",
  "A mountain landscape at sunset",
  size: "1792x1024",
  quality: :hd,
  style: :vivid  # or :natural for a more realistic look
)
```
**GPT Image specific options** (via `provider_options`):
| Option | Values | Description |
|--------|--------|-------------|
| `background` | `"transparent"`, `"opaque"`, `"auto"` | Background transparency (use PNG/WebP format) |
| `moderation` | `"auto"`, `"low"` | Content moderation strictness |
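For instance, a minimal sketch combining both options through `provider_options` (the same pass-through mechanism as the transparency example above):

```elixir
{:ok, response} = ReqLLM.generate_image(
  "openai:gpt-image-1",
  "A product shot of a ceramic mug, isolated on a transparent background",
  output_format: :webp,
  provider_options: [background: "transparent", moderation: "low"]
)
```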
**dall-e-3 specific options:**
| Option | Values | Description |
|--------|--------|-------------|
| `quality` | `:standard`, `:hd` | Image detail level |
| `style` | `:vivid`, `:natural` | Artistic vs realistic style |
### Revised Prompts
DALL-E 3 may automatically enhance your prompt for better results. The revised prompt is available in the response metadata:
```elixir
{:ok, response} = ReqLLM.generate_image("openai:dall-e-3", "A cat")
[image_part] = ReqLLM.Response.images(response)
revised = image_part.metadata[:revised_prompt]
# => "A fluffy orange tabby cat sitting gracefully on a windowsill..."
```
---
## Google (Gemini)
Google's Gemini models support both text-to-image generation and image editing through multi-turn conversations.
### Supported Models
| Model | Alias | Notes |
|-------|-------|-------|
| `gemini-2.5-flash-image` | Nano Banana | Fast generation, good for quick iterations and standard tasks |
| `gemini-3-pro-image-preview` | Nano Banana Pro | State-of-the-art quality, advanced text rendering, professional assets |
| `imagen-4.0-generate-001` | Imagen 4 | High-quality photorealistic images |
| `imagen-4.0-fast-generate-001` | Imagen 4 Fast | Faster generation with good quality |
### Model Selection
**Choose Gemini 2.5 Flash** for:
- Quick prototyping and iteration
- Straightforward text-to-image tasks
- Speed-sensitive applications
**Choose Gemini 3 Pro Preview** for:
- Professional-grade asset production
- Complex multi-turn editing workflows
- Text-heavy designs (logos, menus, infographics, diagrams)
- Character consistency across multiple images
- High-resolution output (1K, 2K, 4K)
- Tasks requiring advanced reasoning
**Choose Imagen** for:
- High-quality photorealistic images
- When you don't need multi-turn editing capabilities
### Basic Generation
Note: `gemini-2.5-flash-image` and `gemini-3-pro-image-preview` reject `n`; specify the image count in the prompt.
```elixir
{:ok, response} = ReqLLM.generate_image(
  "google:gemini-2.5-flash-image",
  "A futuristic cityscape with flying cars and neon lights",
  aspect_ratio: "16:9"
)
```
### Generating Multiple Images
**Important:** Google's documentation states that "the model won't always follow the exact number of image outputs that the user explicitly asks for." Multi-image generation is inherently unreliable, and prompt phrasing significantly affects success rates.
**Effective prompt patterns** (higher success rate):
```elixir
# Numbered list format - works well
{:ok, response} = ReqLLM.generate_image(
  "google:gemini-2.5-flash-image",
  "Generate multiple images: 1) A white cat 2) A black cat"
)

# Sequential instructions - works well
{:ok, response} = ReqLLM.generate_image(
  "google:gemini-2.5-flash-image",
  "Generate the first image of a sunrise, then generate a second image of a sunset"
)

# Labeled scenes - works well
{:ok, response} = ReqLLM.generate_image(
  "google:gemini-2.5-flash-image",
  "Generate multiple scenes: Scene A shows a forest, Scene B shows a desert"
)

images = ReqLLM.Response.images(response)
# May return 1 or 2 images depending on model behavior
```
**Less effective prompt patterns** (often return only one image):
```elixir
# Simple count requests - often fails
"Generate two images of cats"
"Create 2 pictures of a banana"
# Even with emphasis - often fails
"Create two DISTINCT and SEPARATE images"
```
The model may respond with text like "here are two images" but only deliver one. For reliable multi-image workflows, consider making multiple API calls or using the numbered list format above.
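If you need an exact number of images, one option is to issue one request per image and run the requests concurrently, for example with `Task.async_stream/3` (a minimal sketch; timeouts and error handling are simplified):

```elixir
prompts = ["A white cat", "A black cat"]

images =
  prompts
  |> Task.async_stream(
    fn prompt ->
      {:ok, response} =
        ReqLLM.generate_image("google:gemini-2.5-flash-image", prompt)

      ReqLLM.Response.image_data(response)
    end,
    timeout: 120_000
  )
  |> Enum.map(fn {:ok, image} -> image end)
```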
### Aspect Ratios
Google supports flexible aspect ratios:
- `"1:1"` (square)
- `"3:4"`, `"4:3"`
- `"4:5"`, `"5:4"`
- `"9:16"`, `"16:9"`
- `"2:3"`, `"3:2"`
- `"21:9"` (ultrawide)
### Image Editing with Context
Unlike OpenAI, Google Gemini supports **image editing** by including an existing image in the conversation context. This enables powerful workflows like style transfer, object addition/removal, and iterative refinement.
```elixir
alias ReqLLM.{Context, Message}
alias ReqLLM.Message.ContentPart

# Load an existing image
{:ok, original_image} = File.read("photo.jpg")

# Create a context with the image and editing instructions
context = Context.new([
  %Message{
    role: :user,
    content: [
      ContentPart.image(original_image, "image/jpeg"),
      ContentPart.text("Add a rainbow in the sky above the mountains")
    ]
  }
])

# Generate the edited image
{:ok, response} = ReqLLM.generate_image(
  "google:gemini-2.5-flash-image",
  context,  # Pass the full context instead of a string
  aspect_ratio: "16:9"
)

edited_image = ReqLLM.Response.image_data(response)
File.write!("photo_with_rainbow.png", edited_image)
```
### Multi-Turn Image Refinement
You can iteratively refine images through conversation:
```elixir
alias ReqLLM.{Context, Message, Response}
alias ReqLLM.Message.ContentPart

# Initial generation
{:ok, response1} = ReqLLM.generate_image(
  "google:gemini-2.5-flash-image",
  "A medieval castle on a hilltop"
)

first_image = Response.image_data(response1)

# Refine: add details
context = Context.new([
  %Message{
    role: :user,
    content: [
      ContentPart.image(first_image, "image/png"),
      ContentPart.text("Add a dramatic sunset behind the castle with orange and purple clouds")
    ]
  }
])

{:ok, response2} = ReqLLM.generate_image(
  "google:gemini-2.5-flash-image",
  context
)

# Further refinement
second_image = Response.image_data(response2)

context2 = Context.new([
  %Message{
    role: :user,
    content: [
      ContentPart.image(second_image, "image/png"),
      ContentPart.text("Add a dragon flying near one of the castle towers")
    ]
  }
])

{:ok, final_response} = ReqLLM.generate_image(
  "google:gemini-2.5-flash-image",
  context2
)
```
### Style Transfer
Apply artistic styles to existing images:
```elixir
alias ReqLLM.{Context, Message}
alias ReqLLM.Message.ContentPart

{:ok, photo} = File.read("portrait.jpg")

context = Context.new([
  %Message{
    role: :user,
    content: [
      ContentPart.image(photo, "image/jpeg"),
      ContentPart.text("Transform this photo into a watercolor painting style")
    ]
  }
])

{:ok, response} = ReqLLM.generate_image(
  "google:gemini-2.5-flash-image",
  context
)
```
### Prompting Tips for Google
Google recommends describing scenes rather than listing keywords:
```elixir
# Less effective
"cat, sitting, window, sunlight, cozy"
# More effective
"A content tabby cat lounging on a sunny windowsill,
warm afternoon light streaming through sheer curtains"
```
---
## Usage & Cost Tracking
Image generation responses include detailed usage and cost information:
### Basic Usage
```elixir
{:ok, response} = ReqLLM.generate_image("openai:gpt-image-1", prompt)

response.usage
#=> %{
#     image_usage: %{
#       generated: %{count: 1, size_class: "1024x1024"}
#     },
#     cost: %{
#       images: 0.04,
#       tokens: 0.0,
#       tools: 0.0,
#       total: 0.04
#     },
#     input_cost: 0.0,
#     output_cost: 0.04,
#     total_cost: 0.04
#   }
```
### Size Classes
Image costs vary by size. The `size_class` field indicates the resolution tier used for billing:
| Provider | Size Classes |
|----------|-------------|
| OpenAI | `"1024x1024"`, `"1536x1024"`, `"1024x1536"`, `"auto"` |
| Google | Based on aspect ratio (e.g., `"1:1"`, `"16:9"`) |
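The size class and per-image cost can be read directly off the usage map shown earlier (field layout as in the Basic Usage example above, assuming at least one image was generated):

```elixir
%{count: count, size_class: size_class} = response.usage.image_usage.generated

cost_per_image = response.usage.cost.images / count
```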
### Multiple Images
When generating multiple images, the `count` reflects the total:
```elixir
{:ok, response} = ReqLLM.generate_image("openai:dall-e-2", prompt, n: 3)
response.usage.image_usage.generated
#=> %{count: 3, size_class: "1024x1024"}
```
---
## Error Handling
```elixir
case ReqLLM.generate_image("openai:gpt-image-1", prompt) do
  {:ok, response} ->
    image_data = ReqLLM.Response.image_data(response)
    File.write!("output.png", image_data)

  {:error, %ReqLLM.Error.API.Request{status: 400, response_body: body}} ->
    IO.puts("Bad request: #{inspect(body)}")

  {:error, %ReqLLM.Error.Invalid.Parameter{} = error} ->
    IO.puts("Invalid parameter: #{Exception.message(error)}")

  {:error, error} ->
    IO.puts("Error: #{inspect(error)}")
end
```
## Testing with Fixtures
Use fixtures to test image generation without making API calls:
```elixir
{:ok, response} = ReqLLM.generate_image(
  "openai:gpt-image-1",
  "A test prompt",
  fixture: "image_basic"
)
```
See the [Fixture Testing](fixture-testing.md) guide for details.