# Image Generation Guide
Generate, edit, and upscale images using Google's Imagen models through the Vertex AI API.
## Overview
The Image Generation API (Imagen) allows you to:
- **Generate** high-quality images from text descriptions
- **Edit** existing images with inpainting and outpainting
- **Upscale** images to higher resolutions (2x or 4x)
**Important Notes:**
- Image generation requires Vertex AI authentication (not available on Gemini API)
- Each request can generate 1-8 images
- Generated images are returned as base64-encoded data
- Subject to Google's safety filters and Responsible AI policies
## Quick Start
```elixir
# Simple image generation
{:ok, images} =
  Gemini.APIs.Images.generate("A serene mountain landscape at sunset")
image = hd(images)
IO.puts("Generated: #{byte_size(image.image_data)} bytes")
# Save the image
File.write!("output.png", Base.decode64!(image.image_data))
```
## Generating Images
### Basic Generation
```elixir
alias Gemini.APIs.Images
alias Gemini.Types.Generation.Image.ImageGenerationConfig
# Default configuration (1:1 aspect ratio, PNG output)
{:ok, [image]} = Images.generate("A cute cat playing with yarn")
```
### Custom Configuration
```elixir
config = %ImageGenerationConfig{
  number_of_images: 4,
  aspect_ratio: "16:9",
  safety_filter_level: :block_some,
  person_generation: :allow_adult,
  output_mime_type: "image/jpeg",
  output_compression_quality: 90
}

{:ok, images} =
  Images.generate(
    "Professional headshot photo of a business person",
    config
  )
```
### Aspect Ratios
Supported aspect ratios (width:height):
- `"1:1"` - Square (1024x1024) - default
- `"9:16"` - Portrait, mobile (768x1344)
- `"16:9"` - Landscape, desktop (1344x768)
- `"4:3"` - Standard landscape (1152x896)
- `"3:4"` - Standard portrait (896x1152)
```elixir
config = %ImageGenerationConfig{
  aspect_ratio: "16:9",
  number_of_images: 2
}
{:ok, images} = Images.generate("Cinematic landscape", config)
```
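When the target canvas has arbitrary pixel dimensions, it can help to pick the nearest supported ratio before building the config. The sketch below is one way to do that; `AspectRatioPicker` is a hypothetical helper, not part of the library, and it only knows the ratio strings listed in this guide.

```elixir
defmodule AspectRatioPicker do
  # Hypothetical helper; not part of the Gemini library.

  # Ratio strings listed in this guide.
  @supported ["1:1", "9:16", "16:9", "4:3", "3:4"]

  # Returns the supported ratio string closest to width / height.
  # Distances are compared on a log scale so that portrait and
  # landscape shapes are treated symmetrically.
  def closest(width, height) when width > 0 and height > 0 do
    target = :math.log(width / height)

    Enum.min_by(@supported, fn ratio ->
      abs(:math.log(to_float(ratio)) - target)
    end)
  end

  # Converts "16:9" into 16 / 9.
  defp to_float(ratio) do
    [w, h] = ratio |> String.split(":") |> Enum.map(&String.to_integer/1)
    w / h
  end
end

# Example: a 1920x1080 canvas maps to "16:9"
IO.puts(AspectRatioPicker.closest(1920, 1080))
```

The returned string can be passed as the `aspect_ratio` field of `ImageGenerationConfig`.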
### Negative Prompts
Specify what to avoid in generated images:
```elixir
config = %ImageGenerationConfig{
  negative_prompt: "blurry, low quality, distorted, watermark",
  guidance_scale: 7.5
}
{:ok, images} = Images.generate("High quality portrait", config)
```
### Reproducible Generation
Use seeds for consistent results:
```elixir
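config = %ImageGenerationConfig{
  seed: 12345,
  number_of_images: 1,
  # Vertex AI documents `seed` as unavailable while watermarking is enabled
  add_watermark: false
}

# Generate the same image multiple times
{:ok, images1} = Images.generate("A red car", config)
{:ok, images2} = Images.generate("A red car", config)
# With the same prompt, seed, and config, images1 and images2 should be identical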
```
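To confirm that two runs really produced the same bytes, one option is to compare a hash of the decoded data. This is a minimal sketch; `ImageFingerprint` is a hypothetical helper, and it assumes the payload is the base64 string described in this guide.

```elixir
defmodule ImageFingerprint do
  # Hypothetical helper; not part of the Gemini library.

  # SHA-256 of the decoded image bytes, as lowercase hex.
  def sha256(base64_data) do
    base64_data
    |> Base.decode64!()
    |> then(&:crypto.hash(:sha256, &1))
    |> Base.encode16(case: :lower)
  end

  # True when both base64 payloads decode to the same bytes.
  def same?(a, b), do: sha256(a) == sha256(b)
end
```

With the results above, `ImageFingerprint.same?(hd(images1).image_data, hd(images2).image_data)` would report whether the seed produced byte-identical output. The hex digest is also convenient for logging.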
## Editing Images
### Inpainting
Edit specific regions of an image using a mask:
```elixir
alias Gemini.Types.Generation.Image.EditImageConfig
# Load your image and mask
image_data = File.read!("photo.png") |> Base.encode64()
mask_data = File.read!("mask.png") |> Base.encode64()
config = %EditImageConfig{
  edit_mode: :inpainting,
  guidance_scale: 15.0,
  number_of_images: 2
}

{:ok, edited} =
  Images.edit(
    "Replace the background with a beach scene",
    image_data,
    mask_data,
    config
  )
```
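A mismatched mask is an easy mistake to make before an edit request. The sketch below reads the width and height from a PNG header so the image and mask can be compared locally. `PngInfo` is a hypothetical helper, and the idea that the mask should match the source dimensions is an assumption here, not something this guide states.

```elixir
defmodule PngInfo do
  # Hypothetical helper; not part of the Gemini library.

  # A PNG starts with an 8-byte signature followed by the IHDR chunk:
  # 4-byte length, the tag "IHDR", then 4-byte width and 4-byte height.
  def dimensions(
        <<137, 80, 78, 71, 13, 10, 26, 10, _len::32, "IHDR", width::32, height::32,
          _rest::binary>>
      ) do
    {:ok, {width, height}}
  end

  def dimensions(_other), do: {:error, :not_a_png}

  # True only when both binaries are PNGs with identical dimensions.
  def same_size?(image, mask) do
    with {:ok, size} <- dimensions(image),
         {:ok, ^size} <- dimensions(mask) do
      true
    else
      _ -> false
    end
  end
end
```

Call it on the raw file contents, before `Base.encode64/1`, for example `PngInfo.same_size?(File.read!("photo.png"), File.read!("mask.png"))`.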
### Outpainting
Extend an image beyond its original boundaries:
```elixir
config = %EditImageConfig{
  edit_mode: :outpainting,
  mask_dilation: 10
}

{:ok, extended} =
  Images.edit(
    "Continue the landscape to the right",
    image_data,
    mask_data,
    config
  )
```
### Product Image Editing
Specialized editing for product photography:
```elixir
config = %EditImageConfig{
  edit_mode: :product_image,
  number_of_images: 4
}

{:ok, edited} =
  Images.edit(
    "Place product on white background",
    image_data,
    mask_data,
    config
  )
```
## Upscaling Images
### 2x Upscale
Double the resolution of an image:
```elixir
alias Gemini.Types.Generation.Image.UpscaleImageConfig
image_data = File.read!("small_image.png") |> Base.encode64()
config = %UpscaleImageConfig{
  upscale_factor: :x2,
  output_mime_type: "image/png"
}
{:ok, [upscaled]} = Images.upscale(image_data, config)
```
### 4x Upscale
Quadruple the resolution for maximum quality:
```elixir
config = %UpscaleImageConfig{
  upscale_factor: :x4,
  output_mime_type: "image/jpeg",
  output_compression_quality: 95
}
{:ok, [upscaled]} = Images.upscale(image_data, config)
# Save high-quality result
File.write!("upscaled_4x.jpg", Base.decode64!(upscaled.image_data))
```
## Safety and Content Filtering
### Safety Filter Levels
Control content filtering strictness:
```elixir
# Strict filtering (recommended for public applications)
config = %ImageGenerationConfig{
  safety_filter_level: :block_most
}

# Moderate filtering (default)
config = %ImageGenerationConfig{
  safety_filter_level: :block_some
}

# Permissive filtering
config = %ImageGenerationConfig{
  safety_filter_level: :block_few
}

# No filtering (use with caution)
config = %ImageGenerationConfig{
  safety_filter_level: :block_none
}
```
### Person Generation Policy
Control generation of people in images:
```elixir
# Allow adult humans (18+)
config = %ImageGenerationConfig{
  person_generation: :allow_adult,
  safety_filter_level: :block_some
}

# Allow people of all ages
config = %ImageGenerationConfig{
  person_generation: :allow_all
}

# Don't generate recognizable people (default)
config = %ImageGenerationConfig{
  person_generation: :dont_allow
}
```
## Working with Generated Images
### Saving Images
```elixir
{:ok, images} = Images.generate("A beautiful sunset")
images
|> Enum.with_index()
|> Enum.each(fn {image, index} ->
  # Decode base64 data
  binary_data = Base.decode64!(image.image_data)
  # Determine extension from MIME type
  ext = if image.mime_type == "image/jpeg", do: "jpg", else: "png"
  # Save to file
  File.write!("output_#{index}.#{ext}", binary_data)
end)
```
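The example above raises if the payload is not valid base64 or the file cannot be written. Where tagged tuples are preferable, a wrapper along these lines may be useful. `ImageFiles` is a hypothetical helper, and the `"bin"` fallback for unknown MIME types is an arbitrary choice made for this sketch.

```elixir
defmodule ImageFiles do
  # Hypothetical helper; not part of the Gemini library.

  # Maps a MIME type to a file extension; unknown types fall back to "bin".
  def extension_for("image/png"), do: "png"
  def extension_for("image/jpeg"), do: "jpg"
  def extension_for(_other), do: "bin"

  # Decodes base64 data and writes it to "<basename>.<ext>".
  # Returns {:ok, path} or {:error, reason} instead of raising.
  def save(base64_data, mime_type, basename) do
    path = "#{basename}.#{extension_for(mime_type)}"

    with {:ok, binary} <- decode(base64_data),
         :ok <- File.write(path, binary) do
      {:ok, path}
    end
  end

  defp decode(data) do
    case Base.decode64(data) do
      {:ok, binary} -> {:ok, binary}
      :error -> {:error, :invalid_base64}
    end
  end
end
```

A call such as `ImageFiles.save(image.image_data, image.mime_type, "output_0")` then returns the written path, or an error tuple carrying either `:invalid_base64` or the reason reported by `File.write/2`.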
### Image Metadata
```elixir
{:ok, [image]} = Images.generate("A cat")
IO.inspect(image.mime_type) # "image/png"
IO.inspect(image.image_size) # %{"width" => 1024, "height" => 1024}
IO.inspect(image.safety_attributes) # Safety classification results
IO.inspect(image.rai_info) # Responsible AI info
```
## Advanced Configuration
### Output Formats
```elixir
# PNG (lossless, larger file size)
config = %ImageGenerationConfig{
  output_mime_type: "image/png"
}

# JPEG (lossy, smaller file size)
config = %ImageGenerationConfig{
  output_mime_type: "image/jpeg",
  output_compression_quality: 90 # 0-100
}
```
### Guidance Scale
Control how closely the model follows your prompt:
```elixir
# Lower values = more creative/varied
config = %ImageGenerationConfig{
  guidance_scale: 3.0
}

# Mid-range value (balanced); when the field is left as nil, the API's own default applies
config = %ImageGenerationConfig{
  guidance_scale: 7.0
}

# Higher values = stricter adherence to prompt
config = %ImageGenerationConfig{
  guidance_scale: 15.0
}
```
### Language-Specific Prompts
```elixir
config = %ImageGenerationConfig{
  language: "es" # Spanish prompt
}
{:ok, images} = Images.generate("Un gato jugando con una pelota", config)
```
### Watermarking
```elixir
# Disable the watermark (add_watermark defaults to true)
config = %ImageGenerationConfig{
  add_watermark: false
}
```
## Error Handling
```elixir
case Images.generate("A realistic image") do
  {:ok, images} ->
    IO.puts("Generated #{length(images)} images")

  {:error, %{type: :auth_error}} ->
    IO.puts("Authentication failed. Check Vertex AI credentials.")

  {:error, %{type: :api_error, message: msg}} ->
    # The request may have been blocked by safety filters
    IO.puts("API error: #{msg}")

  {:error, reason} ->
    IO.puts("Error: #{inspect(reason)}")
end
```
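Some failures may be transient, so a small retry wrapper is one possible addition. The sketch below assumes the error shapes shown above (`%{type: :auth_error}` and other `{:error, _}` tuples). `Retry.with_backoff/3` is a hypothetical helper, and whether a given API error is worth retrying is a judgment the caller has to make; a request blocked by safety filters will not succeed on a second attempt.

```elixir
defmodule Retry do
  # Hypothetical helper; not part of the Gemini library.

  # Calls `fun` up to `attempts` times, doubling the delay after each failure.
  # Auth errors are returned immediately because retrying cannot fix them.
  def with_backoff(fun, attempts \\ 3, delay_ms \\ 500) when attempts >= 1 do
    case fun.() do
      {:ok, _} = ok ->
        ok

      {:error, %{type: :auth_error}} = error ->
        error

      {:error, _} = error when attempts == 1 ->
        # Out of attempts: surface the last error unchanged.
        error

      {:error, _} ->
        Process.sleep(delay_ms)
        with_backoff(fun, attempts - 1, delay_ms * 2)
    end
  end
end
```

Usage would look like `Retry.with_backoff(fn -> Images.generate("A realistic image") end)`, which returns the same `{:ok, images}` or `{:error, reason}` shapes as a direct call.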
## Best Practices
### 1. Be Specific in Prompts
```elixir
# Vague
"A landscape"
# Specific
"A serene mountain landscape at golden hour with snow-capped peaks, pine trees in the foreground, and a crystal-clear lake reflecting the scenery"
```
### 2. Use Negative Prompts
```elixir
config = %ImageGenerationConfig{
  negative_prompt: "blurry, low quality, distorted, text, watermark, duplicated"
}
```
### 3. Batch Processing
```elixir
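prompts = [
  "A red car",
  "A blue house",
  "A green tree"
]

config = %ImageGenerationConfig{number_of_images: 2}

# Task.async_stream/3 times out each task after 5 seconds by default,
# so raise the limit for slower generation requests.
results =
  prompts
  |> Task.async_stream(
    fn prompt -> Images.generate(prompt, config) end,
    max_concurrency: 3,
    timeout: 60_000
  )
  |> Enum.to_list()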
```
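`Task.async_stream/3` wraps every result, so each element of `results` is `{:ok, value}` where `value` is whatever the function returned. The sketch below unwraps that layer, turns timeouts into error tuples, and pairs each prompt with its outcome. `Batch.run/3` is a hypothetical helper, and `generate_fun` stands in for a call such as `&Images.generate(&1, config)`.

```elixir
defmodule Batch do
  # Hypothetical helper; not part of the Gemini library.

  # Runs `generate_fun` for every prompt and returns a list of
  # {prompt, result} tuples in the same order as `prompts`.
  def run(prompts, generate_fun, opts \\ []) do
    stream_opts = [
      max_concurrency: Keyword.get(opts, :max_concurrency, 3),
      timeout: Keyword.get(opts, :timeout, 60_000),
      # Emit {:exit, :timeout} instead of crashing the caller on timeout.
      on_timeout: :kill_task
    ]

    results =
      prompts
      |> Task.async_stream(generate_fun, stream_opts)
      |> Enum.map(fn
        {:ok, result} -> result
        {:exit, reason} -> {:error, {:task_exit, reason}}
      end)

    # async_stream preserves input order by default, so zipping pairs
    # each prompt with its own result.
    Enum.zip(prompts, results)
  end
end
```

Each entry then has the form `{prompt, {:ok, images}}` or `{prompt, {:error, reason}}`, which keeps a failed prompt from hiding the successful ones.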
### 4. Handle Safety Filters
```elixir
case Images.generate(prompt, config) do
  # An empty list means every image was blocked
  {:ok, []} ->
    IO.puts("Content was blocked by safety filters")

  {:ok, images} ->
    Enum.each(images, fn image ->
      if image.rai_info["blocked_reason"] do
        IO.puts("Image blocked: #{image.rai_info["blocked_reason"]}")
      end
    end)

  {:error, _} = error ->
    error
end
```
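When only the usable images should be kept, the same check can be expressed as a partition. This is a sketch under the assumption, taken from the example above, that `rai_info` is either `nil` or a map with a `"blocked_reason"` key; `SafetyReport` is a hypothetical helper.

```elixir
defmodule SafetyReport do
  # Hypothetical helper; not part of the Gemini library.

  # Splits images into those with a blocked reason and those without.
  def split(images) do
    {blocked, usable} = Enum.split_with(images, &blocked?/1)
    %{usable: usable, blocked: blocked}
  end

  # An image counts as blocked when rai_info carries a non-nil reason.
  defp blocked?(%{rai_info: %{"blocked_reason" => reason}}) when not is_nil(reason), do: true
  defp blocked?(_image), do: false
end
```

The returned map can drive both paths: save everything under `:usable` and log the reasons found under `:blocked`.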
## Configuration Options
### `ImageGenerationConfig`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `number_of_images` | `1..8` | `1` | Number of images to generate |
| `aspect_ratio` | `String.t()` | `"1:1"` | Image aspect ratio |
| `safety_filter_level` | `atom()` | `:block_some` | Content filtering level |
| `person_generation` | `atom()` | `:dont_allow` | Person generation policy |
| `output_mime_type` | `String.t()` | `"image/png"` | Output format |
| `output_compression_quality` | `0..100` | `nil` | JPEG quality (JPEG only) |
| `negative_prompt` | `String.t()` | `nil` | What to avoid |
| `seed` | `integer()` | `nil` | Random seed for reproducibility |
| `guidance_scale` | `float()` | `nil` | Prompt adherence (1.0-20.0) |
| `language` | `String.t()` | `nil` | Prompt language code |
| `add_watermark` | `boolean()` | `true` | Add watermark to images |
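
The ranges in the table above can be checked locally before a request is sent. The sketch below is one way to do that. `ConfigCheck` is a hypothetical helper that works on any map or struct with these field names; it is not the library's own validation, and the error strings are made up for illustration.

```elixir
defmodule ConfigCheck do
  # Hypothetical helper; not part of the Gemini library.

  # Returns :ok, or {:error, messages} listing every violated constraint.
  def validate(config) when is_map(config) do
    errors =
      [number_of_images_error(config), quality_error(config), guidance_error(config)]
      |> Enum.reject(&is_nil/1)

    if errors == [], do: :ok, else: {:error, errors}
  end

  defp number_of_images_error(%{number_of_images: n}) when not is_nil(n) and n not in 1..8,
    do: "number_of_images must be between 1 and 8"

  defp number_of_images_error(_config), do: nil

  # Quality must be 0..100 and is only meaningful for JPEG output.
  defp quality_error(%{output_compression_quality: q} = config) when not is_nil(q) do
    cond do
      q not in 0..100 ->
        "output_compression_quality must be between 0 and 100"

      Map.get(config, :output_mime_type, "image/png") != "image/jpeg" ->
        "output_compression_quality only applies to image/jpeg"

      true ->
        nil
    end
  end

  defp quality_error(_config), do: nil

  defp guidance_error(%{guidance_scale: g}) when is_number(g) and (g < 1.0 or g > 20.0),
    do: "guidance_scale must be between 1.0 and 20.0"

  defp guidance_error(_config), do: nil
end
```

Because structs are maps, `ConfigCheck.validate(%ImageGenerationConfig{number_of_images: 9})` would report the out-of-range count without a round trip to the API.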
### `EditImageConfig`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `prompt` | `String.t()` | `nil` | Edit description |
| `edit_mode` | `atom()` | `:inpainting` | Edit type |
| `mask_mode` | `atom()` | `:foreground` | Mask interpretation |
| `mask_dilation` | `0..50` | `0` | Expand mask by pixels |
| `guidance_scale` | `float()` | `nil` | Prompt adherence |
| `number_of_images` | `1..8` | `1` | Number of variations |
| `safety_filter_level` | `atom()` | `:block_some` | Content filtering |
| `seed` | `integer()` | `nil` | Random seed |
| `output_mime_type` | `String.t()` | `"image/png"` | Output format |
### `UpscaleImageConfig`
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `upscale_factor` | `:x2` or `:x4` | `:x2` | Scale factor |
| `output_mime_type` | `String.t()` | `"image/png"` | Output format |
| `output_compression_quality` | `0..100` | `nil` | JPEG quality (JPEG only) |
## See Also
- [Vertex AI Imagen Documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/image/overview)
- [Video Generation Guide](video_generation.md)
- [Multimodal Content](../README.md#multimodal-content)