# Live API Guide

The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, creating a natural conversational experience.

## Table of Contents

1. [Overview](#overview)
2. [Implementation Approaches](#implementation-approaches)
3. [WebSocket Connection](#websocket-connection)
4. [Supported Modalities](#supported-modalities)
5. [Models and Response Modalities](#models-and-response-modalities)
6. [Establishing a Session](#establishing-a-session)
7. [Sending Content](#sending-content)
8. [Receiving Responses](#receiving-responses)
9. [Voice Activity Detection](#voice-activity-detection)
10. [Native Audio Features](#native-audio-features)
11. [Tool Use and Function Calling](#tool-use-and-function-calling)
12. [Session Management](#session-management)
13. [Ephemeral Tokens](#ephemeral-tokens)
14. [Limitations](#limitations)
15. [Examples](#examples)
16. [Testing Live Sessions](#testing-live-sessions)
17. [Further Reading](#further-reading)

## Overview

The Live API is a stateful, bidirectional streaming API built on WebSockets. Unlike the standard `generateContent` API, the Live API maintains a persistent connection where you can:

- Send text, audio, or video continuously to the Gemini server
- Receive audio, text, or function call requests from the Gemini server
- Interrupt model responses mid-generation
- Resume sessions after disconnection
- Use automatic voice activity detection for hands-free conversations

### Key Features

- **Voice Activity Detection (VAD)**: Automatic detection of when users start and stop speaking
- **Tool Use and Function Calling**: Execute functions during real-time conversations
- **Session Management**: Resume sessions, compress context windows, handle graceful disconnections
- **Ephemeral Tokens**: Secure client-side authentication for browser/mobile applications
- **Native Audio**: Natural speech output with affective dialog and proactive responses (v1alpha)

## Implementation Approaches

When integrating with the Live API, choose between:

### Server-to-Server

Your backend connects to the Live API using WebSockets. The client sends stream data (audio, video, text) to your server, which then forwards it to the Live API.

```
Client App -> Your Backend -> Live API
```

### Client-to-Server

Your frontend connects directly to the Live API using WebSockets, bypassing your backend.

```
Client App -> Live API
```

Client-to-server typically gives lower latency for streaming audio and video because it eliminates the hop through your backend. However, for production environments, use [ephemeral tokens](#ephemeral-tokens) instead of standard API keys to mitigate security risks.

## WebSocket Connection

### Endpoint

The Live API uses WebSocket connections to the following endpoints:

**Gemini API (AI Studio):**
```
wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent
```

**Vertex AI:**
```
wss://{location}-aiplatform.googleapis.com/ws/google.cloud.aiplatform.v1.LlmBidiService/BidiGenerateContent
```

> Vertex Live API requires billing to be enabled on the target GCP project. Without billing, the server closes the connection during setup with WebSocket close code `1008` (policy violation).
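
As a rough illustration of how the `{location}` placeholder is filled in, the sketch below builds both endpoint URLs from the strings listed above. `LiveEndpointSketch` is a hypothetical helper written for this guide, not part of `gemini_ex`, which constructs these URLs for you.

```elixir
defmodule LiveEndpointSketch do
  @moduledoc "Hypothetical helper for illustration; not part of gemini_ex."

  @gemini_url "wss://generativelanguage.googleapis.com/ws/" <>
                "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"

  # Gemini API endpoint exactly as listed above.
  def url(:gemini), do: @gemini_url

  # Vertex AI endpoint with the {location} placeholder filled in.
  def url(:vertex_ai, location) when is_binary(location) and location != "" do
    "wss://#{location}-aiplatform.googleapis.com/ws/" <>
      "google.cloud.aiplatform.v1.LlmBidiService/BidiGenerateContent"
  end
end

IO.puts(LiveEndpointSketch.url(:vertex_ai, "us-central1"))
```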

### API Version

For Gemini API (`auth: :gemini`) connections, the standard API version is `v1beta`. Some features require `v1alpha`:
- Affective dialog
- Proactive audio
- Ephemeral tokens

Set the API version per session:

```elixir
alias Gemini.Live.Models

{:ok, session} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  api_version: "v1alpha",      # required for native audio extras
  generation_config: %{response_modalities: ["AUDIO"]}
)
```

For Vertex AI (`auth: :vertex_ai`) connections, `gemini_ex` uses the Vertex Live `v1` endpoint.

This library abstracts the WebSocket connection details. You interact through the `Gemini.Live.Session` module.

### Backend Schema Differences

Gemini Live and Vertex Live use slightly different wire fields for response usage metadata:

- Gemini Live sends `responseTokenCount` and `responseTokensDetails`
- Vertex Live `v1` sends `candidatesTokenCount` and `candidatesTokensDetails`
- Vertex Live may also include `turnCompleteReason` on `serverContent`

`gemini_ex` normalizes both backends into the same Live types:

- `Gemini.Types.Live.UsageMetadata.candidates_token_count` and `candidates_tokens_details` are the canonical output-token fields
- `response_token_count` and `response_tokens_details` are retained as backwards-compatible aliases
- `Gemini.Types.Live.UsageMetadata.output_token_count/1` and `output_tokens_details/1` return the normalized output view
- `Gemini.Types.Live.ServerContent.turn_complete_reason` is parsed as a `Gemini.Types.Live.Enums.TurnCompleteReason` value when present

Example callback code that works across both backends:

```elixir
on_message: fn
  %{server_content: content, usage_metadata: usage} ->
    output_tokens = Gemini.Types.Live.UsageMetadata.output_token_count(usage)
    reason = if content, do: content.turn_complete_reason

    IO.inspect(%{
      output_tokens: output_tokens,
      turn_complete_reason: reason
    })

  _ ->
    :ok
end
```
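
To make the normalization concrete, here is a minimal sketch that operates on raw wire maps rather than on the library's structs. `UsageSketch` is a hypothetical module, and the precedence shown (Vertex-style field first, Gemini-style field as fallback) is an assumption made for illustration; the library's own parsing may differ in detail.

```elixir
defmodule UsageSketch do
  @moduledoc "Illustrative only; gemini_ex performs this normalization internally."

  # Prefer the Vertex-style field, fall back to the Gemini-style field.
  def output_token_count(raw) when is_map(raw) do
    Map.get(raw, "candidatesTokenCount") || Map.get(raw, "responseTokenCount")
  end

  # Same precedence for the per-modality details.
  def output_tokens_details(raw) when is_map(raw) do
    Map.get(raw, "candidatesTokensDetails") || Map.get(raw, "responseTokensDetails")
  end
end

gemini_raw = %{"responseTokenCount" => 42}
vertex_raw = %{"candidatesTokenCount" => 42}

IO.inspect(UsageSketch.output_token_count(gemini_raw))  # 42
IO.inspect(UsageSketch.output_token_count(vertex_raw))  # 42
```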

### Session Configuration

The initial message after establishing the WebSocket connection sets the session configuration:

```elixir
alias Gemini.Live.Models

%{
  model: Models.resolve(:audio),
  generation_config: %{
    response_modalities: ["AUDIO"],
    temperature: 0.7,
    speech_config: %{voice_config: %{prebuilt_voice_config: %{voice_name: "Kore"}}}
  },
  system_instruction: "You are a helpful assistant.",
  tools: [%{function_declarations: [...]}]
}
```

Configuration cannot be updated while the connection is open. However, you can change parameters (except the model) when resuming via session resumption.

## Supported Modalities

### Input Modalities

| Modality | Format | Notes |
|----------|--------|-------|
| Audio | 16-bit PCM, little-endian | Input natively at 16kHz; the API resamples other rates. MIME type: `audio/pcm;rate=16000` |
| Video | JPEG/PNG frames | Sent as base64-encoded blobs |
| Text | UTF-8 string | Via `clientContent` or `realtimeInput` |
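
Since audio input is raw 16-bit PCM at 16kHz, chunk sizes follow directly from the format: mono audio uses 2 bytes per sample, so one second is 32,000 bytes. The sketch below splits a PCM binary into fixed-duration chunks suitable for streaming. `PcmChunkSketch` is a hypothetical helper, it assumes mono audio, and the 100 ms default is an arbitrary illustrative choice rather than a documented recommendation.

```elixir
defmodule PcmChunkSketch do
  @sample_rate 16_000
  @bytes_per_sample 2

  # Bytes needed for `ms` milliseconds of 16 kHz, 16-bit mono PCM.
  def bytes_for_ms(ms) when is_integer(ms) and ms > 0 do
    div(@sample_rate * @bytes_per_sample * ms, 1000)
  end

  # Split a PCM binary into chunks of `chunk_ms`; the final chunk may be shorter.
  def chunk(pcm, chunk_ms \\ 100) when is_binary(pcm) do
    do_chunk(pcm, bytes_for_ms(chunk_ms), [])
  end

  defp do_chunk(<<>>, _size, acc), do: Enum.reverse(acc)

  defp do_chunk(pcm, size, acc) when byte_size(pcm) <= size do
    Enum.reverse([pcm | acc])
  end

  defp do_chunk(pcm, size, acc) do
    <<head::binary-size(size), rest::binary>> = pcm
    do_chunk(rest, size, [head | acc])
  end
end

# One second of silence splits into ten 100 ms chunks of 3200 bytes each.
one_second = :binary.copy(<<0, 0>>, 16_000)
IO.inspect(length(PcmChunkSketch.chunk(one_second)))
```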

### Output Modalities

| Modality | Format | Notes |
|----------|--------|-------|
| Audio | 16-bit PCM, 24kHz | Native audio output models only |
| Text | UTF-8 string | Model-dependent; validate against the selected model |

**Important:** You can only set one response modality per session. Support is model-specific, and `Session.connect/1` rejects unsupported combinations before opening the WebSocket.

## Models and Response Modalities

### Native Audio Models (Recommended for Voice)

Native audio output provides natural, realistic-sounding speech with improved multilingual performance. Use these models when you need audio responses:

```elixir
alias Gemini.Live.Models

# Resolve a Live audio model available for this key
model = Models.resolve(:audio)
```

Native audio models support:
- 128k token context window
- Affective (emotion-aware) dialogue (v1alpha)
- Proactive audio responses (v1alpha)
- Thinking capabilities

### Text UX over Live Audio Sessions

Current Gemini Live models are audio-first. To build a text-oriented terminal or
chat UI, use an audio session and enable output transcription:

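```elixir
alias Gemini.Live.{Models, Session}

# Audio session with output transcription enabled for text display
{:ok, session} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  generation_config: %{response_modalities: ["AUDIO"]},
  output_audio_transcription: %{}
)
```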

### Model Availability and Rollout Variability

Live API model availability can vary by project and rollout. The canonical
Live docs may list newer models that are not yet enabled for your API key.
When that happens, the Live API returns a `1008` close error like:

```
Publisher Model `projects/.../publishers/google/models/<model>` was not found
or is not supported for bidiGenerateContent
```

To make this robust, this library resolves a Live model at runtime based on
your key's `list_models` results.

Use the resolver:

```elixir
alias Gemini.Live.Models

audio_model = Models.resolve(:audio)
```

The resolver uses the model registry plus runtime `list_models` results for your
credentials. For current Gemini Live usage, prefer `Models.resolve(:audio)`.
This library validates the session configuration locally so incompatible
response modalities fail before the WebSocket opens.

If the audio model is not present in your Live-capable model list, audio
sessions will not work for that key yet.

You can inspect what your key supports:

```bash
GEMINI_API_KEY=YOUR_KEY mix run -e 'alias Gemini.APIs.Coordinator; {:ok, resp}=Coordinator.list_models(); resp.models |> Enum.filter(&Enum.member?(&1.supported_generation_methods, "bidiGenerateContent")) |> Enum.each(fn m -> IO.puts(m.name) end)'
```
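
The same filter can be written as a small module, which also shows one way to pick the first available model from a preference list. This is a sketch over plain maps using the `name` and `supported_generation_methods` fields from the command above; `LiveModelPickSketch` and the model names are hypothetical, and this is not how `Models.resolve/1` is implemented.

```elixir
defmodule LiveModelPickSketch do
  @method "bidiGenerateContent"

  # Names of models that advertise Live (bidirectional) support.
  def live_capable(models) do
    for %{name: name, supported_generation_methods: methods} <- models,
        @method in methods,
        do: name
  end

  # First preferred name that is actually available, else nil.
  def pick(models, preferences) do
    available = MapSet.new(live_capable(models))
    Enum.find(preferences, &MapSet.member?(available, &1))
  end
end

# Placeholder model names for illustration only.
models = [
  %{name: "models/example-text", supported_generation_methods: ["generateContent"]},
  %{name: "models/example-live", supported_generation_methods: ["bidiGenerateContent"]}
]

IO.inspect(LiveModelPickSketch.pick(models, ["models/example-newer-live", "models/example-live"]))
```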

If you hardcode a model and newer Live models are missing from that list,
use one of the resolver's fallback choices instead.

### Session Limits

| Configuration | Duration Limit |
|---------------|----------------|
| Audio only | 15 minutes |
| Audio + Video | 2 minutes |

Use [context window compression](#context-window-compression) or [session resumption](#session-resumption) to extend beyond these limits.

## Establishing a Session

### Basic Setup

```elixir
alias Gemini.Live.Models
alias Gemini.Live.Session

# Resolve a model that is available for this API key
model = Models.resolve(:audio)

{:ok, session} = Session.start_link(
  model: model,
  auth: :gemini,  # or :vertex_ai
  generation_config: %{
    response_modalities: ["AUDIO"]
  },
  output_audio_transcription: %{},
  on_message: fn message ->
    IO.inspect(message, label: "Received")
  end
)

# Connect to the Live API
:ok = Session.connect(session)

# Session is now ready for messages
```

For standalone Gemini sessions, you can also pass `api_key:` directly to `Session.start_link/1`. When `api_key:` is present and `auth:` is omitted, the session uses Gemini auth for that connection only. Governed sessions use `Gemini.GovernedAuthority` instead and reject direct per-session credentials.

```elixir
{:ok, session} = Gemini.Live.Session.start_link(
  model: Gemini.Live.Models.resolve(:audio),
  api_key: "session-specific-key",
  generation_config: %{response_modalities: ["AUDIO"]},
  output_audio_transcription: %{}
)
```

### Full Configuration Options

```elixir
alias Gemini.Live.Models

{:ok, session} = Session.start_link(
  # Required
  model: Models.resolve(:audio),

  # Authentication
  auth: :gemini,  # or :vertex_ai
  api_key: "session-specific-key",  # optional per-session Gemini override when using Gemini auth
  project_id: "your-project",  # required for :vertex_ai
  location: "us-central1",     # optional, default: "us-central1"
  api_version: "v1alpha",

  # Generation configuration
  generation_config: %{
    response_modalities: ["AUDIO"],
    temperature: 0.7,
    top_p: 0.95,
    speech_config: %{
      voice_config: %{
        prebuilt_voice_config: %{voice_name: "Kore"}
      }
    }
  },

  # System instruction
  system_instruction: "You are a helpful voice assistant.",

  # Tools for function calling
  tools: [%{function_declarations: [...]}],

  # Realtime input configuration
  realtime_input_config: %{
    automatic_activity_detection: %{
      disabled: false,  # true for manual VAD
      start_of_speech_sensitivity: "START_SENSITIVITY_HIGH",
      end_of_speech_sensitivity: "END_SENSITIVITY_HIGH"
    }
  },

  # Session management
  session_resumption: %{},           # Enable session resumption
  resume_handle: "previous-handle",  # Resume from previous session
  context_window_compression: %{sliding_window: %{}},

  # Audio transcription
  input_audio_transcription: %{},
  output_audio_transcription: %{},

  # Callbacks
  on_message: &handle_message/1,
  on_error: &handle_error/1,
  on_close: &handle_close/1,
  on_tool_call: &handle_tool_call/1,
  on_tool_call_cancellation: &handle_cancellation/1,
  on_transcription: &handle_transcription/1,
  on_voice_activity: &handle_voice_activity/1,
  on_session_resumption: &handle_resumption/1,
  on_go_away: &handle_go_away/1
)
```

## Sending Content

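The Live API wire protocol defines two message types for sending content, `clientContent` and `realtimeInput`, each with different semantics. For text turns, `Session.send_text/3` chooses between them for you.

### Preferred Helper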

Use `Session.send_text/3` for text turns in portable code. It selects the
correct transport for the connected model (`clientContent` for older models,
`realtimeInput.text` for Gemini 3.1 Live).

```elixir
Session.send_text(session, "What is the capital of France?")
```

### clientContent (Ordered, Explicit Turns)

Use `send_client_content/3` only when you specifically need the `clientContent`
wire format. This method:
- Adds content to the conversation history in order
- Interrupts any current model generation
- Requires explicit turn completion signal

Note: Gemini 3.1 Flash Live only supports `clientContent` for initial-history
seeding. For ongoing text turns, use `Session.send_text/3` or
`Session.send_realtime_input/2`.

```elixir
# Seed initial history before ongoing realtime turns
Session.send_client_content(session, [
  %{role: "user", parts: [%{text: "What is the capital of France?"}]},
  %{role: "model", parts: [%{text: "Paris"}]},
  %{role: "user", parts: [%{text: "What about Germany?"}]}
], turn_complete: true)
```

### realtimeInput (Streaming, Optimized for Speed)

Use `send_realtime_input/2` for continuous streaming data (audio, video, text). This method:
- Streams data without interrupting model generation
- Optimizes for low latency at the expense of deterministic ordering
- Derives turn boundaries from activity detection (VAD)
- Processes data incrementally before turn completion

```elixir
# Send audio chunk (16-bit PCM, 16kHz mono)
Session.send_realtime_input(session, audio: %{
  data: pcm_data,  # binary data, will be Base64 encoded
  mime_type: "audio/pcm;rate=16000"
})

# Send video frame
Session.send_realtime_input(session, video: %{
  data: jpeg_data,
  mime_type: "image/jpeg"
})

# Send text via realtime input
Session.send_realtime_input(session, text: "Hello")

# Manual activity signaling (when automatic VAD is disabled)
Session.send_realtime_input(session, activity_start: true)
# ... send audio chunks ...
Session.send_realtime_input(session, activity_end: true)

# Signal audio stream pause (for automatic VAD)
Session.send_realtime_input(session, audio_stream_end: true)
```

### Ordering Considerations

- `clientContent` messages are added to context in order
- `realtimeInput` is optimized for responsiveness; ordering across modalities is not guaranteed
- If you mix `clientContent` and `realtimeInput`, the server optimizes for responsiveness and makes no guarantee about their relative order

## Receiving Responses

Responses are delivered through the `on_message` callback. The server sends a `BidiGenerateContentServerMessage`, which may contain any of the following fields:

### Message Types

| Field | Description |
|-------|-------------|
| `setup_complete` | Session setup successful |
| `server_content` | Model response content |
| `tool_call` | Function call request |
| `tool_call_cancellation` | Cancelled tool calls (due to interruption) |
| `go_away` | Session ending soon notice |
| `session_resumption_update` | New resumption handle |
| `voice_activity` | Voice activity signals |
| `usage_metadata` | Token usage information |

### Server Content

```elixir
on_message: fn message ->
  case message do
    %{server_content: content} when not is_nil(content) ->
      # Extract text
      if text = Gemini.Types.Live.ServerContent.extract_text(content) do
        IO.write(text)
      end

      # Handle audio output
      if content.model_turn && content.model_turn.parts do
        for part <- content.model_turn.parts do
          if audio_data = part[:inline_data] do
            # Process audio (24kHz PCM)
            play_audio(audio_data.data)
          end
        end
      end

      # Turn completion signals
      if content.turn_complete do
        IO.puts("\n[Turn complete]")
      end

      # Generation complete (before turn_complete when streaming)
      if content.generation_complete do
        IO.puts("[Generation complete]")
      end

      # Handle interruption
      if content.interrupted do
        IO.puts("[Interrupted by user]")
        clear_audio_queue()
      end

    _ -> :ok
  end
end
```

### Transcription

When transcription is enabled, you receive transcriptions separately from content:

```elixir
on_transcription: fn
  {:input, %{"text" => text}} ->
    IO.puts("User said: #{text}")

  {:output, %{"text" => text}} ->
    IO.puts("Model said: #{text}")
end
```
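
For a text-oriented UI you usually want whole utterances rather than individual fragments. The sketch below collects transcription payloads in an `Agent` and returns the joined text on demand. It assumes transcriptions arrive as incremental fragments in the `{:input | :output, %{"text" => text}}` shape shown above; `TranscriptBufferSketch` is a hypothetical module, not part of the library.

```elixir
defmodule TranscriptBufferSketch do
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{input: [], output: []} end)
  end

  # Store one fragment for the given side (:input or :output).
  def append(buffer, {side, %{"text" => text}}) when side in [:input, :output] do
    Agent.update(buffer, &Map.update!(&1, side, fn parts -> [text | parts] end))
  end

  # Ignore payloads without a "text" key.
  def append(_buffer, _other), do: :ok

  # Return the accumulated text for one side and reset that side.
  def flush(buffer, side) when side in [:input, :output] do
    Agent.get_and_update(buffer, fn state ->
      text = state |> Map.fetch!(side) |> Enum.reverse() |> Enum.join()
      {text, Map.put(state, side, [])}
    end)
  end
end

{:ok, buffer} = TranscriptBufferSketch.start_link()
TranscriptBufferSketch.append(buffer, {:output, %{"text" => "Hello, "}})
TranscriptBufferSketch.append(buffer, {:output, %{"text" => "world."}})
IO.puts(TranscriptBufferSketch.flush(buffer, :output))
```

One way to wire this in is to call `append/2` from `on_transcription` and `flush/2` when a message reports `turn_complete`.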

## Voice Activity Detection

VAD allows the model to recognize when a person is speaking, enabling natural interruptions.

### Automatic VAD (Default)

When automatic VAD is enabled, the model automatically detects speech and triggers responses:

```elixir
alias Gemini.Live.Models

{:ok, session} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  generation_config: %{response_modalities: ["AUDIO"]},
  # VAD is enabled by default
  on_message: fn message ->
    case message do
      %{server_content: %{interrupted: true}} ->
        # User interrupted - clear playback queue
        clear_audio_playback()
      _ -> :ok
    end
  end
)
```

When the audio stream is paused (e.g., microphone turned off), send `audio_stream_end` to flush cached audio:

```elixir
Session.send_realtime_input(session, audio_stream_end: true)
```

### VAD Configuration

Fine-tune VAD behavior:

```elixir
realtime_input_config: %{
  automatic_activity_detection: %{
    disabled: false,
    start_of_speech_sensitivity: "START_SENSITIVITY_LOW",  # or HIGH
    end_of_speech_sensitivity: "END_SENSITIVITY_LOW",      # or HIGH
    prefix_padding_ms: 20,      # Audio to keep before speech detection
    silence_duration_ms: 100    # Silence required for end-of-speech
  }
}
```

### Manual VAD

For push-to-talk or custom VAD implementations:

```elixir
alias Gemini.Live.Models

{:ok, session} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  generation_config: %{response_modalities: ["AUDIO"]},
  realtime_input_config: %{
    automatic_activity_detection: %{disabled: true}
  }
)

# When user presses talk button
Session.send_realtime_input(session, activity_start: true)

# Stream audio while talking
for chunk <- audio_chunks do
  Session.send_realtime_input(session, audio: %{
    data: chunk,
    mime_type: "audio/pcm;rate=16000"
  })
end

# When user releases talk button
Session.send_realtime_input(session, activity_end: true)
```
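
For a custom VAD, the simplest possible detector is an energy threshold over each PCM chunk. The sketch below computes the RMS amplitude of 16-bit little-endian samples and maps chunks onto start/end transitions that could drive `activity_start` and `activity_end`. `EnergyVadSketch` is hypothetical, the threshold of 500 is an arbitrary value, and a real detector would add smoothing (for example, requiring several quiet chunks before ending the activity).

```elixir
defmodule EnergyVadSketch do
  # Root-mean-square amplitude of 16-bit little-endian signed PCM samples.
  def rms(pcm) when is_binary(pcm) do
    samples = for <<s::little-signed-16 <- pcm>>, do: s

    case samples do
      [] -> 0.0
      _ -> :math.sqrt(Enum.reduce(samples, 0, fn s, acc -> acc + s * s end) / length(samples))
    end
  end

  # Threshold is an arbitrary illustrative value; tune it for your microphone.
  def speech?(pcm, threshold \\ 500.0), do: rms(pcm) >= threshold

  # Translate a chunk into a transition for manual activity signaling.
  def step(:silent, pcm) do
    if speech?(pcm), do: {:activity_start, :speaking}, else: {:none, :silent}
  end

  def step(:speaking, pcm) do
    if speech?(pcm), do: {:none, :speaking}, else: {:activity_end, :silent}
  end
end

loud = :binary.copy(<<1000::little-signed-16>>, 160)
IO.inspect(EnergyVadSketch.step(:silent, loud))
```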

## Native Audio Features

Native audio models support the advanced features below. Some of them require the `v1alpha` API version.

### Voice Selection

```elixir
generation_config: %{
  response_modalities: ["AUDIO"],
  speech_config: %{
    voice_config: %{
      prebuilt_voice_config: %{voice_name: "Kore"}
    }
  }
}
```

Available voices include: Kore, Puck, Charon, Fenrir, Aoede, and others. Listen to voices in [AI Studio](https://aistudio.google.com/app/live).

### Affective Dialog (v1alpha)

Adapts response style to input expression and tone:

```elixir
alias Gemini.Live.Models

# Note: Requires v1alpha API version
{:ok, session} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  api_version: "v1alpha",
  generation_config: %{response_modalities: ["AUDIO"]},
  enable_affective_dialog: true
)
```

### Proactive Audio (v1alpha)

Allows the model to decide not to respond if content is irrelevant:

```elixir
alias Gemini.Live.Models

# Note: Requires v1alpha API version
{:ok, session} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  api_version: "v1alpha",
  generation_config: %{response_modalities: ["AUDIO"]},
  proactivity: %{proactive_audio: true}
)
```

### Thinking

Native audio models support thinking capabilities:

```elixir
alias Gemini.Live.Models

{:ok, session} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  api_version: "v1alpha",
  generation_config: %{
    response_modalities: ["AUDIO"],
    thinking_config: %{
      thinking_budget: 1024,     # Token budget for thinking
      include_thoughts: true     # Include thought summaries
    }
  }
)
```

## Tool Use and Function Calling

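The Live API supports function calling, but the server never executes functions for you. Your code must run each requested function and return its result, either from the `on_tool_call` callback or with `Session.send_tool_response/2`.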

### Defining Tools

```elixir
tools = [
  %{
    function_declarations: [
      %{
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: %{
          type: "object",
          properties: %{
            location: %{type: "string", description: "City name"}
          },
          required: ["location"]
        }
      }
    ]
  }
]
```

### Handling Tool Calls

```elixir
alias Gemini.Live.Models

{:ok, session} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  generation_config: %{response_modalities: ["AUDIO"]},
  output_audio_transcription: %{},
  tools: tools,

  on_tool_call: fn %{function_calls: calls} ->
    responses = Enum.map(calls, fn call ->
      result = case call.name do
        "get_weather" ->
          location = call.args["location"]
          get_weather_data(location)  # Your implementation
        _ ->
          %{error: "Unknown function"}
      end

      %{id: call.id, name: call.name, response: result}
    end)

    # Return responses to send automatically
    {:tool_response, responses}
  end
)
```

Alternatively, send tool responses manually:

```elixir
Session.send_tool_response(session, [
  %{id: "call_123", name: "get_weather", response: %{temp: 72}}
])
```
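
As the number of tools grows, a dispatch table tends to be easier to maintain than a growing `case`. The sketch below maps function names to handlers and builds one response per call, turning unknown names and handler crashes into error payloads. It assumes calls carry `id`, `name`, and `args` as in the example above; `ToolDispatchSketch` is a hypothetical module.

```elixir
defmodule ToolDispatchSketch do
  # Build one response per call; unknown names and crashes become error payloads.
  def respond(calls, handlers) when is_list(calls) and is_map(handlers) do
    Enum.map(calls, fn call ->
      %{id: call.id, name: call.name, response: run(handlers, call)}
    end)
  end

  defp run(handlers, call) do
    case Map.fetch(handlers, call.name) do
      {:ok, fun} ->
        try do
          fun.(call.args)
        rescue
          e -> %{error: Exception.message(e)}
        end

      :error ->
        %{error: "Unknown function"}
    end
  end
end

handlers = %{
  "get_weather" => fn %{"location" => location} -> %{location: location, temp: 72} end
}

calls = [%{id: "call_123", name: "get_weather", args: %{"location" => "Paris"}}]
IO.inspect(ToolDispatchSketch.respond(calls, handlers))
```

Inside `on_tool_call`, the result can be returned as `{:tool_response, ToolDispatchSketch.respond(calls, handlers)}`.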

### Asynchronous Function Calling

For non-blocking function execution on legacy 2.5 native-audio models:

```elixir
tools = [
  %{
    function_declarations: [
      %{
        name: "long_running_task",
        behavior: "NON_BLOCKING"  # Execute asynchronously
      }
    ]
  }
]

# Control response timing with scheduling
Session.send_tool_response(session, [
  %{
    id: "call_123",
    name: "long_running_task",
    response: %{result: "done"},
    scheduling: :interrupt   # or :when_idle, :silent
  }
])
```

Scheduling options:
- `:interrupt` - Interrupt current generation immediately
- `:when_idle` - Wait until current turn completes
- `:silent` - Don't generate a response

### Tool Call Cancellation

When the user interrupts during function execution, the server sends cancellation:

```elixir
on_tool_call_cancellation: fn cancelled_ids ->
  IO.puts("Cancelled: #{inspect(cancelled_ids)}")
  # Attempt to undo side effects if possible
end
```
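
To act on a cancellation you need to know which worker is handling which call. A minimal sketch, assuming the callback receives a list of call ids and that each tool call runs in its own process: keep a map from call id to pid and kill the workers whose ids were cancelled. `InFlightCallsSketch` is a hypothetical module.

```elixir
defmodule InFlightCallsSketch do
  # State is a map of call_id => pid for workers that are still running.
  def track(state, call_id, pid) when is_map(state) and is_pid(pid) do
    Map.put(state, call_id, pid)
  end

  # Kill workers for cancelled ids and return the remaining state.
  # Ids that are not tracked are ignored.
  def cancel(state, cancelled_ids) when is_map(state) and is_list(cancelled_ids) do
    {cancelled, remaining} = Map.split(state, cancelled_ids)
    Enum.each(cancelled, fn {_id, pid} -> Process.exit(pid, :kill) end)
    remaining
  end
end

worker = spawn(fn -> Process.sleep(:infinity) end)
state = InFlightCallsSketch.track(%{}, "call_123", worker)
IO.inspect(InFlightCallsSketch.cancel(state, ["call_123"]))
```

Killing a process does not undo side effects it has already produced, so compensating actions still need to be handled separately.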

## Session Management

### Session Resumption

Resume sessions after disconnection to preserve conversation context:

```elixir
alias Gemini.Live.Models

# First session - enable resumption
{:ok, session1} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  generation_config: %{response_modalities: ["AUDIO"]},
  output_audio_transcription: %{},
  session_resumption: %{},
  on_session_resumption: fn
    %{handle: handle, resumable: true} ->
      # Store handle for later use
      save_handle(handle)

    _update ->
      # Not resumable right now; keep the last saved handle
      :ok
  end
)

:ok = Session.connect(session1)
:ok = Session.send_text(session1, "Remember: my name is Alice.")
Process.sleep(3000)

# Get handle before closing
handle = Session.get_session_handle(session1)
Session.close(session1)

# Later - resume with saved handle
{:ok, session2} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  generation_config: %{response_modalities: ["AUDIO"]},
  output_audio_transcription: %{},
  resume_handle: handle,
  session_resumption: %{}
)

:ok = Session.connect(session2)
:ok = Session.send_text(session2, "What's my name?")
# Model should remember: Alice
```

Resumption tokens are valid for 2 hours after the last session termination.
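
Because handles expire, it helps to store the termination time alongside the handle and check it before reconnecting. The sketch below does that with plain maps and `DateTime`. `ResumeHandleSketch` is a hypothetical module; it treats a handle as expired at exactly two hours, which is a conservative assumption about the boundary.

```elixir
defmodule ResumeHandleSketch do
  @ttl_seconds 2 * 60 * 60

  # Record a handle together with the time the session ended.
  def new(handle, %DateTime{} = ended_at), do: %{handle: handle, ended_at: ended_at}

  # A handle is usable only within the two-hour window.
  def usable?(%{ended_at: ended_at}, %DateTime{} = now) do
    DateTime.diff(now, ended_at, :second) < @ttl_seconds
  end

  # Options to merge into a new session: resume when possible, else start fresh.
  def session_opts(nil, _now), do: [session_resumption: %{}]

  def session_opts(saved, now) do
    if usable?(saved, now) do
      [session_resumption: %{}, resume_handle: saved.handle]
    else
      [session_resumption: %{}]
    end
  end
end

saved = ResumeHandleSketch.new("previous-handle", ~U[2025-01-23 10:00:00Z])
IO.inspect(ResumeHandleSketch.session_opts(saved, ~U[2025-01-23 11:00:00Z]))
```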

### Context Window Compression

Enable sliding window compression for long sessions:

```elixir
alias Gemini.Live.Models

{:ok, session} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  generation_config: %{response_modalities: ["AUDIO"]},
  output_audio_transcription: %{},
  context_window_compression: %{
    sliding_window: %{
      target_tokens: 16000  # Target after compression
    },
    trigger_tokens: 24000   # When to trigger compression
  }
)
```

Compression extends session duration indefinitely but may affect response quality as older context is discarded.
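
The relationship between `trigger_tokens` and `target_tokens` can be pictured with a toy model: once the context exceeds the trigger, the oldest turns are dropped until the total is at or below the target. The sketch below is only a conceptual model of that behavior, which actually happens on the server; `SlidingWindowSketch` and its `{text, token_count}` representation are hypothetical.

```elixir
defmodule SlidingWindowSketch do
  # `turns` is a list of {text, token_count} tuples, oldest first.
  def compress(turns, trigger_tokens, target_tokens) do
    if total(turns) > trigger_tokens, do: drop_oldest(turns, target_tokens), else: turns
  end

  # Drop turns from the front until the total fits within the target.
  defp drop_oldest([_oldest | rest] = turns, target) do
    if total(turns) > target, do: drop_oldest(rest, target), else: turns
  end

  defp drop_oldest([], _target), do: []

  defp total(turns), do: Enum.reduce(turns, 0, fn {_text, n}, acc -> acc + n end)
end

turns = [{"first", 10_000}, {"second", 10_000}, {"third", 10_000}]
IO.inspect(SlidingWindowSketch.compress(turns, 24_000, 16_000))
```

With the values from the configuration above, a 30,000-token context exceeds the 24,000 trigger, so the two oldest turns are dropped and only the most recent 10,000 tokens remain.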

### GoAway Notice

The server sends a GoAway message before disconnecting:

```elixir
on_go_away: fn %{time_left_ms: time_left, handle: handle} ->
  IO.puts("Session ending in #{time_left}ms")

  # Save handle for resumption
  if handle, do: save_handle(handle)

  # Prepare for reconnection
  schedule_reconnect()
end
```
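
A helper like `schedule_reconnect()` mostly needs to decide when to reconnect. One simple approach, sketched below, is to reconnect a little before the deadline and never schedule a negative delay. `GoAwaySketch`, the `:reconnect` message, and the 2-second margin are all hypothetical choices for illustration.

```elixir
defmodule GoAwaySketch do
  # Reconnect `margin_ms` before the deadline, but never use a negative delay.
  def reconnect_delay(time_left_ms, margin_ms \\ 2_000)
      when is_integer(time_left_ms) and is_integer(margin_ms) do
    max(time_left_ms - margin_ms, 0)
  end

  # Ask `pid` to reconnect by sending it a :reconnect message after the delay.
  # Returns the timer reference from Process.send_after/3.
  def schedule(pid, time_left_ms, margin_ms \\ 2_000) when is_pid(pid) do
    Process.send_after(pid, :reconnect, reconnect_delay(time_left_ms, margin_ms))
  end
end

IO.inspect(GoAwaySketch.reconnect_delay(10_000))  # 8000
```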

### Generation Complete

The server sends `generation_complete` when the model finishes generating (before `turn_complete`):

```elixir
on_message: fn message ->
  case message do
    %{server_content: %{generation_complete: true}} ->
      IO.puts("[Model finished generating]")

    %{server_content: %{turn_complete: true}} ->
      IO.puts("[Turn complete - ready for next input]")

    _ -> :ok
  end
end
```

## Ephemeral Tokens

Ephemeral tokens are short-lived authentication tokens for client-to-server implementations. They enhance security by:

- Expiring quickly (default: 30 minutes)
- Limiting the number of sessions they can create
- Optionally constraining configuration options

### Token Constraints

Ephemeral tokens require `v1alpha` API version and are only compatible with the Live API.

**Token Properties:**
- `expire_time`: When messages will be rejected (default: 30 minutes)
- `new_session_expire_time`: When new sessions will be rejected (default: 1 minute)
- `uses`: Number of sessions the token can start (default: 1)

### Creating Tokens (Server-Side)

Create tokens on your backend and pass them to clients:

```elixir
# This would typically be done via the REST API on your backend
# The token is then passed to the client application

# Example token structure returned from API:
%{
  "name" => "ephemeral-token-string",  # Use this as the API key
  "expireTime" => "2025-01-23T12:00:00Z",
  "newSessionExpireTime" => "2025-01-23T11:31:00Z"
}
```
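
The two timestamps govern different things: `newSessionExpireTime` limits when a session may be started, while `expireTime` limits how long messages are accepted. The sketch below checks both against a given time using the token structure shown above. `EphemeralTokenSketch` is a hypothetical helper that assumes both fields are present and are valid ISO 8601 strings.

```elixir
defmodule EphemeralTokenSketch do
  # Parse a timestamp field such as "expireTime"; raises if missing or malformed.
  defp parse!(token, key) do
    {:ok, dt, _offset} = DateTime.from_iso8601(Map.fetch!(token, key))
    dt
  end

  # Can the token still open a new session at `now`?
  def can_start_session?(token, %DateTime{} = now) do
    DateTime.compare(now, parse!(token, "newSessionExpireTime")) == :lt
  end

  # Can the token still carry messages on an existing session at `now`?
  def can_send?(token, %DateTime{} = now) do
    DateTime.compare(now, parse!(token, "expireTime")) == :lt
  end
end

token = %{
  "name" => "ephemeral-token-string",
  "expireTime" => "2025-01-23T12:00:00Z",
  "newSessionExpireTime" => "2025-01-23T11:31:00Z"
}

now = ~U[2025-01-23 11:45:00Z]
IO.inspect(EphemeralTokenSketch.can_start_session?(token, now))  # false
IO.inspect(EphemeralTokenSketch.can_send?(token, now))           # true
```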

### Using Tokens (Client-Side)

The client uses the token as if it were an API key:

```javascript
// In browser/mobile client
const session = await ai.live.connect({
  model: 'gemini-3.1-flash-live-preview',
  apiKey: ephemeralToken.name,  // Use token instead of API key
  config: { responseModalities: ['AUDIO'] }
});
```

### Token with Configuration Constraints

Lock tokens to specific configurations for additional security:

```elixir
# Server-side token creation with constraints
token_config = %{
  uses: 1,
  live_connect_constraints: %{
    model: "gemini-3.1-flash-live-preview",
    config: %{
      session_resumption: %{},
      temperature: 0.7,
      response_modalities: ["AUDIO"]
    }
  }
}
```

### Best Practices

1. Set short expiration times
2. Verify secure authentication on your backend before issuing tokens
3. Don't use ephemeral tokens for server-to-server connections (unnecessary overhead)
4. Use `sessionResumption` within a token's `expireTime` to reconnect without consuming additional uses

## Limitations

### Response Modalities

Configure only one response modality per session. For current Gemini Live
models, use `["AUDIO"]` and enable output transcription when you need text UX.

### Session Duration

Without compression:
- Audio-only: 15 minutes
- Audio + Video: 2 minutes

### Context Window

Context window limits vary by Live model generation and rollout. Check the
selected model's current documentation or registry entry instead of assuming the
older native-audio versus text-live split.

### Authentication

Standard API keys should not be used in client-side code. Use [ephemeral tokens](#ephemeral-tokens) for client-to-server implementations.

### Supported Languages

Native audio models automatically detect language and don't support explicit language codes. See the [canonical documentation](https://ai.google.dev/gemini-api/docs/live-guide#supported-languages) for the full list of supported languages.

## Examples

### Text Chat Session

```elixir
alias Gemini.Live.Session
alias Gemini.Live.Models

model = Models.resolve(:audio)

{:ok, session} = Session.start_link(
  model: model,
  auth: :gemini,
  generation_config: %{response_modalities: ["AUDIO"]},
  output_audio_transcription: %{},
  system_instruction: "You are a helpful assistant.",
  on_message: fn
    %{server_content: content} when not is_nil(content) ->
      if text = Gemini.Types.Live.ServerContent.extract_text(content) do
        IO.write(text)
      end
      if content.turn_complete, do: IO.puts("\n")
    _ -> :ok
  end
)

:ok = Session.connect(session)

Session.send_text(session, "What is machine learning?")
Process.sleep(5000)

Session.close(session)
```

### Audio Streaming

```elixir
alias Gemini.Live.Session
alias Gemini.Live.Models

model = Models.resolve(:audio)

{:ok, session} = Session.start_link(
  model: model,
  auth: :gemini,
  api_version: "v1alpha",
  generation_config: %{
    response_modalities: ["AUDIO"],
    speech_config: %{voice_config: %{prebuilt_voice_config: %{voice_name: "Kore"}}}
  },
  input_audio_transcription: %{},
  output_audio_transcription: %{},
  on_message: fn
    %{server_content: content} when not is_nil(content) ->
      # Handle audio output
      if content.model_turn && content.model_turn.parts do
        for part <- content.model_turn.parts do
          if part[:inline_data], do: play_audio(part.inline_data.data)
        end
      end
    _ -> :ok
  end,
  on_transcription: fn
    {:input, t} -> IO.puts("User: #{t["text"]}")
    {:output, t} -> IO.puts("Model: #{t["text"]}")
  end
)

:ok = Session.connect(session)

# Send audio chunks (16kHz PCM)
for chunk <- audio_chunks do
  Session.send_realtime_input(session, audio: %{
    data: chunk,
    mime_type: "audio/pcm;rate=16000"
  })
end

Process.sleep(5000)
Session.close(session)
```

### Function Calling

```elixir
alias Gemini.Live.Session
alias Gemini.Live.Models

tools = [
  %{
    function_declarations: [
      %{
        name: "get_stock_price",
        description: "Get current stock price",
        parameters: %{
          type: "object",
          properties: %{symbol: %{type: "string"}},
          required: ["symbol"]
        }
      }
    ]
  }
]

{:ok, session} = Session.start_link(
  model: Models.resolve(:audio),
  auth: :gemini,
  generation_config: %{response_modalities: ["AUDIO"]},
  output_audio_transcription: %{},
  tools: tools,
  on_tool_call: fn %{function_calls: calls} ->
    responses = Enum.map(calls, fn call ->
      result = case call.name do
        "get_stock_price" -> %{price: 178.50, currency: "USD"}
        _ -> %{error: "Unknown function"}
      end
      %{id: call.id, name: call.name, response: result}
    end)
    {:tool_response, responses}
  end,
  on_message: fn
    %{server_content: c} when not is_nil(c) ->
      if text = Gemini.Types.Live.ServerContent.extract_text(c), do: IO.write(text)
      if c.turn_complete, do: IO.puts("\n")
    _ -> :ok
  end
)

:ok = Session.connect(session)
Session.send_text(session, "What's Apple's stock price?")
Process.sleep(10000)
Session.close(session)
```

### Session Resumption

See `examples/13_live_session_resumption.exs` for a complete example.

## Testing Live Sessions

When your environment variables are already exported, run the Live integration tests directly:

```bash
# Gemini Live session coverage
mix test --only live_gemini test/gemini/live/session_live_test.exs

# Gemini Live advanced features
mix test --only live_gemini test/gemini/live/features_live_test.exs

# Vertex Live coverage (billed; requires explicit opt-in)
RUN_BILLED_VERTEX_LIVE_TESTS=1 mix test --only live_vertex_ai test/gemini/live/session_vertex_live_test.exs
```

The default test suite excludes `:live_gemini` and `:live_vertex_ai`, so these targeted commands are the intended manual verification path for real credentials.

If your Vertex project does not expose a compatible Live audio model, the Vertex session tests will skip instead of failing.

## Further Reading

- [Example Files](https://github.com/nshkrdotcom/gemini_ex/tree/main/examples)
  - `11_live_text_chat.exs` - Multi-turn text conversations
  - `12_live_audio_streaming.exs` - Audio input/output
  - `13_live_session_resumption.exs` - Session resumption
  - `14_live_function_calling.exs` - Tool use with telemetry
- [Google's Live API Documentation](https://ai.google.dev/gemini-api/docs/live)
- [WebSocket API Reference](https://ai.google.dev/api/live)