# Transient per-user pods
This is the scenario ExAtlas was built for: a Phoenix app spawns a GPU pod
per active user, the user's browser talks directly to the pod, and the
pod is reaped when the user leaves.
## Why not proxy through the Phoenix app?
For real-time workloads (video inference, audio transcription, generative
streaming) the extra hop doubles latency and forces your Phoenix node to
carry per-user bandwidth. Handing the browser a URL that points straight
at the pod keeps Phoenix out of the data path.
## The flow
```
Browser Phoenix (Fly.io) RunPod pod
│ │ │
│ 1. open session │ │
├─────────────────────────►│ │
│ │ 2. spawn_compute │
│ ├────────────────────────────►│
│ │ (inject ATLAS_PRESHARED_KEY env var)
│ 3. {url, token} │◄────────────────────────────┤
│◄─────────────────────────┤ │
│ │
│ 4. inference over HTTPS with Authorization: Bearer │
├───────────────────────────────────────────────────────►│
│ │
│ 5. touch heartbeats │
├─────────────────────────►│ │
│ │ │
│ 6. idle_ttl_ms passes with no heartbeat │
│ │ 7. terminate │
│ ├────────────────────────────►│
```
## Implementation
### The LiveView
```elixir
defmodule MyAppWeb.InferenceLive do
  use MyAppWeb, :live_view

  @idle_ttl_ms 15 * 60_000  # 15 minutes

  def mount(_params, _session, socket) do
    if connected?(socket) do
      # Only spawn on the live (WebSocket) mount, not the initial static
      # render -- otherwise every page load pays for two pods.
      {:ok, _pid, compute} =
        ExAtlas.Orchestrator.spawn(
          gpu: :h100,
          image: "ghcr.io/me/my-inference-server:latest",
          ports: [{8000, :http}],
          auth: :bearer,
          user_id: socket.assigns.current_user.id,
          idle_ttl_ms: @idle_ttl_ms,
          name: "atlas-" <> to_string(socket.assigns.current_user.id)
        )

      Phoenix.PubSub.subscribe(ExAtlas.PubSub, "compute:" <> compute.id)

      {:ok,
       assign(socket,
         compute_id: compute.id,
         inference_url: hd(compute.ports).url,
         inference_token: compute.auth.token
       )}
    else
      {:ok, assign(socket, compute_id: nil, inference_url: nil, inference_token: nil)}
    end
  end

  def handle_event("ping", _, %{assigns: %{compute_id: id}} = socket) when not is_nil(id) do
    _ = ExAtlas.Orchestrator.touch(id)
    {:noreply, socket}
  end

  def handle_event("ping", _, socket), do: {:noreply, socket}

  def handle_info({:atlas_compute, _id, {:status, :terminated}}, socket) do
    {:noreply,
     socket
     |> put_flash(:info, "Inference session ended")
     |> redirect(to: ~p"/")}
  end

  def handle_info({:atlas_compute, _id, _other}, socket), do: {:noreply, socket}

  def terminate(_reason, socket) do
    # LiveView process is dying; cut the pod short to save $
    if id = socket.assigns[:compute_id], do: ExAtlas.Orchestrator.stop_tracked(id)
    :ok
  end
end
```
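The URL and token assigned in `mount/3` still have to reach the browser. One common pattern is data attributes read by a JS hook; the element id and hook name below are illustrative assumptions, not part of ExAtlas:

```heex
<div id="inference-session"
     phx-hook="InferenceClient"
     data-url={@inference_url}
     data-token={@inference_token}>
</div>
```

Note that this puts the token in the DOM, which is acceptable here because the token is scoped to one user's own pod.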
### The inference server (inside the pod)
```elixir
defmodule InferenceServer do
  @moduledoc """
  Minimal Plug app running inside the RunPod pod. Rejects any request
  that doesn't carry the preshared key injected by ExAtlas.
  """

  import Plug.Conn

  @behaviour Plug

  def init(opts), do: opts

  def call(conn, _opts) do
    if authenticated?(conn) do
      handle(conn)
    else
      conn |> send_resp(401, "unauthorized") |> halt()
    end
  end

  defp authenticated?(conn) do
    preshared = System.fetch_env!("ATLAS_PRESHARED_KEY")

    case get_req_header(conn, "authorization") do
      ["Bearer " <> token] -> Plug.Crypto.secure_compare(token, preshared)
      _ -> false
    end
  end

  defp handle(conn) do
    # ... your inference logic ...
    send_resp(conn, 200, "ok")
  end
end
```
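A request against the pod then just needs the bearer header. Sketched here with the Req HTTP client (the `/infer` path and JSON body are made-up placeholders; any client that sets the header works):

```elixir
# compute is the struct returned by ExAtlas.Orchestrator.spawn/1
url = hd(compute.ports).url <> "/infer"

Req.post!(url,
  auth: {:bearer, compute.auth.token},
  json: %{prompt: "hello"}
)
```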
### Signed URLs for media streams
`<video src>` can't send an `Authorization` header. Use
`ExAtlas.Auth.SignedUrl`:
```elixir
# ExAtlas generates a secret once per pod and injects it via env var
# when the pod is spawned with auth: :signed_url.
signed =
  ExAtlas.Auth.SignedUrl.sign(
    hd(compute.ports).url <> "/video/session-42.m3u8",
    secret: compute.auth.token,
    expires_in: 3600
  )
```

In the LiveView template:

```heex
<video src={signed} />
```
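On the pod side, the server has to verify the signature before serving the stream. As a generic illustration of how such a check works (the `expires`/`sig` query parameters and the signed payload layout here are assumptions, not ExAtlas's actual wire format), a private helper inside the same Plug module might look like:

```elixir
# Illustrative HMAC signed-URL check -- NOT ExAtlas's actual format.
defp valid_signature?(conn, secret) do
  conn = fetch_query_params(conn)

  with %{"expires" => expires, "sig" => sig} <- conn.query_params,
       {exp, ""} <- Integer.parse(expires) do
    expected =
      :crypto.mac(:hmac, :sha256, secret, conn.request_path <> expires)
      |> Base.url_encode64(padding: false)

    # secure_compare is constant-time and returns false on length mismatch
    exp > System.os_time(:second) and Plug.Crypto.secure_compare(sig, expected)
  else
    _ -> false
  end
end
```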
## Choosing `idle_ttl_ms`
- Too short: users blink and the pod dies. Bad UX, repeated cold starts
  (and RunPod boot times on some GPUs can be 30–90 seconds).
- Too long: abandoned sessions burn $/hour until the reaper catches them.
A good default is **2–3× your expected user-idle window**. If your app
sends a `:ping` every 30 seconds and users normally stay active,
`idle_ttl_ms: 120_000` is reasonable. For exploratory/bursty tools
(generative art, Jupyter-like), go higher (10–15 min).
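If you'd rather not rely on a client-side hook for heartbeats, the LiveView can drive them itself: the timer dies with the LiveView process, so an abandoned tab stops touching the pod and the idle TTL takes over. A minimal sketch, assuming the 30-second cadence above:

```elixir
# In the LiveView. Start the loop once after a successful spawn in mount/3:
#   Process.send_after(self(), :heartbeat, @ping_interval_ms)
@ping_interval_ms 30_000

def handle_info(:heartbeat, socket) do
  # Keep the pod alive for as long as this LiveView process is alive.
  _ = ExAtlas.Orchestrator.touch(socket.assigns.compute_id)
  Process.send_after(self(), :heartbeat, @ping_interval_ms)
  {:noreply, socket}
end
```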
## What the orchestrator protects against
1. **Node crashes.** When the Phoenix node restarts, the Reaper finds
orphan pods (live on RunPod, not tracked locally, name prefix matches)
and terminates them within `:reap_interval_ms`.
2. **LiveView disconnect without clean shutdown.** The `ComputeServer`'s
idle timer fires regardless of what's talking to it.
3. **Provider API hiccups.** `terminate/2` errors are logged and broadcast
as `{:terminate_failed, error}` but don't cause the server to hang.
## Pitfalls
- **Don't** share a single pod across users unless you've designed for
isolation. The preshared-key model assumes one key per pod.
- **Don't** put the orchestrator in a cluster-shared PubSub — ExAtlas's
PubSub is per-node. If you need cluster-wide visibility, subscribe
from each node and reduce upstream.
- **Don't** spawn from a `Task.start/1` without supervision. If the task
crashes between the provider call and the ComputeServer start, the pod
is live on the cloud but untracked. The Reaper will eventually catch
it, but your budget won't thank you.
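One safer pattern for that last pitfall is spawning under a supervised task, so a crash between the provider call and the tracker start is retried instead of silently orphaning the pod. `MyApp.TaskSupervisor` is assumed to be started in your application tree via `{Task.Supervisor, name: MyApp.TaskSupervisor}`:

```elixir
# restart: :transient retries the task if it exits abnormally,
# e.g. if it crashed mid-way through spawn/1.
Task.Supervisor.start_child(
  MyApp.TaskSupervisor,
  fn ->
    {:ok, _pid, _compute} =
      ExAtlas.Orchestrator.spawn(
        gpu: :h100,
        image: "ghcr.io/me/my-inference-server:latest"
      )
  end,
  restart: :transient
)
```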