# Deploying
OpenResponses is a standard Phoenix application. Any deployment approach that works for Phoenix works here.
## Environment variables
At minimum, set the API keys for the providers you use:
```bash
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AIza...
SECRET_KEY_BASE=... # Phoenix requirement
PHX_HOST=your-domain.com
```
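`SECRET_KEY_BASE` must be a long random string. The conventional Phoenix way to generate one is `mix phx.gen.secret`, but any 64-character random value works; for example, with OpenSSL:

```shell
# Generate a 64-character secret for SECRET_KEY_BASE
# (`mix phx.gen.secret` is the usual Phoenix way; openssl works anywhere)
SECRET_KEY_BASE="$(openssl rand -base64 48)"
echo "${#SECRET_KEY_BASE}"   # 64
```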
## Release builds
Build a release with `mix release`:
```bash
MIX_ENV=prod mix deps.get --only prod
MIX_ENV=prod mix assets.deploy
MIX_ENV=prod mix release
```
The release is self-contained. Run it:
```bash
PHX_HOST=your-domain.com \
OPENAI_API_KEY=sk-... \
_build/prod/rel/open_responses/bin/open_responses start
```
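A release reads these variables at boot via `config/runtime.exs`, which is evaluated at startup rather than compile time. A minimal sketch of how the keys above might be wired up (the config key names here are assumptions; match them to the app's actual configuration):

```elixir
# config/runtime.exs — evaluated when the release boots
import Config

if config_env() == :prod do
  config :open_responses,
    # hypothetical key names — align with the app's real config
    openai_api_key: System.fetch_env!("OPENAI_API_KEY"),
    anthropic_api_key: System.get_env("ANTHROPIC_API_KEY")

  config :open_responses, OpenResponsesWeb.Endpoint,
    url: [host: System.get_env("PHX_HOST") || "example.com", port: 443],
    secret_key_base: System.fetch_env!("SECRET_KEY_BASE")
end
```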
## Docker
```dockerfile
FROM hexpm/elixir:1.18.3-erlang-27.3-alpine-3.21.0 AS build
WORKDIR /app
RUN mix local.hex --force && mix local.rebar --force
COPY mix.exs mix.lock ./
RUN MIX_ENV=prod mix deps.get --only prod
RUN MIX_ENV=prod mix deps.compile
COPY lib lib
COPY priv priv
COPY guides guides
COPY config config
# assumes the standard Phoenix assets/ directory; mix assets.deploy needs it
COPY assets assets
RUN MIX_ENV=prod mix assets.deploy
RUN MIX_ENV=prod mix release
FROM alpine:3.21.0 AS runtime
RUN apk add --no-cache libstdc++ openssl ncurses-libs
WORKDIR /app
COPY --from=build /app/_build/prod/rel/open_responses ./
ENV MIX_ENV=prod
ENV PHX_SERVER=true
EXPOSE 4000
ENTRYPOINT ["bin/open_responses"]
CMD ["start"]
```
## Fly.io
```bash
fly launch
fly secrets set OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-ant-...
fly deploy
```
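On Fly.io, runtime environment for the release can live in `fly.toml`. A sketch of the relevant fragment, assuming the app is named `open-responses` (adjust to your app name):

```toml
# fly.toml — runtime env for the release
[env]
  PHX_HOST = "open-responses.fly.dev"
  # Fly's private DNS name for all machines in this app; consumed by dns_cluster
  DNS_CLUSTER_QUERY = "open-responses.internal"
```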
## Clustering
OpenResponses uses `Phoenix.PubSub` for event broadcasting. The default `Phoenix.PubSub.PG2` adapter already distributes messages across connected BEAM nodes; the adapter is chosen where the PubSub child is started in the supervision tree, not in config files:
```elixir
# lib/open_responses/application.ex — PG2 is the default, shown here explicitly
{Phoenix.PubSub, name: OpenResponses.PubSub, adapter: Phoenix.PubSub.PG2}
```
The real requirement in a multi-node deployment is that the nodes discover and connect to each other.
Use `dns_cluster` (already included) for automatic node discovery on platforms like Fly.io:
```elixir
config :open_responses, :dns_cluster_query, System.get_env("DNS_CLUSTER_QUERY")
```
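This query is typically consumed by a `DNSCluster` child in the application's supervision tree; the sketch below mirrors what recent Phoenix generators emit (verify against the app's `application.ex`):

```elixir
# lib/open_responses/application.ex — children list (sketch)
children = [
  # polls DNS for the configured query and connects the nodes it finds
  {DNSCluster, query: Application.get_env(:open_responses, :dns_cluster_query) || :ignore},
  # ...other children...
]
```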
## Response cache in a cluster
The default Cachex cache is per-node in-memory. In a cluster, a `previous_response_id` might reference a response stored on a different node.
For production clustering, switch to a distributed cache or enable the AshPostgres persistence layer (Phase 3):
```elixir
# Option 1: Nebulex with a distributed adapter (replacing Cachex)
# Option 2: AshPostgres — responses stored in Postgres, accessible across nodes
```
Until then, use sticky sessions to route a user's requests to the same node.
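Sticky routing depends on your load balancer. As one example, nginx can pin clients to a backend by source IP with `ip_hash` (a standard nginx directive; the node addresses below are placeholders):

```nginx
upstream open_responses {
    ip_hash;                    # same client IP -> same backend node
    server 10.0.0.1:4000;
    server 10.0.0.2:4000;
}

server {
    listen 80;
    location / {
        proxy_pass http://open_responses;
        proxy_http_version 1.1;           # needed for streaming responses
        proxy_set_header Connection "";
        proxy_buffering off;              # don't buffer SSE streams
    }
}
```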
## Scaling considerations
| Concern | Guidance |
|---|---|
| Concurrent requests | The BEAM handles thousands of simultaneous loops comfortably. No special config needed. |
| Long-running agentic loops | Prefer streaming. A non-streaming request holds the connection open, silent, until the loop finishes, which can trip proxy idle timeouts. |
| Provider rate limits | Add a rate-limiting middleware (`MyApp.Middleware.RateLimit`). |
| Memory | Each active loop holds the response state in memory. Monitor `open_responses_loop_iterations_total` to track concurrency. |
| Ollama | Run Ollama on the same host or a fast private network. GPU latency dominates. |
## Health check
The Phoenix endpoint is healthy when it responds to HTTP requests. Add a health check route:
```elixir
# router.ex
get "/health", OpenResponsesWeb.HealthController, :check
```
```elixir
defmodule OpenResponsesWeb.HealthController do
  use OpenResponsesWeb, :controller

  def check(conn, _params) do
    json(conn, %{status: "ok"})
  end
end
```
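With that route in place, the Docker image above can poll it. Alpine's busybox ships `wget`, so no extra package is needed:

```dockerfile
# Append to the runtime stage of the Dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s \
  CMD wget -q -O /dev/null http://localhost:4000/health || exit 1
```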