# Conversation History
OpenResponses maintains conversation history automatically using `previous_response_id`. You never need to replay prior messages — just reference the last response ID and OpenResponses reconstructs the full context.
## How it works
When a response completes, OpenResponses stores it in `ResponseCache` (backed by Cachex). On the next request, if `previous_response_id` is present, the loop loads the prior response and prepends its `input` and `output` to the new request's input before sending to the provider.
```
Request 2: previous_response_id = "resp_01"
                    │
                    ▼
ResponseCache.get("resp_01")
                    │
                    ▼
prev.input + prev.output + new_input
                    │
                    ▼
sent to provider
```
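The reconstruction step can be sketched as follows. This is a minimal sketch, not the actual OpenResponses source: the module and function names are assumptions for illustration; only the cache name `:response_cache` and the prepend order come from the description above.

```elixir
# Hypothetical helper showing the reconstruction order described above.
defmodule ContextSketch do
  # When a previous response ID is present, load it and prepend its
  # input and output to the new request's input.
  def build_input(%{"previous_response_id" => prev_id} = params) when is_binary(prev_id) do
    # Cachex.get/2 returns {:ok, value}; value is nil on a cache miss.
    case Cachex.get(:response_cache, prev_id) do
      {:ok, %{input: prev_input, output: prev_output}} ->
        prev_input ++ prev_output ++ params["input"]

      _miss ->
        params["input"]
    end
  end

  # No previous response referenced: send the input as-is.
  def build_input(params), do: params["input"]
end
```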
## Basic usage
```bash
# Turn 1
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "input": [{"role": "user", "content": "My favourite language is Elixir."}]
  }'
# → {"id": "resp_abc", "status": "completed", ...}

# Turn 2 — no history needed in the request
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "previous_response_id": "resp_abc",
    "input": [{"role": "user", "content": "What is my favourite language?"}]
  }'
# → Model knows it's Elixir
```
## Cache configuration
By default, responses are cached in memory for 24 hours. Responses are stored via `Cachex`, which you can configure at startup to change the TTL or cap the number of cached entries:
```elixir
# application.ex
# `expiration` uses Cachex's record helper; add `import Cachex.Spec` at the top
{Cachex,
 name: :response_cache,
 limit: 10_000,
 expiration: expiration(default: :timer.hours(24))}
```
For cross-node or cross-restart persistence (Phase 3), add `AshPostgres` as a data layer and responses will be stored durably.
## What gets cached
For each completed response, the cache stores:
- `id` — the response ID
- `model` — the model used
- `status` — terminal state (`completed`, `failed`, or `incomplete`)
- `input` — the original input sent by the client
- `output` — all output items produced by the model
- `usage` — token counts
- `created_at` — timestamp
Responses in `failed` or `incomplete` states are cached but their output may be partial.
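Put together, a cached entry looks roughly like the map below. The concrete values are illustrative, and the exact shapes of the nested maps are assumptions; only the top-level keys come from the list above.

```elixir
%{
  id: "resp_abc",
  model: "claude-sonnet-4-6",
  status: "completed",
  input: [%{role: "user", content: "My favourite language is Elixir."}],
  output: [%{type: "message", role: "assistant", content: "Nice choice!"}],
  usage: %{input_tokens: 12, output_tokens: 5},
  created_at: ~U[2025-01-01 12:00:00Z]
}
```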
## Chaining multiple turns
Each turn only needs to reference the immediately preceding response — not the entire chain. OpenResponses handles the reconstruction:
```
resp_001 ← resp_002 ← resp_003 ← resp_004 (current)
```
When processing `resp_004`, OpenResponses loads `resp_003` from cache. `resp_003`'s own context was already reconstructed when it was created, so its `input` field contains the full accumulated history up to that point.
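The accumulation can be seen in a toy example (atoms stand in for real message maps; the field names mirror the cached entry):

```elixir
# Each turn's stored input = previous input ++ previous output ++ new user message
turn1 = %{input: [:u1], output: [:a1]}
turn2 = %{input: turn1.input ++ turn1.output ++ [:u2], output: [:a2]}
turn3 = %{input: turn2.input ++ turn2.output ++ [:u3], output: [:a3]}

turn3.input
# => [:u1, :a1, :u2, :a2, :u3]
```

So loading only the immediately preceding response is enough: its stored `input` already carries every earlier turn.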
## Branching conversations
Because `previous_response_id` is just a reference, you can branch at any point:
```
resp_001
├── resp_002a (branch A)
│ └── resp_003a
└── resp_002b (branch B)
└── resp_003b
```
Both branches reference `resp_001` but diverge from there. This is useful for showing users alternative continuations or implementing undo.
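To branch as in the diagram, send two requests that both reference the same `previous_response_id`. This assumes a running server as in the earlier example; the prompts and resulting IDs are illustrative.

```bash
# Branch A — continues from resp_001
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-6", "previous_response_id": "resp_001",
       "input": [{"role": "user", "content": "Summarise that formally."}]}'

# Branch B — also continues from resp_001, independently of branch A
curl -X POST http://localhost:4000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-6", "previous_response_id": "resp_001",
       "input": [{"role": "user", "content": "Now explain it to a beginner."}]}'
```

Each response gets its own ID, so either branch can be extended later without affecting the other.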