# Streaming
OpenResponses supports Server-Sent Events (SSE) streaming out of the box. When `"stream": true` is included in a request, the response is delivered as a sequence of events rather than a single JSON object.
## Enabling streaming
```json
{
  "model": "gpt-4o",
  "stream": true,
  "input": [{"role": "user", "content": "Tell me about the BEAM."}]
}
```
The response is delivered with `Content-Type: text/event-stream`. Each event is an `event:` line followed by a `data:` line, and events are separated by blank lines.
## Event format
Each event follows the SSE format:
```
event: <event-type>
data: <json-payload>
```
Every event payload includes a `sequence_number` — a monotonically increasing integer you can use to detect gaps or re-order out-of-order events.
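A client can split the stream on blank lines and parse each block into a typed event. A minimal sketch (the `parseSSEEvent` helper is illustrative, not part of any SDK):

```javascript
// Parse one SSE event block ("event: ...\ndata: ...") into {type, data}.
// Returns null for the terminal "data: [DONE]" marker or malformed blocks.
function parseSSEEvent(block) {
  const lines = block.split('\n');
  const type = lines.find(l => l.startsWith('event: '))?.slice(7) ?? null;
  const dataLine = lines.find(l => l.startsWith('data: '));
  if (!dataLine) return null;
  const payload = dataLine.slice(6);
  if (payload === '[DONE]') return null;
  return { type, data: JSON.parse(payload) };
}
```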
## Event catalogue
### Lifecycle events
| Event | When |
|---|---|
| `response.created` | Immediately after the request is accepted. Contains the initial response object. |
| `response.in_progress` | The provider has begun generating. |
| `response.completed` | All output items are complete and the response has reached a terminal state. Contains the final response object. |
| `response.failed` | An error occurred. Contains an `error` object. |
| `response.incomplete` | The token budget (`max_output_tokens`) was exhausted before the model finished. |
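A consumer typically switches on these lifecycle events to decide when to stop reading. A small sketch (the helper name and return values are illustrative):

```javascript
// Map a lifecycle event type to a terminal outcome.
// Returns 'done', 'error', 'truncated', or null (keep reading the stream).
function terminalState(eventType) {
  switch (eventType) {
    case 'response.completed':  return 'done';
    case 'response.failed':     return 'error';
    case 'response.incomplete': return 'truncated';
    default:                    return null; // lifecycle start or content event
  }
}
```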
### Output item events
| Event | When |
|---|---|
| `response.output_item.added` | A new output item (message, function call, reasoning) begins. |
| `response.output_item.done` | An output item is complete. |
### Text delta events
| Event | When |
|---|---|
| `response.content_part.added` | A new content part within a message begins. |
| `response.output_text.delta` | A chunk of text from the model. The `delta` field contains the new text. |
| `response.output_text.done` | A text content part is complete. The `text` field contains the full assembled text. |
| `response.content_part.done` | A content part is complete. |
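Text deltas can be concatenated per `item_id` as they arrive, and the `done` event's `text` field lets you cross-check the result (or recover if a delta was dropped). A sketch, assuming events are fed in as parsed JSON objects:

```javascript
// Accumulate output_text deltas keyed by item_id; on the done event,
// take the server's full text as authoritative.
function makeTextAccumulator() {
  const parts = new Map(); // item_id -> assembled text so far
  return {
    feed(event) {
      if (event.type === 'response.output_text.delta') {
        parts.set(event.item_id, (parts.get(event.item_id) ?? '') + event.delta);
      } else if (event.type === 'response.output_text.done') {
        parts.set(event.item_id, event.text);
      }
    },
    text(itemId) { return parts.get(itemId) ?? ''; }
  };
}
```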
### Tool call events
| Event | When |
|---|---|
| `response.function_call_arguments.delta` | A chunk of JSON arguments for a function call. |
| `response.function_call_arguments.done` | The function call arguments are complete. |
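Individual argument fragments are generally not valid JSON on their own, so they must be buffered until the `done` event before parsing. A sketch, assuming the fragment arrives in a `delta` field as with text deltas:

```javascript
// Buffer streamed function-call argument fragments, then parse the
// completed JSON string once the done event arrives.
function makeArgsCollector() {
  let buf = '';
  return {
    feed(event) {
      if (event.type === 'response.function_call_arguments.delta') {
        buf += event.delta;
        return null; // arguments still streaming; not yet parseable
      }
      if (event.type === 'response.function_call_arguments.done') {
        return JSON.parse(buf); // only valid JSON once complete
      }
      return null;
    }
  };
}
```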
## Sequence numbers
Every event includes `"sequence_number": N`. Numbers are assigned by the loop process and increment by one per event. You can use them to:
- Detect dropped events (gap in sequence)
- Re-order events if your client receives them out of order
- Resume a stream by requesting events after a known sequence number (future feature)
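Gap detection can be as simple as tracking the highest sequence number seen so far. An illustrative helper:

```javascript
// Track sequence numbers as events arrive; returns the list of
// numbers missing between the last seen event and this one.
function makeGapDetector() {
  let last = -1;
  return function check(sequenceNumber) {
    const missing = [];
    for (let n = last + 1; n < sequenceNumber; n++) missing.push(n);
    last = Math.max(last, sequenceNumber);
    return missing;
  };
}
```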
## A complete streaming session
```
event: response.created
data: {"id":"resp_01","object":"response","model":"gpt-4o","status":"queued","sequence_number":0}

event: response.in_progress
data: {"type":"response.in_progress","sequence_number":1}

event: response.output_item.added
data: {"type":"response.output_item.added","item":{"id":"msg_01","type":"message","role":"assistant","content":[],"status":"in_progress"},"sequence_number":2}

event: response.content_part.added
data: {"type":"response.content_part.added","item_id":"msg_01","part":{"type":"output_text","text":""},"sequence_number":3}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_01","delta":"The BEAM","sequence_number":4}

event: response.output_text.delta
data: {"type":"response.output_text.delta","item_id":"msg_01","delta":" is a virtual machine","sequence_number":5}

event: response.output_text.done
data: {"type":"response.output_text.done","item_id":"msg_01","text":"The BEAM is a virtual machine","sequence_number":6}

event: response.content_part.done
data: {"type":"response.content_part.done","item_id":"msg_01","sequence_number":7}

event: response.output_item.done
data: {"type":"response.output_item.done","item":{"id":"msg_01","status":"completed"},"sequence_number":8}

event: response.completed
data: {"id":"resp_01","status":"completed","output":[...],"sequence_number":9}

data: [DONE]
```
## Client examples
### JavaScript (browser)
```javascript
const response = await fetch('/v1/responses', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({
    model: 'gpt-4o',
    stream: true,
    input: [{role: 'user', content: 'Hello'}]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const {value, done} = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, {stream: true});
  // Events end with a blank line; keep any trailing partial event in the buffer.
  const events = buffer.split('\n\n');
  buffer = events.pop();
  for (const chunk of events) {
    const dataLine = chunk.split('\n').find(l => l.startsWith('data: '));
    if (!dataLine || dataLine === 'data: [DONE]') continue;
    const event = JSON.parse(dataLine.slice(6));
    if (event.type === 'response.output_text.delta') {
      console.log(event.delta); // process.stdout is Node-only; in a browser, log or append to the DOM
    }
  }
}
```
### Elixir (server-to-server)
```elixir
{:ok, response} =
  Req.post("http://localhost:4000/v1/responses",
    json: %{model: "gpt-4o", stream: true, input: [%{role: "user", content: "Hello"}]},
    into: fn {:data, chunk}, acc ->
      chunk
      |> String.split("\n\n", trim: true)
      |> Enum.each(fn event_str ->
        case String.split(event_str, "data: ", parts: 2) do
          [_, "[DONE]"] ->
            :ok

          [_, json] ->
            event = Jason.decode!(json)

            if event["type"] == "response.output_text.delta" do
              IO.write(event["delta"])
            end

          _ ->
            :ok
        end
      end)

      {:cont, acc}
    end
  )
```
## Non-streaming mode
Without `"stream": true`, OpenResponses waits for the loop to complete and returns the full response object in a single HTTP response. This is simpler for short interactions, but for long generations the client sees nothing until the final token is produced.
```json
{
  "model": "gpt-4o",
  "input": [{"role": "user", "content": "What is 2+2?"}]
}
```
The default timeout is 30 seconds. Long-running loops should use streaming.
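A client-side deadline for non-streaming calls can be enforced with an `AbortController`. A sketch assuming a fetch-capable runtime; the endpoint and payload mirror the examples above, and `askOnce` is an illustrative name:

```javascript
// Issue a non-streaming request that is aborted client-side after
// timeoutMs, matching the server's 30-second default.
async function askOnce(baseUrl, input, timeoutMs = 30_000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(`${baseUrl}/v1/responses`, {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify({model: 'gpt-4o', input}),
      signal: controller.signal
    });
    return await res.json();
  } finally {
    clearTimeout(timer); // avoid a stray abort after the response arrives
  }
}
```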