# Elixir Codex SDK - Project Goals and Design

## Overview

The Elixir Codex SDK is an idiomatic wrapper around OpenAI's `codex-rs` CLI executable, designed for production use. It brings OpenAI's Codex agent, an AI assistant capable of reasoning, code generation, file manipulation, and command execution, into the Elixir/OTP ecosystem.

## Project Goals

### Primary Objectives

1. **Complete Feature Parity**: Implement all functionality available in the official TypeScript SDK
2. **Idiomatic Elixir**: Leverage OTP principles, GenServers, and BEAM concurrency patterns
3. **Production Ready**: Robust error handling, supervision trees, telemetry integration
4. **Type Safety**: Comprehensive structs using TypedStruct for all events, items, and options
5. **Battle Tested**: Deterministic, async test suite using Supertester (zero `Process.sleep`)
6. **Developer Experience**: Clear APIs, comprehensive documentation, helpful examples

### Secondary Objectives

1. **Performance**: Efficient streaming with backpressure, minimal memory overhead
2. **Observability**: Telemetry events for monitoring and debugging
3. **Extensibility**: Clean abstractions for future enhancements
4. **Maintainability**: Well-documented code, consistent patterns, comprehensive tests

## Core Concepts

### The Codex Agent

Codex is an AI agent that can:

- Analyze and generate code across multiple languages
- Execute shell commands in a controlled sandbox
- Read, write, and modify files with precise diffs
- Search the web for up-to-date information
- Make calls to Model Context Protocol (MCP) tools
- Reason about complex problems and maintain task lists
- Produce structured JSON output conforming to schemas

### Threads and Turns

**Thread**: A persistent conversation session with the agent. Threads maintain context across multiple interactions and are stored in `~/.codex/sessions`.

**Turn**: A single request-response cycle within a thread. Each turn:
- Starts with a user prompt (input)
- Produces a stream of events as the agent works
- Completes with a final response and usage statistics
- May include multiple items (messages, commands, file changes, etc.)

### Items

Items are the atomic units of work in a thread. Each item represents a specific action or artifact:

- **AgentMessage**: Text or JSON response from the agent
- **Reasoning**: The agent's reasoning process summary
- **CommandExecution**: Shell command with status and output
- **FileChange**: File modifications (add, update, delete)
- **McpToolCall**: External tool invocation via MCP
- **WebSearch**: Web search query and results
- **TodoList**: Agent's running task list
- **Error**: Non-fatal error items
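
As an illustration of how one of these items might look as an Elixir struct, here is a minimal sketch. It uses plain `defstruct` so that it runs without dependencies; the module name `ItemsSketch.CommandExecution` and the fields `aggregated_output` and `exit_code` are assumptions made for the example, not the SDK's confirmed shape.

```elixir
defmodule ItemsSketch.CommandExecution do
  @moduledoc "Hypothetical shape for a CommandExecution item (illustration only)."

  # :id and :command must be supplied; the other fields have defaults.
  @enforce_keys [:id, :command]
  defstruct [:id, :command, :exit_code, aggregated_output: "", status: :in_progress]

  @type status :: :in_progress | :completed | :failed
  @type t :: %__MODULE__{
          id: String.t(),
          command: String.t(),
          exit_code: integer() | nil,
          aggregated_output: String.t(),
          status: status()
        }

  @doc "Returns true once the item has reached a terminal state."
  @spec terminal?(t()) :: boolean()
  def terminal?(%__MODULE__{status: status}), do: status in [:completed, :failed]
end

# An item starts in progress and is later updated to a terminal status.
item = %ItemsSketch.CommandExecution{id: "item_1", command: "mix test"}
done = %{item | status: :completed, exit_code: 0}

ItemsSketch.CommandExecution.terminal?(item)
ItemsSketch.CommandExecution.terminal?(done)
```

With TypedStruct, the `defstruct` and `@type t` pair would be generated from a single field list.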

### Events

Events are emitted during turn execution to provide real-time updates:

#### Thread-Level Events
- `ThreadStarted`: New thread initialized with ID
- `TurnStarted`: Agent begins processing prompt
- `TurnCompleted`: Turn finished with usage stats
- `TurnFailed`: Turn encountered fatal error
- `ThreadError`: Error emitted at the thread level rather than tied to an item

#### Item-Level Events
- `ItemStarted`: New item added (typically in progress)
- `ItemUpdated`: Item state changed
- `ItemCompleted`: Item reached terminal state
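
A rough sketch of how decoded JSON maps could be turned into event structs follows. The wire-level type strings (`"thread.started"`, `"turn.completed"`, and so on), the field names, and the `EventsSketch` module are assumptions for illustration; the input is an already-decoded map so that no JSON library is needed.

```elixir
defmodule EventsSketch do
  @moduledoc "Hypothetical event decoding (illustration only)."

  defmodule ThreadStarted, do: defstruct([:thread_id])
  defmodule TurnCompleted, do: defstruct([:usage])
  defmodule TurnFailed, do: defstruct([:error])
  defmodule ItemCompleted, do: defstruct([:item])

  # Each clause matches on an assumed "type" discriminator in the decoded JSON.
  def from_map(%{"type" => "thread.started", "thread_id" => id}),
    do: {:ok, %ThreadStarted{thread_id: id}}

  def from_map(%{"type" => "turn.completed", "usage" => usage}),
    do: {:ok, %TurnCompleted{usage: usage}}

  def from_map(%{"type" => "turn.failed", "error" => error}),
    do: {:ok, %TurnFailed{error: error}}

  def from_map(%{"type" => "item.completed", "item" => item}),
    do: {:ok, %ItemCompleted{item: item}}

  # Unknown events are reported instead of crashing the caller.
  def from_map(other), do: {:error, {:unknown_event, other}}
end

EventsSketch.from_map(%{"type" => "thread.started", "thread_id" => "thread_1"})
```

Returning `{:error, {:unknown_event, map}}` for unrecognized input is one possible choice; it lets the caller decide whether to skip or fail when the CLI adds event types the SDK does not know yet.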

## Module Structure

### Core Modules

#### `Codex`
The main entry point for the SDK.

**Responsibilities**:
- Create new threads
- Resume existing threads
- Manage global options (API key, base URL, codex path)

**Key Functions**:
```elixir
start_thread(codex_opts \\ %{}, thread_opts \\ %{}) :: {:ok, thread} | {:error, term}
resume_thread(thread_id, codex_opts \\ %{}, thread_opts \\ %{}) :: {:ok, thread} | {:error, term}
```

#### `Codex.Thread`
Manages individual conversation threads and turn execution.

**Responsibilities**:
- Execute turns (blocking or streaming)
- Maintain thread state and ID
- Apply thread-level options (model, sandbox, working directory)

**Key Functions**:
```elixir
run(thread, input, turn_opts \\ %{}) :: {:ok, turn_result} | {:error, term}
run_streamed(thread, input, turn_opts \\ %{}) :: {:ok, stream} | {:error, term}
```

#### `Codex.Exec`
GenServer that manages the `codex-rs` OS process lifecycle.

**Responsibilities**:
- Spawn and manage codex-rs process via Port
- Handle JSONL stdin/stdout communication
- Parse events and forward to caller
- Clean up resources on exit or crash

**Key Behaviors**:
- One GenServer per turn execution
- Supervised process with proper cleanup
- Telemetry events for observability

### Type Modules

#### `Codex.Events`
Defines all event types using TypedStruct.

**Event Types**:
- `ThreadStarted`, `TurnStarted`, `TurnCompleted`, `TurnFailed`
- `ItemStarted`, `ItemUpdated`, `ItemCompleted`
- `ThreadError`

#### `Codex.Items`
Defines all item types and their status enums.

**Item Types**:
- `AgentMessage`, `Reasoning`, `CommandExecution`, `FileChange`
- `McpToolCall`, `WebSearch`, `TodoList`, `Error`

**Status Types**:
- `CommandExecutionStatus`: `:in_progress`, `:completed`, `:failed`
- `PatchApplyStatus`: `:completed`, `:failed`
- `McpToolCallStatus`: `:in_progress`, `:completed`, `:failed`

#### `Codex.Options`
Configuration structs for all levels.

**Structs**:
- `Codex.Options`: Global options (codex path, API key, base URL)
- `Codex.Thread.Options`: Thread options (model, sandbox, working directory)
- `Codex.Turn.Options`: Turn options (output schema)
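
To make the thread-level options concrete, here is a small sketch of an options struct with validation. The module name `ThreadOptionsSketch`, the atom spellings of the sandbox modes, and the `:read_only` default are assumptions for the example.

```elixir
defmodule ThreadOptionsSketch do
  @moduledoc "Hypothetical thread options struct (illustration only)."

  # Atom forms of the sandbox modes: read-only, workspace-write, danger-full-access.
  @sandbox_modes [:read_only, :workspace_write, :danger_full_access]

  defstruct model: nil, sandbox: :read_only, working_directory: nil

  @doc "Builds options from a map or keyword list, rejecting unknown sandbox modes."
  def new(attrs \\ %{}) do
    # struct/2 ignores keys that are not fields of the struct.
    opts = struct(__MODULE__, attrs)

    if opts.sandbox in @sandbox_modes do
      {:ok, opts}
    else
      {:error, {:invalid_sandbox, opts.sandbox}}
    end
  end
end

ThreadOptionsSketch.new(%{sandbox: :workspace_write, working_directory: "/tmp/project"})
ThreadOptionsSketch.new(sandbox: :unrestricted)
```

Validating at construction time means an invalid option is rejected before any codex-rs process is spawned.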

### Utility Modules

#### `Codex.OutputSchemaFile`
Helper for managing JSON schema temporary files.

**Responsibilities**:
- Create temporary file with JSON schema
- Provide cleanup function
- Handle errors gracefully
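
The responsibilities above could be sketched roughly as follows. The module name `OutputSchemaFileSketch`, the file naming scheme, and the `{:ok, path, cleanup}` return shape are assumptions; the schema is passed as an already-encoded JSON string so that the example needs no JSON library.

```elixir
defmodule OutputSchemaFileSketch do
  @moduledoc "Hypothetical temporary schema file helper (illustration only)."

  @spec create(String.t() | nil) :: {:ok, String.t() | nil, (-> :ok)} | {:error, term()}
  # No schema requested: nothing to write, and cleanup is a no-op.
  def create(nil), do: {:ok, nil, fn -> :ok end}

  def create(schema_json) when is_binary(schema_json) do
    # OS pid plus a unique integer keeps names distinct across concurrent turns.
    name = "codex-output-schema-#{System.pid()}-#{System.unique_integer([:positive])}.json"
    path = Path.join(System.tmp_dir!(), name)

    case File.write(path, schema_json) do
      :ok -> {:ok, path, fn -> cleanup(path) end}
      {:error, reason} -> {:error, {:schema_write_failed, reason}}
    end
  end

  # Removal errors are ignored so that cleanup can safely be called more than once.
  defp cleanup(path) do
    _ = File.rm(path)
    :ok
  end
end

{:ok, path, cleanup} = OutputSchemaFileSketch.create(~s({"type":"object"}))
# ... pass `path` to the turn, then remove the file:
cleanup.()
```

Returning the cleanup function alongside the path lets the caller run it in an `after` block or a GenServer `terminate/2` callback, whichever owns the turn's lifetime.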

## Architecture Patterns

### Process Model

```
┌─────────────┐
│   Client    │
└──────┬──────┘
       │ (synchronous API calls)
       ▼
┌─────────────────┐
│ Codex.Thread    │  (stateful struct, holds thread_id and options)
└────────┬────────┘
         │ (spawns)
         ▼
┌──────────────────┐
│  Codex.Exec      │  (GenServer - one per turn)
│  (GenServer)     │  - Manages codex-rs lifecycle
└────────┬─────────┘  - Parses JSONL events
         │ (spawns)   - Handles Port communication
         ▼
┌──────────────────┐
│   Port (stdin/   │  (IPC with codex-rs)
│    stdout)       │  - JSONL over stdin
└────────┬─────────┘  - JSONL events from stdout
         │
         ▼
┌──────────────────┐
│   codex-rs       │  (OpenAI's Rust CLI)
│   (OS Process)   │  - Manages OpenAI API calls
└──────────────────┘  - Executes commands/file ops
                      - Streams events
```

### Data Flow

1. **Client calls `Codex.Thread.run/3`**
   - Thread module starts `Codex.Exec` GenServer
   - Passes thread_id, options, and input

2. **Exec spawns codex-rs process**
   - Constructs command line arguments
   - Sets environment variables
   - Opens Port with `:binary`, `:use_stdio`, `:exit_status`

3. **Exec sends input via stdin**
   - Writes prompt to Port
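   - Closes stdin to signal end of input (note: a plain Erlang Port cannot close stdin on its own, since `Port.close/1` closes the whole port, so this step needs a wrapper process or another end-of-input signal)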

4. **Exec receives JSONL events from stdout**
   - Parses each line as JSON
   - Converts to structured event structs
   - Forwards events to caller

5. **Blocking mode (`run/3`)**
   - Exec accumulates events
   - Extracts final response and items
   - Returns complete `TurnResult` when turn completes

6. **Streaming mode (`run_streamed/3`)**
   - Exec yields events as they arrive
   - Client processes events in real-time
   - Stream completes when turn finishes
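
Step 4 depends on splitting Port output into complete lines, because a Port delivers data in arbitrary chunks that may end mid-line. A minimal sketch of that buffering, under the assumption that events are newline-delimited, is shown below; `JsonlBufferSketch` is a hypothetical helper name.

```elixir
defmodule JsonlBufferSketch do
  @moduledoc "Hypothetical line buffer for chunked Port output (illustration only)."

  @doc """
  Appends `chunk` to `buffer` and returns `{complete_lines, remaining_buffer}`.
  The text after the last newline is kept for the next call.
  """
  @spec push(binary(), binary()) :: {[binary()], binary()}
  def push(buffer, chunk) do
    parts = String.split(buffer <> chunk, "\n")
    # The last element is the (possibly empty) incomplete tail.
    {rest, lines} = List.pop_at(parts, -1)
    {Enum.reject(lines, &(&1 == "")), rest}
  end
end

# The second event arrives split across two chunks.
{lines, rest} = JsonlBufferSketch.push("", ~s({"type":"a"}\n{"ty))
# lines == [~s({"type":"a"})], rest == ~s({"ty)

{lines2, rest2} = JsonlBufferSketch.push(rest, ~s(pe":"b"}\n))
# lines2 == [~s({"type":"b"})], rest2 == ""
```

Inside the Exec GenServer, a `handle_info({port, {:data, chunk}}, state)` clause would call `push/2` and decode each returned line. Opening the Port with the `{:line, max_length}` option is an alternative that delegates line splitting to the runtime.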

### Error Handling

#### Recoverable Errors
- Non-fatal errors become `Error` items in the thread
- Agent continues processing
- Turn completes normally

#### Fatal Errors
- `TurnFailed` event with error details
- Process exits gracefully
- Resources cleaned up
- Error propagated to client

#### Process Crashes
- Supervisors restart failed GenServers according to their restart strategy
- The Port's `:exit_status` message (or `Port.monitor/1`) detects codex-rs termination
- Cleanup functions remove temporary files
- Telemetry events are emitted for each failure
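
One way to detect that the external process has exited, and to turn a non-zero exit code into an error tuple, is the `:exit_status` Port option. The sketch below is an illustration under assumptions: `ExitStatusSketch` is a hypothetical module, `sh` stands in for the codex-rs binary, and a Unix-like system is assumed.

```elixir
defmodule ExitStatusSketch do
  @moduledoc "Hypothetical exit-status handling for a Port (illustration only)."

  def run(executable, args, timeout \\ 5_000) do
    port =
      Port.open({:spawn_executable, executable}, [:binary, :use_stdio, :exit_status, args: args])

    collect(port, [], timeout)
  end

  # Accumulates stdout chunks until the exit status message arrives.
  defp collect(port, acc, timeout) do
    receive do
      {^port, {:data, chunk}} -> collect(port, [chunk | acc], timeout)
      {^port, {:exit_status, 0}} -> {:ok, output(acc)}
      {^port, {:exit_status, status}} -> {:error, {:exit_status, status, output(acc)}}
    after
      timeout ->
        # Closing the port releases it; it does not by itself kill the OS process.
        Port.close(port)
        {:error, :timeout}
    end
  end

  defp output(acc), do: acc |> Enum.reverse() |> IO.iodata_to_binary()
end

sh = System.find_executable("sh")

# A process that prints something and then fails with exit code 3.
ExitStatusSketch.run(sh, ["-c", "echo partial; exit 3"])
```

Keeping the captured output in the error tuple gives the client something to report when a turn fails without a `TurnFailed` event.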

### Streaming Strategy

**Streaming Pros**:
- Real-time updates for responsive UIs
- Process events as they arrive
- Lower memory footprint for long turns

**Streaming Cons**:
- More complex client code
- Requires event handling logic

**Blocking Pros**:
- Simple API for scripting
- No event handling needed
- Complete result in one call

**Blocking Cons**:
- Higher memory usage
- No progress visibility
- Longer wait before any result is available

**Implementation**:
- `run_streamed/3` returns `{:ok, stream}`, where `stream` is a lazy `Enumerable` of events
- `run/3` internally uses streaming but buffers results
- Both share same `Codex.Exec` implementation
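
The relationship between the two modes can be sketched with `Stream.resource/3`. In this illustration a spawned process that sends canned events stands in for `Codex.Exec`; the module name `StreamingSketch` and the message shapes are assumptions, and the sketch applies no backpressure to the producer.

```elixir
defmodule StreamingSketch do
  @moduledoc "Hypothetical streaming and blocking turn execution (illustration only)."

  # Stand-in for Codex.Exec: sends each event to the caller, then :done.
  defp start_producer(events) do
    caller = self()
    ref = make_ref()

    spawn_link(fn ->
      Enum.each(events, &send(caller, {ref, {:event, &1}}))
      send(caller, {ref, :done})
    end)

    ref
  end

  @doc "Streaming mode: returns a lazy stream; the producer starts on enumeration."
  def run_streamed(events) do
    stream =
      Stream.resource(
        fn -> start_producer(events) end,
        fn ref ->
          receive do
            {^ref, {:event, event}} -> {[event], ref}
            {^ref, :done} -> {:halt, ref}
          end
        end,
        fn _ref -> :ok end
      )

    {:ok, stream}
  end

  @doc "Blocking mode: consumes the same stream and buffers every event."
  def run(events) do
    with {:ok, stream} <- run_streamed(events) do
      {:ok, Enum.to_list(stream)}
    end
  end
end

{:ok, stream} = StreamingSketch.run_streamed([:turn_started, :item_completed, :turn_completed])
Enum.each(stream, &IO.inspect/1)

StreamingSketch.run([:turn_started, :turn_completed])
```

Because the producer is started inside the stream's start function, events are delivered to whichever process enumerates the stream, so the stream can be built in one place and consumed in another.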

## Feature Set

### Completed in TypeScript SDK (Target Parity)

1. **Core Operations**
   - Start new threads
   - Resume existing threads
   - Execute turns (blocking and streaming)
   - Handle all event types
   - Parse all item types

2. **Configuration**
   - Custom codex binary path
   - API key and base URL override
   - Model selection
   - Sandbox modes (read-only, workspace-write, danger-full-access)
   - Working directory control
   - Git repo check bypass

3. **Structured Output**
   - JSON schema support
   - Temporary file management
   - Schema validation

4. **Error Handling**
   - Process spawn errors
   - JSON parse errors
   - Exit code handling
   - stderr capture

### Additional Features for Elixir

1. **OTP Integration**
   - GenServer-based process management
   - Supervision tree support
   - Proper resource cleanup

2. **Telemetry**
   - Turn start/complete events
   - Error events
   - Performance metrics

3. **Type Safety**
   - TypedStruct for all data types
   - Static type checking via typespecs and Dialyzer
   - Documentation from types

4. **Testing**
   - Supertester integration
   - Mock GenServer implementation
   - Deterministic async tests
   - Chaos engineering tests

## Development Approach

### Test-Driven Development

1. **Write tests first**: Define expected behavior through tests
2. **Implement minimally**: Write just enough code to pass tests
3. **Refactor confidently**: Tests provide safety net
4. **Document through tests**: Tests serve as executable documentation

### Incremental Implementation

**Week 1**: Core types and module stubs
- Define all event/item structs
- Create module outlines
- Set up test infrastructure

**Week 2**: Exec GenServer implementation
- Port-based process management
- JSONL parsing
- Event forwarding

**Week 3**: Thread management
- Blocking turn execution
- Streaming turn execution
- Option handling

**Week 4**: Integration and polish
- End-to-end tests
- Documentation
- Examples
- CI/CD

### Quality Standards

1. **Code Coverage**: Target 95%+ line coverage
2. **Documentation**: All public functions have `@doc`
3. **Typespecs**: All public functions have `@spec`
4. **Dialyzer**: Zero warnings
5. **Credo**: All issues resolved
6. **Tests**: Zero flaky tests, all run with `async: true`

## Success Criteria

### Must Have (MVP)

- [ ] All TypeScript SDK features implemented
- [ ] All tests passing (unit, integration, property)
- [ ] Documentation complete (API docs, guides, examples)
- [ ] CI/CD pipeline green
- [ ] Published to Hex.pm

### Should Have (v1.0)

- [ ] Telemetry integration documented
- [ ] Supervision tree examples
- [ ] Performance benchmarks
- [ ] Chaos engineering tests
- [ ] Real-world examples

### Could Have (Future)

- [ ] Custom event handlers
- [ ] Persistent event logging
- [ ] WebSocket-based streaming
- [ ] Native NIF for performance
- [ ] Phoenix LiveView integration examples

## References

- [OpenAI Codex GitHub](https://github.com/openai/codex)
- [TypeScript SDK Source](https://github.com/openai/codex/tree/main/sdk/typescript)
- [Elixir Port Documentation](https://hexdocs.pm/elixir/Port.html)
- [GenServer Behavior](https://hexdocs.pm/elixir/GenServer.html)
- [Supertester Library](https://hex.pm/packages/supertester)
- [TypedStruct Library](https://hex.pm/packages/typed_struct)