# Elixir Codex SDK - Project Goals and Design
## Overview
The Elixir Codex SDK is an idiomatic, production-ready wrapper around OpenAI's `codex-rs` CLI executable. It brings OpenAI's Codex agent, an AI assistant that can reason about problems, generate code, manipulate files, and execute commands, into the Elixir/OTP ecosystem.
## Project Goals
### Primary Objectives
1. **Complete Feature Parity**: Implement all functionality available in the official TypeScript SDK
2. **Idiomatic Elixir**: Leverage OTP principles, GenServers, and BEAM concurrency patterns
3. **Production Ready**: Robust error handling, supervision trees, telemetry integration
4. **Type Safety**: Comprehensive structs using TypedStruct for all events, items, and options
5. **Battle Tested**: Deterministic, async test suite using Supertester (zero `Process.sleep`)
6. **Developer Experience**: Clear APIs, comprehensive documentation, helpful examples
### Secondary Objectives
1. **Performance**: Efficient streaming with backpressure, minimal memory overhead
2. **Observability**: Telemetry events for monitoring and debugging
3. **Extensibility**: Clean abstractions for future enhancements
4. **Maintainability**: Well-documented code, consistent patterns, comprehensive tests
## Core Concepts
### The Codex Agent
Codex is an AI agent that can:
- Analyze and generate code across multiple languages
- Execute shell commands in a controlled sandbox
- Read, write, and modify files with precise diffs
- Search the web for up-to-date information
- Make calls to Model Context Protocol (MCP) tools
- Reason about complex problems and maintain task lists
- Produce structured JSON output conforming to schemas
### Threads and Turns
**Thread**: A persistent conversation session with the agent. Threads maintain context across multiple interactions and are stored in `~/.codex/sessions`.
**Turn**: A single request-response cycle within a thread. Each turn:
- Starts with a user prompt (input)
- Produces a stream of events as the agent works
- Completes with a final response and usage statistics
- May include multiple items (messages, commands, file changes, etc.)
### Items
Items are the atomic units of work in a thread. Each item represents a specific action or artifact:
- **AgentMessage**: Text or JSON response from the agent
- **Reasoning**: The agent's reasoning process summary
- **CommandExecution**: Shell command with status and output
- **FileChange**: File modifications (add, update, delete)
- **McpToolCall**: External tool invocation via MCP
- **WebSearch**: Web search query and results
- **TodoList**: Agent's running task list
- **Error**: Non-fatal error items
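Decoding items could dispatch on each item's `"type"` field. The sketch below is a minimal, hypothetical version. It assumes the JSONL has already been decoded into maps with string keys, and that item types and fields use the snake_case names seen in the TypeScript SDK (`"agent_message"`, `"command_execution"`, `"aggregated_output"`, ...). Only three item types are shown, the `CodexSketch.Items` module is illustrative, and plain `defstruct` stands in for TypedStruct so the snippet runs without dependencies.

```elixir
defmodule CodexSketch.Items do
  # Hypothetical item structs; plain defstruct stands in for TypedStruct.
  defmodule AgentMessage, do: defstruct([:id, :text])
  defmodule CommandExecution, do: defstruct([:id, :command, :aggregated_output, :exit_code, :status])
  defmodule Error, do: defstruct([:id, :message])

  @doc "Converts one decoded item map (string keys) into an item struct."
  def from_map(%{"type" => "agent_message"} = m),
    do: {:ok, %AgentMessage{id: m["id"], text: m["text"]}}

  def from_map(%{"type" => "command_execution"} = m) do
    {:ok,
     %CommandExecution{
       id: m["id"],
       command: m["command"],
       aggregated_output: m["aggregated_output"],
       exit_code: m["exit_code"],
       # Status is left as the raw wire string in this sketch.
       status: m["status"]
     }}
  end

  def from_map(%{"type" => "error"} = m),
    do: {:ok, %Error{id: m["id"], message: m["message"]}}

  # Unknown item types are reported instead of crashing the decoder.
  def from_map(%{"type" => type}), do: {:error, {:unknown_item_type, type}}
  def from_map(other), do: {:error, {:invalid_item, other}}
end
```

Returning `{:error, {:unknown_item_type, type}}` rather than raising lets an older SDK tolerate item types added by a newer `codex-rs`.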
### Events
Events are emitted during turn execution to provide real-time updates:
#### Thread-Level Events
- `ThreadStarted`: New thread initialized with ID
- `TurnStarted`: Agent begins processing prompt
- `TurnCompleted`: Turn finished with usage stats
- `TurnFailed`: Turn encountered a fatal error
- `ThreadError`: Unrecoverable error reported by the event stream
#### Item-Level Events
- `ItemStarted`: New item added (typically in progress)
- `ItemUpdated`: Item state changed
- `ItemCompleted`: Item reached terminal state
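One way to map decoded JSONL events onto these types is a single multi-clause function. This sketch assumes the wire names emitted by `codex exec` in JSON mode as mirrored by the TypeScript SDK (`"thread.started"`, `"turn.completed"`, `"item.updated"`, `"error"`, ...) and their payload keys (`"thread_id"`, `"usage"`, `"item"`, `"message"`). Tagged tuples stand in for the event structs, and `CodexSketch.Events` is a hypothetical name.

```elixir
defmodule CodexSketch.Events do
  # Maps one decoded JSONL event (a map with string keys) to a tagged tuple.
  # The tuples stand in for ThreadStarted, TurnCompleted, ItemUpdated, etc.
  def decode(%{"type" => "thread.started", "thread_id" => id}), do: {:ok, {:thread_started, id}}
  def decode(%{"type" => "turn.started"}), do: {:ok, :turn_started}
  def decode(%{"type" => "turn.completed", "usage" => usage}), do: {:ok, {:turn_completed, usage}}
  def decode(%{"type" => "turn.failed", "error" => error}), do: {:ok, {:turn_failed, error}}
  def decode(%{"type" => "item.started", "item" => item}), do: {:ok, {:item_started, item}}
  def decode(%{"type" => "item.updated", "item" => item}), do: {:ok, {:item_updated, item}}
  def decode(%{"type" => "item.completed", "item" => item}), do: {:ok, {:item_completed, item}}
  def decode(%{"type" => "error", "message" => msg}), do: {:ok, {:thread_error, msg}}

  # Anything else (unknown type or missing payload key) is surfaced as an error.
  def decode(other), do: {:error, {:unrecognized_event, other}}
end
```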
## Module Structure
### Core Modules
#### `Codex`
The main entry point for the SDK.
**Responsibilities**:
- Create new threads
- Resume existing threads
- Manage global options (API key, base URL, codex path)
**Key Functions**:
```elixir
start_thread(codex_opts \\ %{}, thread_opts \\ %{}) :: {:ok, thread} | {:error, term}
resume_thread(thread_id, codex_opts \\ %{}, thread_opts \\ %{}) :: {:ok, thread} | {:error, term}
```
#### `Codex.Thread`
Manages individual conversation threads and turn execution.
**Responsibilities**:
- Execute turns (blocking or streaming)
- Maintain thread state and ID
- Apply thread-level options (model, sandbox, working directory)
**Key Functions**:
```elixir
run(thread, input, turn_opts \\ %{}) :: {:ok, turn_result} | {:error, term}
run_streamed(thread, input, turn_opts \\ %{}) :: {:ok, stream} | {:error, term}
```
#### `Codex.Exec`
GenServer that manages the `codex-rs` OS process lifecycle.
**Responsibilities**:
- Spawn and manage codex-rs process via Port
- Write the prompt to stdin and read JSONL events from stdout
- Parse events and forward to caller
- Clean up resources on exit or crash
**Key Behaviors**:
- One GenServer per turn execution
- Supervised process with proper cleanup
- Telemetry events for observability
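A stripped-down, hypothetical `Codex.Exec`-style GenServer illustrates the Port mechanics: it spawns an executable, receives stdout in line mode, reassembles over-long lines, and reports the exit status to the caller. Writing the prompt, setting environment variables, JSON decoding, and supervision are deliberately left out. The module name, the options (`:executable`, `:args`, `:caller`), and the message shapes are assumptions for illustration.

```elixir
defmodule CodexSketch.Exec do
  use GenServer

  # Hypothetical, simplified per-turn process. It spawns an executable through a
  # Port and forwards each complete stdout line to the caller as
  # {:codex_line, exec_pid, line}, followed by {:codex_exit, exec_pid, status}.

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    port =
      Port.open({:spawn_executable, Keyword.fetch!(opts, :executable)}, [
        :binary,
        :exit_status,
        :use_stdio,
        # Deliver stdout line by line; longer lines arrive as :noeol chunks.
        {:line, 65_536},
        args: Keyword.get(opts, :args, [])
      ])

    {:ok, %{port: port, caller: Keyword.fetch!(opts, :caller), partial: ""}}
  end

  @impl true
  def handle_info({port, {:data, {:eol, chunk}}}, %{port: port} = state) do
    send(state.caller, {:codex_line, self(), state.partial <> chunk})
    {:noreply, %{state | partial: ""}}
  end

  # Buffer pieces of a line that exceeded the :line limit.
  def handle_info({port, {:data, {:noeol, chunk}}}, %{port: port} = state) do
    {:noreply, %{state | partial: state.partial <> chunk}}
  end

  def handle_info({port, {:exit_status, status}}, %{port: port} = state) do
    # Flush a final line that had no trailing newline before reporting the exit.
    if state.partial != "", do: send(state.caller, {:codex_line, self(), state.partial})
    send(state.caller, {:codex_exit, self(), status})
    {:stop, :normal, state}
  end

  def handle_info(_other, state), do: {:noreply, state}
end
```

Because `:exit_status` is requested, the port delivers `{port, {:exit_status, status}}` once the child exits, which is also how `System.cmd/3` detects completion.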
### Type Modules
#### `Codex.Events`
Defines all event types using TypedStruct.
**Event Types**:
- `ThreadStarted`, `TurnStarted`, `TurnCompleted`, `TurnFailed`
- `ItemStarted`, `ItemUpdated`, `ItemCompleted`
- `ThreadError`
#### `Codex.Items`
Defines all item types and their status enums.
**Item Types**:
- `AgentMessage`, `Reasoning`, `CommandExecution`, `FileChange`
- `McpToolCall`, `WebSearch`, `TodoList`, `Error`
**Status Types**:
- `CommandExecutionStatus`: `:in_progress`, `:completed`, `:failed`
- `PatchApplyStatus`: `:completed`, `:failed`
- `McpToolCallStatus`: `:in_progress`, `:completed`, `:failed`
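Status fields arrive as strings, and converting them with `String.to_atom/1` would let untrusted input create atoms. A small lookup table avoids that. The sketch assumes the wire values are the snake_case strings matching the atoms above; `CodexSketch.Status` and its `parse/2` function are hypothetical.

```elixir
defmodule CodexSketch.Status do
  # Allowed wire values per enum, mapped to atoms that already exist at
  # compile time, so untrusted input never creates new atoms.
  @command_execution %{"in_progress" => :in_progress, "completed" => :completed, "failed" => :failed}
  @patch_apply %{"completed" => :completed, "failed" => :failed}
  @mcp_tool_call %{"in_progress" => :in_progress, "completed" => :completed, "failed" => :failed}

  def parse(:command_execution, value), do: lookup(@command_execution, value)
  def parse(:patch_apply, value), do: lookup(@patch_apply, value)
  def parse(:mcp_tool_call, value), do: lookup(@mcp_tool_call, value)

  defp lookup(allowed, value) do
    case Map.fetch(allowed, value) do
      {:ok, status} -> {:ok, status}
      :error -> {:error, {:invalid_status, value}}
    end
  end
end
```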
#### `Codex.Options`
Configuration structs for the global, thread, and turn levels.
**Structs**:
- `Codex.Options`: Global options (codex path, API key, base URL)
- `Codex.Thread.Options`: Thread options (model, sandbox, working directory)
- `Codex.Turn.Options`: Turn options (output schema)
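Option structs can validate their input at construction time. The hypothetical `CodexSketch.ThreadOptions` below uses `struct!/2` to reject unknown keys and checks the sandbox mode against the three modes the SDK targets. The field names and atom spellings (`:read_only`, `:workspace_write`, `:danger_full_access`) are assumptions, and `defstruct` replaces TypedStruct.

```elixir
defmodule CodexSketch.ThreadOptions do
  # Hypothetical field names; plain defstruct stands in for TypedStruct.
  defstruct model: nil, sandbox: nil, working_directory: nil, skip_git_repo_check: false

  @sandbox_modes [:read_only, :workspace_write, :danger_full_access]

  @doc "Builds thread options, rejecting unknown keys and unknown sandbox modes."
  def new(attrs \\ []) do
    # struct!/2 raises KeyError for keys the struct does not define.
    opts = struct!(__MODULE__, attrs)

    if is_nil(opts.sandbox) or opts.sandbox in @sandbox_modes do
      {:ok, opts}
    else
      {:error, {:invalid_sandbox, opts.sandbox}}
    end
  rescue
    e in KeyError -> {:error, {:unknown_option, e.key}}
  end
end
```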
### Utility Modules
#### `Codex.OutputSchemaFile`
Helper for managing JSON schema temporary files.
**Responsibilities**:
- Create temporary file with JSON schema
- Provide cleanup function
- Handle errors gracefully
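The schema file helper can stay small. This sketch assumes the schema arrives already JSON-encoded (so no JSON library is needed) and returns the path together with a cleanup function. `CodexSketch.OutputSchemaFile.create/1` is a hypothetical name and the file-naming scheme is illustrative.

```elixir
defmodule CodexSketch.OutputSchemaFile do
  @doc """
  Writes an already-encoded JSON schema to a new temporary file.
  Returns {:ok, path, cleanup_fun} or {:error, reason}.
  """
  def create(schema_json) when is_binary(schema_json) do
    # Unique integer plus OS pid keeps concurrent turns from sharing a file.
    name = "codex-output-schema-#{:os.getpid()}-#{System.unique_integer([:positive])}.json"
    path = Path.join(System.tmp_dir!(), name)

    # :exclusive fails instead of overwriting if the file already exists.
    case File.write(path, schema_json, [:exclusive]) do
      :ok -> {:ok, path, fn -> File.rm(path) end}
      {:error, reason} -> {:error, {:schema_write_failed, reason}}
    end
  end
end
```

A caller would typically run the cleanup function in an `after` clause so the file is removed even when the turn fails.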
## Architecture Patterns
### Process Model
```
┌──────────────────┐
│      Client      │
└────────┬─────────┘
         │ (synchronous API calls)
         ▼
┌──────────────────┐
│   Codex.Thread   │  (stateful struct, holds thread_id and options)
└────────┬─────────┘
         │ (spawns)
         ▼
┌──────────────────┐
│    Codex.Exec    │  (GenServer - one per turn)
│   (GenServer)    │  - Manages codex-rs lifecycle
└────────┬─────────┘  - Parses JSONL events
         │ (opens)    - Handles Port communication
         ▼
┌──────────────────┐
│   Port (stdin/   │  (IPC with codex-rs)
│     stdout)      │  - Prompt written to stdin
└────────┬─────────┘  - JSONL events read from stdout
         │
         ▼
┌──────────────────┐
│     codex-rs     │  (OpenAI's Rust CLI)
│   (OS Process)   │  - Manages OpenAI API calls
└──────────────────┘  - Executes commands/file ops
                      - Streams events
```
### Data Flow
1. **Client calls `Codex.Thread.run/3`**
   - Thread module starts a `Codex.Exec` GenServer
   - Passes thread_id, options, and input
2. **Exec spawns the codex-rs process**
   - Constructs command-line arguments
   - Sets environment variables
   - Opens a Port with `:binary`, `:use_stdio`, `:exit_status`
3. **Exec sends input via stdin**
   - Writes the prompt to the Port
   - Closes stdin to signal end of input (caveat: a plain Erlang port cannot close only the child's stdin; `Port.close/1` closes the whole port, so this step needs a wrapper program or another way to deliver the prompt)
4. **Exec receives JSONL events from stdout**
   - Parses each line as JSON
   - Converts it to an event struct
   - Forwards events to the caller
5. **Blocking mode (`run/3`)**
   - Exec accumulates events
   - Extracts the final response and items
   - Returns the complete `TurnResult` when the turn completes
6. **Streaming mode (`run_streamed/3`)**
   - Exec sends events to the caller as they arrive
   - Client consumes them lazily through the returned stream
   - Stream completes when the turn finishes
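The blocking path (step 5) amounts to a fold over the turn's events. The sketch below assumes decoded event maps using the JSON-mode wire names (`"item.completed"`, `"turn.completed"`, `"turn.failed"`, `"error"`) and treats the text of the last completed `agent_message` item as the final response (assumed to match the TypeScript SDK). `CodexSketch.TurnAccumulator` and the shape of its result map are hypothetical.

```elixir
defmodule CodexSketch.TurnAccumulator do
  # Folds a turn's decoded events (maps with string keys) into one result,
  # the way a blocking run/3 could buffer a streamed turn.
  def collect(events) do
    events
    |> Enum.reduce_while(%{items: [], final_response: nil, usage: nil}, &step/2)
    |> finish()
  end

  defp step(%{"type" => "item.completed", "item" => item}, acc) do
    acc = %{acc | items: [item | acc.items]}

    # The last completed agent message becomes the turn's final response.
    acc =
      if item["type"] == "agent_message", do: %{acc | final_response: item["text"]}, else: acc

    {:cont, acc}
  end

  defp step(%{"type" => "turn.completed", "usage" => usage}, acc), do: {:cont, %{acc | usage: usage}}
  # Fatal errors stop the fold immediately.
  defp step(%{"type" => "turn.failed", "error" => error}, _acc), do: {:halt, {:error, {:turn_failed, error}}}
  defp step(%{"type" => "error", "message" => msg}, _acc), do: {:halt, {:error, {:thread_error, msg}}}
  # thread.started, turn.started, item.started and item.updated do not change the result.
  defp step(_other, acc), do: {:cont, acc}

  defp finish({:error, _} = error), do: error
  # Without turn.completed the event stream ended early.
  defp finish(%{usage: nil}), do: {:error, :turn_incomplete}
  defp finish(acc), do: {:ok, %{acc | items: Enum.reverse(acc.items)}}
end
```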
### Error Handling
#### Recoverable Errors
- Non-fatal errors are recorded as `Error` items in the thread
- Agent continues processing
- Turn completes normally
#### Fatal Errors
- `TurnFailed` event with error details
- Process exits gracefully
- Resources cleaned up
- Error propagated to client
#### Process Crashes
- GenServer supervision restarts failed processes
- Port monitors detect process termination
- Cleanup functions remove temporary files
- Telemetry events logged
### Streaming Strategy
**Streaming Pros**:
- Real-time updates for responsive UIs
- Process events as they arrive
- Lower memory footprint for long turns
**Streaming Cons**:
- More complex client code
- Requires event handling logic
**Blocking Pros**:
- Simple API for scripting
- No event handling needed
- Complete result in one call
**Blocking Cons**:
- Higher memory usage
- No progress visibility until the turn completes
**Implementation**:
- `run_streamed/3` returns `{:ok, stream}`, where `stream` is a lazy `Enumerable` of events
- `run/3` internally uses streaming but buffers results
- Both share same `Codex.Exec` implementation
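Since `run_streamed/3` returns an Enumerable, one option is to wrap the messages an Exec process sends in a lazy `Stream.resource/3`; `run/3` can then simply enumerate the same stream. In this hypothetical sketch the producer sends `{:codex_event, ref, event}` per event and `{:codex_done, ref}` at the end, and the stream must be consumed by the process that receives those messages.

```elixir
defmodule CodexSketch.EventStream do
  # Wraps messages from a producer process (standing in for Codex.Exec)
  # in a lazy Stream. Nothing is received until the stream is enumerated.
  def from_messages(ref, timeout \\ 30_000) do
    Stream.resource(
      fn -> ref end,
      fn ref ->
        receive do
          {:codex_event, ^ref, event} -> {[event], ref}
          {:codex_done, ^ref} -> {:halt, ref}
        after
          timeout -> raise "no event from producer within #{timeout} ms"
        end
      end,
      # A fuller version would stop the producer here when the consumer
      # halts early (for example via Enum.take/2).
      fn _ref -> :ok end
    )
  end
end
```

Blocking mode then reduces to collecting this stream, for example with `Enum.to_list/1`, and folding over the resulting events.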
## Feature Set
### Implemented in the TypeScript SDK (Parity Target)
1. **Core Operations**
   - Start new threads
   - Resume existing threads
   - Execute turns (blocking and streaming)
   - Handle all event types
   - Parse all item types
2. **Configuration**
   - Custom codex binary path
   - API key and base URL override
   - Model selection
   - Sandbox modes (read-only, workspace-write, danger-full-access)
   - Working directory control
   - Git repo check bypass
3. **Structured Output**
   - JSON schema support
   - Temporary file management
   - Schema validation
4. **Error Handling**
   - Process spawn errors
   - JSON parse errors
   - Exit code handling
   - stderr capture
### Additional Features for Elixir
1. **OTP Integration**
   - GenServer-based process management
   - Supervision tree support
   - Proper resource cleanup
2. **Telemetry**
   - Turn start/complete events
   - Error events
   - Performance metrics
3. **Type Safety**
   - TypedStruct for all data types
   - Typespecs checked statically by Dialyzer
   - Documentation from types
4. **Testing**
   - Supertester integration
   - Mock GenServer implementation
   - Deterministic async tests
   - Chaos engineering tests
## Development Approach
### Test-Driven Development
1. **Write tests first**: Define expected behavior through tests
2. **Implement minimally**: Write just enough code to pass tests
3. **Refactor confidently**: Tests provide safety net
4. **Document through tests**: Tests serve as executable documentation
### Incremental Implementation
**Week 1**: Core types and module stubs
- Define all event/item structs
- Create module outlines
- Set up test infrastructure
**Week 2**: Exec GenServer implementation
- Port-based process management
- JSONL parsing
- Event forwarding
**Week 3**: Thread management
- Blocking turn execution
- Streaming turn execution
- Option handling
**Week 4**: Integration and polish
- End-to-end tests
- Documentation
- Examples
- CI/CD
### Quality Standards
1. **Code Coverage**: Target 95%+ line coverage
2. **Documentation**: All public functions have `@doc`
3. **Typespecs**: All public functions have `@spec`
4. **Dialyzer**: Zero warnings
5. **Credo**: All issues resolved
6. **Tests**: Zero flaky tests, all async
## Success Criteria
### Must Have (MVP)
- [ ] All TypeScript SDK features implemented
- [ ] All tests passing (unit, integration, property)
- [ ] Documentation complete (API docs, guides, examples)
- [ ] CI/CD pipeline green
- [ ] Published to Hex.pm
### Should Have (v1.0)
- [ ] Telemetry integration documented
- [ ] Supervision tree examples
- [ ] Performance benchmarks
- [ ] Chaos engineering tests
- [ ] Real-world examples
### Could Have (Future)
- [ ] Custom event handlers
- [ ] Persistent event logging
- [ ] WebSocket-based streaming
- [ ] Native NIF for performance
- [ ] Phoenix LiveView integration examples
## References
- [OpenAI Codex GitHub](https://github.com/openai/codex)
- [TypeScript SDK Source](https://github.com/openai/codex/tree/main/sdk/typescript)
- [Elixir Port Documentation](https://hexdocs.pm/elixir/Port.html)
- [GenServer Behaviour](https://hexdocs.pm/elixir/GenServer.html)
- [Supertester Library](https://hex.pm/packages/supertester)
- [TypedStruct Library](https://hex.pm/packages/typed_struct)