README.md

<p align="center">
  <img src="assets/gepa_ex.svg" alt="GEPA Elixir Logo" width="200" height="200">
</p>

# GEPA for Elixir

[![Elixir](https://img.shields.io/badge/elixir-1.18.3-purple.svg)](https://elixir-lang.org)
[![OTP](https://img.shields.io/badge/otp-27.3.3-blue.svg)](https://www.erlang.org)
[![Tests](https://img.shields.io/badge/tests-218%2F218%20passing-brightgreen)]()
[![Coverage](https://img.shields.io/badge/coverage-75.4%25-brightgreen)]()
[![Phase 1](https://img.shields.io/badge/Phase%201-Complete-success)]()
[![Phase 2](https://img.shields.io/badge/Phase%202-Complete-success)]()
[![License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/nshkrdotcom/gepa_ex/blob/main/LICENSE)

An Elixir implementation of GEPA (Genetic-Pareto), a framework for optimizing text-based system components using LLM-based reflection and Pareto-efficient evolutionary search.

## About GEPA

GEPA optimizes arbitrary systems composed of text componentsβ€”like AI prompts, code snippets, or textual specsβ€”against any evaluation metric. It employs LLMs to reflect on system behavior, using feedback from execution traces to drive targeted improvements.

This is an Elixir port of the [Python GEPA library](https://github.com/gepa-ai/gepa), designed to leverage:
- πŸš€ **BEAM concurrency** for 5-10x evaluation speedup (coming in Phase 4)
- πŸ›‘οΈ **OTP supervision** for fault-tolerant external service integration
- πŸ”„ **Functional programming** for clean, testable code
- πŸ“Š **Telemetry** for comprehensive observability (coming in Phase 3)
- ✨ **Production LLMs** - OpenAI GPT-4o-mini & Google Gemini Flash Lite (`gemini-flash-lite-latest`)

## πŸŽ‰ Phase 1 Complete - Production Ready!

### Core Features (Current v0.1.0 Release)

**Optimization System:**
- βœ… `GEPA.optimize/1` - Public API (working!)
- βœ… `GEPA.Engine` - Full optimization loop with stop conditions
- βœ… `GEPA.Proposer.Reflective` - Mutation strategy
- βœ… `GEPA.State` - State management with automatic Pareto updates (96.5% coverage)
- βœ… `GEPA.Utils.Pareto` - Multi-objective optimization (93.5% coverage, property-verified)
- βœ… `GEPA.Result` - Result analysis (100% coverage)
- βœ… `GEPA.Adapters.Basic` - Q&A adapter (92.1% coverage)
- βœ… Stop conditions with budget control
- βœ… State persistence (save/load)
- βœ… End-to-end integration tested

### Phase 1 Additions - NEW! πŸŽ‰

**Production LLM Integration:**
- βœ… `GEPA.LLM` - Unified LLM behavior
- βœ… `GEPA.LLM.ReqLLM` - Production implementation via ReqLLM
  - OpenAI support (GPT-4o-mini default)
  - Google Gemini support (gemini-flash-lite-latest)
  - Error handling, retries, timeouts
  - Configurable via environment or runtime
- βœ… `GEPA.LLM.Mock` - Testing implementation with flexible responses

**Advanced Batch Sampling:**
- βœ… `GEPA.Strategies.BatchSampler.EpochShuffled` - Epoch-based training with shuffling
- βœ… Reproducible with seed control
- βœ… Better training dynamics than simple sampling

**Working Examples:**
- βœ… 4 .exs script examples (quick start, math, custom adapter, persistence)
- βœ… 3 Livebook notebooks (interactive learning)
- βœ… Comprehensive examples/README.md guide
- βœ… Livebook guide with visualizations

**Phase 2 Additions - NEW! πŸŽ‰**

**Merge Proposer:**
- βœ… `GEPA.Proposer.Merge` - Genealogy-based candidate merging
- βœ… `GEPA.Utils` - Pareto dominator detection (93.3% coverage)
- βœ… `GEPA.Proposer.MergeUtils` - Ancestry tracking (92.3% coverage)
- βœ… Engine integration with merge scheduling
- βœ… 44 comprehensive tests (34 unit + 10 properties)

**Incremental Evaluation:**
- βœ… `GEPA.Strategies.EvaluationPolicy.Incremental` - Progressive validation
- βœ… Configurable sample sizes and thresholds
- βœ… Reduces computation on large validation sets
- βœ… 12 tests

**Advanced Stop Conditions:**
- βœ… `GEPA.StopCondition.Timeout` - Time-based stopping
- βœ… `GEPA.StopCondition.NoImprovement` - Early stopping
- βœ… Flexible time units and patience settings
- βœ… 9 tests

**Test Quality:**
- 201 tests (185 unit + 16 properties + 1 doctest)
- 100% passing βœ…
- 75.4% coverage (excellent!)
- Property tests with 1,600+ runs
- Zero Dialyzer errors
- TDD methodology throughout

### What's Next? See the [Roadmap on GitHub](https://github.com/nshkrdotcom/gepa_ex/blob/main/docs/20251029/roadmap.md)

**βœ… Phase 1: Production Viability** - COMPLETE!
- βœ… Real LLM integration (OpenAI, Gemini)
- βœ… Quick start examples (4 scripts + 3 livebooks)
- βœ… EpochShuffledBatchSampler

**βœ… Phase 2: Core Completeness** - COMPLETE!
- βœ… Merge proposer (genealogy-based recombination)
- βœ… IncrementalEvaluationPolicy (progressive validation)
- βœ… Additional stop conditions (Timeout, NoImprovement)
- βœ… Engine integration for merge proposer

**Phase 3: Production Hardening** - 8-10 weeks
- πŸ“‘ Telemetry integration
- 🎨 Progress tracking
- πŸ›‘οΈ Robust error handling

**Phase 4: Ecosystem Expansion** - 12-14 weeks
- πŸ”Œ Additional adapters (Generic, RAG)
- πŸš€ Performance optimization (parallel evaluation)
- 🌟 Community infrastructure

See [Implementation Gap Analysis on GitHub](https://github.com/nshkrdotcom/gepa_ex/blob/main/docs/20251029/implementation_gap_analysis.md) for detailed comparison with Python GEPA.

## Quick Start

### With Mock LLM (No API Key Required)

```elixir
# Define training data
trainset = [
  %{input: "What is 2+2?", answer: "4"},
  %{input: "What is 3+3?", answer: "6"}
]

valset = [%{input: "What is 5+5?", answer: "10"}]

# Create adapter with mock LLM (for testing)
adapter = GEPA.Adapters.Basic.new(llm: GEPA.LLM.Mock.new())

# Run optimization
{:ok, result} = GEPA.optimize(
  seed_candidate: %{"instruction" => "You are a helpful assistant."},
  trainset: trainset,
  valset: valset,
  adapter: adapter,
  max_metric_calls: 50
)

# Access results
best_program = GEPA.Result.best_candidate(result)
best_score = GEPA.Result.best_score(result)

IO.puts("Best score: #{best_score}")
IO.puts("Iterations: #{result.i}")
```

### With Production LLMs (NEW!)

```elixir
# OpenAI (GPT-4o-mini) - Requires OPENAI_API_KEY
llm = GEPA.LLM.ReqLLM.new(provider: :openai)
adapter = GEPA.Adapters.Basic.new(llm: llm)

# Or Gemini (`gemini-flash-lite-latest`) - Requires GEMINI_API_KEY
llm = GEPA.LLM.ReqLLM.new(provider: :gemini)
adapter = GEPA.Adapters.Basic.new(llm: llm)

# Then run optimization as above
{:ok, result} = GEPA.optimize(
  seed_candidate: %{"instruction" => "..."},
  trainset: trainset,
  valset: valset,
  adapter: adapter,
  max_metric_calls: 50
)
```

See [Examples overview](examples/README.md) for complete working examples!

### Interactive Livebooks (NEW!)

For interactive learning and experimentation:

```bash
# Install Livebook
mix escript.install hex livebook

# Open a livebook
livebook server livebooks/01_quick_start.livemd
```

Available Livebooks:
- `01_quick_start.livemd` - Interactive introduction
- `02_advanced_optimization.livemd` - Parameter tuning and visualization
- `03_custom_adapter.livemd` - Build adapters interactively

See [livebooks/README.md](livebooks/README.md) for details!

### With State Persistence

```elixir
{:ok, result} = GEPA.optimize(
  seed_candidate: seed,
  trainset: trainset,
  valset: valset,
  adapter: GEPA.Adapters.Basic.new(),
  max_metric_calls: 100,
  run_dir: "./my_optimization"  # State saved here, can resume
)
```

## Development

```bash
# Get dependencies
mix deps.get

# Run tests
mix test

# Run with coverage
mix test --cover

# Run specific tests
mix test test/gepa/utils/pareto_test.exs

# Format code
mix format

# Type checking
mix dialyzer
```

## Architecture

Based on behavior-driven design with functional core:

```
GEPA.optimize/1
  ↓
GEPA.Engine ← Behaviors β†’ User Implementations
  β”œβ”€β†’ Adapter (evaluate, reflect, propose)
  β”œβ”€β†’ Proposer (reflective, merge)
  β”œβ”€β†’ Strategies (selection, sampling, evaluation)
  └─→ StopCondition (budget, time, threshold)
```

## Documentation

### Current Status & Planning
- [Implementation Gap Analysis](https://github.com/nshkrdotcom/gepa_ex/blob/main/docs/20251029/implementation_gap_analysis.md) - Detailed comparison with Python GEPA (~60% complete, hosted on GitHub)
- [Development Roadmap](https://github.com/nshkrdotcom/gepa_ex/blob/main/docs/20251029/roadmap.md) - Path to production release (12-14 weeks, hosted on GitHub)

### Technical Documentation
- [Complete Integration Guide](https://github.com/nshkrdotcom/gepa_ex/blob/main/docs/20250829/00_complete_integration_guide.md)
- [Technical Design](docs/TECHNICAL_DESIGN.md)
- [Implementation Status](docs/IMPLEMENTATION_STATUS.md)
- [LLM Adapter Design](docs/llm_adapter_design.md) - Design for real LLM integration
- Component analysis (6 detailed documents in the [`docs/20250829/` directory](https://github.com/nshkrdotcom/gepa_ex/tree/main/docs/20250829/))

## Related Projects

- [GEPA Python](https://github.com/gepa-ai/gepa) - Original implementation
- [GEPA Paper](https://arxiv.org/abs/2507.19457) - Research paper

## License

[MIT License](LICENSE)