# DSPex
<p align="center">
<img src="assets/dspex-logo.svg" alt="DSPex Logo" width="200" height="200">
</p>
DSPex is a native Elixir implementation of [DSPy](https://github.com/stanfordnlp/dspy) (Declarative Self-improving Language Programs) that provides a unified interface for working with Large Language Models. It combines high-performance native Elixir implementations with Python DSPy integration through [Snakepit](https://github.com/nshkrdotcom/snakepit) for complex ML tasks.
## Features
- 🚀 **Hybrid Architecture**: Native Elixir for performance-critical operations, Python for complex ML
- 🔌 **Multiple LLM Adapters**: Gemini, InstructorLite, HTTP, Python bridge, and mock adapters
- 🎯 **DSPy Core Features**: Signatures, Predict, Chain of Thought, ReAct, and more
- 🔄 **Pipeline Composition**: Build complex workflows with sequential, parallel, and conditional execution
- 📊 **Smart Routing**: Automatically chooses the best implementation (native vs Python)
- 🏃 **Streaming Support**: Real-time streaming for supported providers (e.g., Gemini)
## DSPy Integration
DSPex provides comprehensive wrappers for all DSPy modules through Snakepit. See [DSPy Integration Guide](./README_DSPY_INTEGRATION.md) for details on:
- All available DSPy modules (Predict, ChainOfThought, ReAct, etc.)
- Optimizers (BootstrapFewShot, MIPRO, COPRO, etc.)
- Retrievers (ColBERTv2, 20+ vector databases)
- Complete examples and usage patterns
## Installation
Add `dspex` to your list of dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:dspex, "~> 0.1.1"}
  ]
end
```
## Quick Start
### Basic LLM Interaction
```elixir
# Configure a client (Gemini 2.0 Flash recommended - fast and free tier)
{:ok, client} = DSPex.lm_client(
  adapter: :gemini,
  api_key: System.get_env("GOOGLE_API_KEY") || System.get_env("GEMINI_API_KEY"),
  model: "gemini-2.0-flash-exp"
)

# Generate a response
{:ok, response} = DSPex.lm_generate(client, "What is Elixir?")
IO.puts(response)
```
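Like the rest of the API, `lm_generate/2` returns a tagged tuple, so failures can be handled with a plain `case`. A minimal sketch, assuming the failure branch is an `{:error, reason}` tuple:

```elixir
# Minimal error-handling sketch; the exact shape of the error term is an
# assumption, so inspect it rather than matching on its details.
case DSPex.lm_generate(client, "What is Elixir?") do
  {:ok, response} -> IO.puts(response)
  {:error, reason} -> IO.puts("Generation failed: #{inspect(reason)}")
end
```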
### DSPy Operations
```elixir
# Parse a signature
{:ok, signature} = DSPex.signature("question: str -> answer: str")

# Basic prediction
{:ok, result} = DSPex.predict(signature, %{question: "What is DSPy?"})

# Chain of thought reasoning
{:ok, cot_result} = DSPex.chain_of_thought(
  signature,
  %{question: "Explain quantum computing step by step"}
)
```
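Signatures can also declare multiple typed inputs and outputs with the same `name: type` syntax. A sketch with illustrative field names; the keys available in the returned result follow the signature's output fields:

```elixir
# Sketch: a multi-field signature. Field names are illustrative; the keys
# present in `result` depend on the signature's output fields.
{:ok, qa_signature} =
  DSPex.signature("question: str, context: str -> answer: str, confidence: float")

{:ok, result} =
  DSPex.predict(qa_signature, %{
    question: "What does DSPex wrap?",
    context: "DSPex wraps DSPy modules through Snakepit."
  })

IO.inspect(result)
```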
### Pipeline Composition
```elixir
# Define a complex pipeline mixing native and Python operations
pipeline = DSPex.pipeline([
  {:native, Signature, spec: "query -> keywords: list[str]"},
  {:python, "dspy.ChainOfThought", signature: "keywords -> analysis"},
  {:parallel, [
    {:native, Search, index: "docs"},
    {:python, "dspy.ColBERTv2", k: 10}
  ]},
  {:native, Template, template: "Results: <%= @results %>"}
])

# Execute the pipeline
{:ok, result} = DSPex.run_pipeline(pipeline, %{query: "machine learning trends"})
```
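Steps run in order, with each step's output feeding the next. A smaller, all-native sketch; the step tuples mirror the format above, and how intermediate results reach the template assigns is an assumption:

```elixir
# Sketch: a two-step, all-native pipeline. How the intermediate `summary`
# value is threaded into the template assigns is assumed, not documented here.
summary_pipeline = DSPex.pipeline([
  {:native, Signature, spec: "text -> summary: str"},
  {:native, Template, template: "Summary: <%= @summary %>"}
])

{:ok, output} = DSPex.run_pipeline(summary_pipeline, %{text: "Long article text..."})
```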
## LLM Adapters
DSPex provides multiple LLM adapters for different use cases:
### Gemini Adapter
Native Google Gemini API integration with streaming support:
```elixir
{:ok, client} = DSPex.lm_client(
  adapter: :gemini,
  api_key: System.get_env("GOOGLE_API_KEY") || System.get_env("GEMINI_API_KEY"),
  model: "gemini-2.0-flash-exp",
  generation_config: %{
    temperature: 0.7,
    max_output_tokens: 1000
  }
)

# Streaming responses
{:ok, stream} = DSPex.lm_generate(client, "Write a story", stream: true)
stream |> Enum.each(&IO.write/1)
```
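The streamed value is enumerable, so chunks can also be collected into a single string instead of written as they arrive (this assumes the stream yields string chunks, as the `IO.write/1` example above implies):

```elixir
# Collect streamed chunks into one string rather than writing them directly.
{:ok, stream} = DSPex.lm_generate(client, "Write a story", stream: true)
story = Enum.join(stream)
IO.puts(story)
```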
### InstructorLite Adapter
For structured output with Ecto schema validation:
```elixir
defmodule Person do
  use Ecto.Schema

  embedded_schema do
    field :name, :string
    field :age, :integer
    field :occupation, :string
  end
end

{:ok, client} = DSPex.lm_client(
  adapter: :instructor_lite,
  provider: :gemini,
  api_key: System.get_env("GOOGLE_API_KEY") || System.get_env("GEMINI_API_KEY"),
  model: "gemini-2.0-flash-exp"
)

{:ok, person} = DSPex.lm_generate(
  client,
  "Extract: John Doe is a 30-year-old software engineer",
  response_model: Person
)
```
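On success, `person` is a `%Person{}` struct validated against the schema above, so its fields can be used directly:

```elixir
# `person` is a %Person{} struct, so fields are plain struct access.
IO.puts("#{person.name}, #{person.age}, #{person.occupation}")
```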
### HTTP Adapter
Generic adapter for any HTTP-based LLM API:
```elixir
{:ok, client} = DSPex.lm_client(
  adapter: :http,
  base_url: "https://api.example.com",
  api_key: System.get_env("API_KEY"),
  model: "custom-model"
)
```
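Once configured, an HTTP-backed client is used through the same `lm_generate/2` call as the other adapters; whether extra request options are needed depends on the API behind `base_url`:

```elixir
# Same call shape as the other adapters; provider-specific options may be
# required depending on the API behind base_url.
{:ok, response} = DSPex.lm_generate(client, "Summarize Elixir in one sentence")
IO.puts(response)
```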
## Examples
The `examples/` directory contains comprehensive examples demonstrating DSPex capabilities:
### DSPy Integration Examples (`examples/dspy/`)
- **00_dspy_mock_demo.exs** - Basic test to verify DSPy integration is working
- **01_question_answering_pipeline.exs** - Core DSPy modules: Predict, ChainOfThought, optimization
- **02_code_generation_system.exs** - Advanced reasoning with ProgramOfThought, ReAct, and Retry
- **03_document_analysis_rag.exs** - Retrieval-augmented generation with ColBERTv2 and vector databases
- **04_optimization_showcase.exs** - All DSPy optimizers and advanced features
### Other Examples
- **advanced_signature_example.exs** - Complex business scenarios:
- Document intelligence and analysis
- Customer support automation
- Financial risk assessment
- Product recommendation systems
Run examples with any LLM provider:
```bash
# With Gemini (recommended - fast and free tier)
export GOOGLE_API_KEY=your-gemini-api-key
mix run examples/dspy/01_question_answering_pipeline.exs
# With OpenAI
export OPENAI_API_KEY=your-openai-api-key
# Then update the example's LM configuration
# With Anthropic, Cohere, or any other provider
# Set the appropriate API key and update the example's configuration
```
**Note**: DSPy examples default to Gemini 2.0 Flash for its speed and free tier, but work with any supported LLM provider.
## Architecture
DSPex uses a hybrid architecture: performance-critical operations run natively in Elixir, while complex ML work is delegated to Python DSPy through Snakepit:
```
User Request
↓
DSPex API
↓
Router (decides native vs Python)
↓
Native Module ←→ Python Bridge
↓
Snakepit
↓
Python DSPy
```
### Core Components
- **DSPex** - Clean public API
- **DSPex.Router** - Smart routing between native and Python implementations
- **DSPex.Pipeline** - Workflow orchestration
- **DSPex.Native.\*** - Native Elixir implementations (Signature, Template, Validator)
- **DSPex.Python.\*** - Python bridge via Snakepit
- **DSPex.LLM.\*** - LLM adapter system
## Configuration
```elixir
# config/config.exs
config :dspex,
  router: [
    prefer_native: true,
    fallback_to_python: true
  ]

config :snakepit,
  python_path: "python3",
  pool_size: 4
```
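For values that should be read from the environment at boot, the same keys can be set in `config/runtime.exs`. A sketch, assuming hypothetical `SNAKEPIT_*` environment variable names:

```elixir
# config/runtime.exs (sketch): the SNAKEPIT_* variable names are examples,
# not established conventions; the config keys match the example above.
import Config

config :snakepit,
  python_path: System.get_env("SNAKEPIT_PYTHON", "python3"),
  pool_size: String.to_integer(System.get_env("SNAKEPIT_POOL_SIZE", "4"))
```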
## Testing
DSPex uses a three-layer testing architecture:
```bash
# Run all tests
mix test
# Run specific test layers
mix test.fast # Layer 1: Mock adapter tests (~70ms)
mix test.protocol # Layer 2: Protocol tests
mix test.integration # Layer 3: Full integration tests
```
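Fast-layer tests can exercise purely native code without an LLM or a Python environment. A sketch of such a test, using only the native signature parser (module name is illustrative):

```elixir
# Sketch of a fast-layer test: native signature parsing needs no LLM,
# no network, and no Python bridge.
defmodule MyApp.SignatureTest do
  use ExUnit.Case, async: true

  test "parses a simple signature" do
    assert {:ok, _signature} = DSPex.signature("question: str -> answer: str")
  end
end
```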
## Development
```bash
# Interactive shell
iex -S mix
# Code quality tools
mix format # Format code
mix credo # Static analysis
mix dialyzer # Type checking
```
## Known Issues
1. **InstructorLite + Gemini**: InstructorLite generates JSON schemas with `additionalProperties` that Gemini doesn't accept. Use the native Gemini adapter for Gemini models.
2. **Python Environment**: Python with DSPy is required for Python bridge features. See [Snakepit setup instructions](https://github.com/nshkrdotcom/snakepit) for Python environment configuration.
## Roadmap
- [ ] Complete Python DSPy integration
- [ ] Additional native module implementations
- [ ] Distributed execution support
- [ ] Model management and optimization features
- [ ] Comprehensive documentation
- [ ] Performance benchmarks
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT
## Acknowledgments
- [DSPy](https://github.com/stanfordnlp/dspy) - The original Python implementation
- [Snakepit](https://github.com/nshkrdotcom/snakepit) - Python integration for Elixir
- [InstructorLite](https://github.com/martosaur/instructor_lite) - Structured output library