
# CrucibleIR
[Hex package](https://hex.pm/packages/crucible_ir) | [Documentation](https://hexdocs.pm/crucible_ir) | [License](LICENSE)

Intermediate Representation for the Crucible ML reliability ecosystem.
Full docs: https://hexdocs.pm/crucible_ir
## Overview
`CrucibleIR` provides shared data structures for defining ML reliability experiments across the Crucible ecosystem. It serves as the common language for experiment configuration, enabling consistency across all Crucible tools and components.
## Requirements
- Elixir `~> 1.14` (and matching Erlang/OTP)
- `jason` for JSON encoding (included in deps)
## Features
- **Experiment Definition**: Complete experiment specifications with backends, pipelines, and datasets
- **Reliability Configurations**: Ensemble voting, hedging, statistical testing, fairness, and guardrails
- **Type Safety**: Full type specifications for all structs
- **JSON Serialization**: All structs derive `Jason.Encoder` for easy serialization
- **Comprehensive Documentation**: 100% documentation coverage with examples
## Installation
Add `crucible_ir` to your list of dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:crucible_ir, "~> 0.1.0"}
  ]
end
```
Fetch dependencies:
```bash
mix deps.get
```
## Quick Start
```elixir
alias CrucibleIR.{Experiment, BackendRef, StageDef, DatasetRef}
alias CrucibleIR.Reliability.{Config, Ensemble, Stats}

# Define a simple experiment
experiment = CrucibleIR.new_experiment(
  id: :gpt4_benchmark,
  backend: %BackendRef{id: :openai_gpt4},
  pipeline: [
    %StageDef{name: :preprocessing},
    %StageDef{name: :inference},
    %StageDef{name: :evaluation}
  ],
  dataset: %DatasetRef{name: :mmlu, split: :test}
)

# Add reliability mechanisms
experiment = %{experiment |
  reliability: %Config{
    ensemble: %Ensemble{
      strategy: :majority,
      models: [:gpt4, :claude, :gemini],
      execution_mode: :parallel
    },
    stats: %Stats{
      tests: [:ttest, :bootstrap],
      alpha: 0.05
    }
  }
}

# Serialize to JSON
{:ok, json} = Jason.encode(experiment)
```
## Usage Workflow
1. Define an `Experiment` with `id`, `backend`, and `pipeline` stages.
2. Add a `DatasetRef` if the experiment targets a dataset.
3. Attach `Reliability.Config` options (ensemble, hedging, stats, fairness, guardrails).
4. Add `OutputSpec` entries to describe where and how to emit results.
5. Serialize with `Jason.encode/1` to pass the IR into other Crucible services.
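
The five steps above can be sketched end to end as follows. This is a minimal illustration, not a canonical recipe: the experiment id `:workflow_sketch` and the output name `:summary_report` are hypothetical, and the `OutputSpec` values are simply the documented defaults written out explicitly.

```elixir
alias CrucibleIR.{BackendRef, DatasetRef, OutputSpec, StageDef}
alias CrucibleIR.Reliability.{Config, Stats}

# Steps 1-2: required fields plus an optional dataset.
experiment = CrucibleIR.new_experiment(
  id: :workflow_sketch,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  dataset: %DatasetRef{name: :mmlu, split: :test}
)

# Steps 3-4: attach reliability options and an output spec via struct update.
experiment = %{experiment |
  reliability: %Config{stats: %Stats{alpha: 0.05}},
  outputs: [
    # :summary_report is a hypothetical name chosen for this sketch.
    %OutputSpec{name: :summary_report, formats: [:markdown], sink: :file}
  ]
}

# Step 5: serialize so the IR can be handed to other Crucible services.
{:ok, json} = Jason.encode(experiment)
```

The struct update syntax is used for steps 3 and 4 because it only requires that `reliability` and `outputs` exist as fields on `Experiment`, which the field reference confirms.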
## Core Components
### Experiment Definition
- **`Experiment`** - Top-level experiment definition
- **`BackendRef`** - Reference to an LLM backend
- **`DatasetRef`** - Reference to a dataset
- **`StageDef`** - Processing stage definition
- **`OutputSpec`** - Output specification
### Reliability Mechanisms
- **`Reliability.Config`** - Container for all reliability configurations
- **`Reliability.Ensemble`** - Multi-model ensemble voting
- **`Reliability.Hedging`** - Request hedging for tail latency reduction
- **`Reliability.Stats`** - Statistical testing configuration
- **`Reliability.Fairness`** - Fairness and bias detection
- **`Reliability.Guardrail`** - Security guardrails (prompt injection, PII, etc.)
## Struct Field Reference
- **Experiment**: required `id`, `backend`, `pipeline`; optional `description`, `owner`, `tags`, `metadata`, `dataset`, `reliability`, `outputs`, `created_at`, `updated_at`.
- **BackendRef**: required `id`; optional `profile` (default `:default`), `options`.
- **DatasetRef**: required `name`; optional `provider` (default `:crucible_datasets`), `split` (default `:train`), `options`.
- **StageDef**: required `name`; optional `module`, `options`, `enabled` (default `true`).
- **OutputSpec**: required `name`; optional `formats` (default `[:markdown]`), `sink` (default `:file`), `options`.
- **Reliability.Config**: optional `ensemble`, `hedging`, `stats`, `fairness`, `guardrails`.
- **Ensemble**: `strategy` (default `:none`), `execution_mode` (default `:parallel`), `models`, `weights`, `min_agreement`, `timeout_ms`, `options`.
- **Hedging**: `strategy` (default `:off`), `delay_ms`, `percentile`, `max_hedges`, `budget_percent`, `options`.
- **Stats**: `tests` (default `[:ttest, :bootstrap]`), `alpha` (default `0.05`), `confidence_level`, `effect_size_type`, `multiple_testing_correction`, `bootstrap_iterations`, `options`.
- **Fairness**: `enabled` (default `false`), `metrics`, `group_by`, `threshold`, `fail_on_violation`, `options`.
- **Guardrail**: `profiles` (default `[:default]`), `prompt_injection_detection`, `jailbreak_detection`, `pii_detection`, `pii_redaction`, `content_moderation`, `fail_on_detection`, `options`.
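
To make the defaults above concrete, the sketch below builds each struct with only its required field and inspects the optional ones. The values in the comments are copied from the reference list rather than from a run of the code, and `:report` is a hypothetical output name.

```elixir
alias CrucibleIR.{BackendRef, DatasetRef, OutputSpec, StageDef}

# Only the required field is set; everything else falls back to its default.
backend = %BackendRef{id: :gpt4}
dataset = %DatasetRef{name: :mmlu}
stage = %StageDef{name: :inference}
output = %OutputSpec{name: :report} # :report is a hypothetical name

IO.inspect(backend.profile)  # documented default: :default
IO.inspect(dataset.provider) # documented default: :crucible_datasets
IO.inspect(dataset.split)    # documented default: :train
IO.inspect(stage.enabled)    # documented default: true
IO.inspect(output.formats)   # documented default: [:markdown]
IO.inspect(output.sink)      # documented default: :file

# Any default can be overridden at construction time.
skipped_stage = %StageDef{name: :evaluation, enabled: false}
```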
## Examples
### Ensemble Voting Experiment
```elixir
experiment = CrucibleIR.new_experiment(
  id: :ensemble_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    ensemble: %Ensemble{
      strategy: :weighted,
      models: [:gpt4, :claude, :gemini],
      weights: %{gpt4: 0.5, claude: 0.3, gemini: 0.2},
      execution_mode: :parallel
    }
  }
)
```
### Hedging for Low Latency
```elixir
# Hedging is not aliased in the Quick Start, so alias it here.
alias CrucibleIR.Reliability.Hedging

experiment = CrucibleIR.new_experiment(
  id: :low_latency_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    hedging: %Hedging{
      strategy: :percentile,
      percentile: 0.95,
      max_hedges: 2,
      budget_percent: 15
    }
  }
)
```
### Statistical Testing
```elixir
experiment = CrucibleIR.new_experiment(
  id: :stats_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  dataset: %DatasetRef{name: :mmlu},
  reliability: %Config{
    stats: %Stats{
      tests: [:ttest, :mannwhitney, :bootstrap],
      alpha: 0.01,
      effect_size_type: :cohens_d,
      bootstrap_iterations: 10_000
    }
  }
)
```
### Fairness Checking
```elixir
# Fairness is not aliased in the Quick Start, so alias it here.
alias CrucibleIR.Reliability.Fairness

experiment = CrucibleIR.new_experiment(
  id: :fairness_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    fairness: %Fairness{
      enabled: true,
      metrics: [:demographic_parity, :equalized_odds],
      group_by: :gender,
      threshold: 0.8,
      fail_on_violation: true
    }
  }
)
)
```
### Security Guardrails
```elixir
# Guardrail is not aliased in the Quick Start, so alias it here.
alias CrucibleIR.Reliability.Guardrail

experiment = CrucibleIR.new_experiment(
  id: :secure_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    guardrails: %Guardrail{
      profiles: [:strict],
      prompt_injection_detection: true,
      jailbreak_detection: true,
      pii_detection: true,
      pii_redaction: true,
      fail_on_detection: true
    }
  }
)
```
## Architecture
CrucibleIR follows a hierarchical structure:
```
Experiment (top-level)
├── BackendRef (which LLM to use)
├── Pipeline (list of StageDef)
├── DatasetRef (what data to evaluate)
├── Reliability.Config
│   ├── Ensemble (multi-model voting)
│   ├── Hedging (latency optimization)
│   ├── Stats (statistical testing)
│   ├── Fairness (bias detection)
│   └── Guardrails (security)
└── Outputs (list of OutputSpec)
```
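
Because the IR is a tree of nested structs, a consumer can walk it with ordinary field access and pattern matching. The sketch below assumes that `CrucibleIR.new_experiment/1` returns an `%Experiment{}` struct, which the examples in this README suggest but do not state outright; the id `:tree_walk` is hypothetical.

```elixir
alias CrucibleIR.{BackendRef, Experiment, StageDef}
alias CrucibleIR.Reliability.{Config, Ensemble}

experiment = CrucibleIR.new_experiment(
  id: :tree_walk,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :preprocessing}, %StageDef{name: :inference}],
  reliability: %Config{
    ensemble: %Ensemble{strategy: :majority, models: [:gpt4, :claude]}
  }
)

# Field access follows the tree from the root down to a leaf.
strategy = experiment.reliability.ensemble.strategy

# Pattern matching pulls several branches out in one step.
%Experiment{backend: %BackendRef{id: backend_id}, pipeline: stages} = experiment
stage_names = Enum.map(stages, & &1.name)

IO.inspect({backend_id, strategy, stage_names})
```

Field access on a branch that was never configured (for example `experiment.reliability.hedging.strategy` when no hedging is set) may raise if that branch is `nil`, so matching on the struct first is the safer pattern for optional sections.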
## Testing
All modules are covered by tests. Run the suite with:
```bash
mix test
```
Current test stats: **78 tests, 0 failures** (3 doctests, 75 unit tests)
## Documentation
Generate HTML documentation:
```bash
mix docs
```
## Integration with Crucible Ecosystem
CrucibleIR is used by:
- **crucible_harness** - Experiment orchestration
- **crucible_ensemble** - Ensemble voting implementation
- **crucible_hedging** - Request hedging implementation
- **crucible_bench** - Statistical testing
- **crucible_telemetry** - Metrics and instrumentation
- **crucible_trace** - Causal transparency
## Design Principles
1. **Immutable Data Structures**: All structs are immutable
2. **Type Safety**: Full type specifications with `@type` and `@spec`
3. **JSON-First**: All structs support JSON serialization
4. **Documentation**: Every module and public function is documented
5. **Test Coverage**: High test coverage with property-based testing
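
Principles 1 and 3 can be seen in a few lines. This is a minimal sketch that relies only on standard Elixir struct-update semantics and on `Jason.decode/1` returning string-keyed maps; it assumes the `StageDef` fields listed in the struct field reference.

```elixir
alias CrucibleIR.StageDef

original = %StageDef{name: :inference}

# Struct update returns a new struct; `original` is left untouched.
disabled = %{original | enabled: false}
IO.inspect(original.enabled)
IO.inspect(disabled.enabled)

# Encoding goes through the derived Jason.Encoder. Decoding yields a plain
# map with string keys (atoms become strings), not a StageDef struct.
{:ok, json} = Jason.encode(disabled)
{:ok, decoded} = Jason.decode(json)
IO.inspect(decoded["name"])
IO.inspect(decoded["enabled"])
```

With `Jason.decode/1` alone, decoding does not rebuild structs, so a consumer that needs `%StageDef{}` values back has to convert the decoded map itself.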
## Contributing
This library is part of the North-Shore-AI organization. Contributions welcome!
## License
MIT License - See LICENSE file for details
## Links
- **GitHub**: https://github.com/North-Shore-AI/crucible_ir
- **Documentation**: https://hexdocs.pm/crucible_ir
- **Crucible Framework**: https://github.com/North-Shore-AI/crucible_framework