<div align="center"><img src="assets/crucible_framework.svg" width="400" alt="Crucible Framework Logo" /></div>
# CrucibleFramework
**A reliability-first experiment engine for LLM training and evaluation**
[Elixir](https://elixir-lang.org) · [Erlang/OTP](https://www.erlang.org) · [Hex package](https://hex.pm/packages/crucible_framework) · [Documentation](https://hexdocs.pm/crucible_framework) · [License](https://github.com/North-Shore-AI/crucible_framework/blob/main/LICENSE)
---
## What’s New (v0.3.0 · 2025-11-23)
- Declarative Experiment IR (`Crucible.IR.*`) that fully describes datasets, stages, backends, and outputs.
- Stage-based pipeline runner (`Crucible.Pipeline.Runner`) with built-in stages: data_load → data_checks → guardrails → backend_call → cns_metrics → bench → report.
- Backend behaviour plus a mockable Tinkex implementation for LoRA training and sampling.
- Persistence layer (Ecto/Postgres) for experiments, runs, and artifacts; one-step bootstrap script (`scripts/setup_db.sh`).
- Live Tinkex demo pipeline (`examples/tinkex_live.exs`) wired to the new IR and stages.
---
## Quick Start
### Prerequisites
- Elixir ≥ 1.14 / OTP ≥ 25
- Local PostgreSQL (listening on `localhost:5432`)
- (Optional) `TINKER_API_KEY` for live Tinkex runs
### Install from Hex
```elixir
# mix.exs
def deps do
  [
    {:crucible_framework, "~> 0.3.0"}
  ]
end
```
### 1) Bootstrap the database (dev + test)
```bash
./scripts/setup_db.sh
```
The script creates the `crucible_dev` role and the dev and test databases, then runs migrations. The credentials it sets up match the baked-in configs (`config/dev.exs`, `config/test.exs`), so no env vars are required for DB access.
### 2) Run the suite
```bash
mix test # unit suite
MIX_ENV=test mix test --include integration # includes persistence tests
```
### 3) Run the live Tinkex demo
```bash
export TINKER_API_KEY=your_key
mix run examples/tinkex_live.exs
```
This executes a tiny SciFact-style pipeline through the stage engine, trains via Tinkex, samples a prompt, and emits a report to stdout + `reports/`.
---
## Core Concepts
### Experiment IR
Experiments are pure structs—serializable, inspectable, and backend-agnostic:
```elixir
alias Crucible.IR.{Experiment, DatasetRef, BackendRef, StageDef, ReliabilityConfig,
  EnsembleConfig, HedgingConfig, GuardrailConfig, StatsConfig, FairnessConfig, OutputSpec}

experiment = %Experiment{
  id: "tinkex_scifact_demo",
  description: "Minimal Tinkex training pipeline",
  dataset: %DatasetRef{
    name: "scifact_claims",
    options: %{path: "priv/data/scifact_claim_extractor_clean.jsonl", limit: 4, batch_size: 2}
  },
  # Stages run in the order listed.
  pipeline: [
    %StageDef{name: :data_load, options: %{input_key: :prompt, output_key: :completion}},
    %StageDef{name: :data_checks, options: %{required_fields: [:input, :output]}},
    %StageDef{name: :guardrails},
    %StageDef{
      name: :backend_call,
      options: %{mode: :train, sample_prompts: ["Write a counterclaim."], create_sampler?: true}
    },
    %StageDef{name: :cns_metrics},
    %StageDef{name: :bench},
    %StageDef{name: :report, options: %{sink: :stdout, formats: [:markdown]}}
  ],
  backend: %BackendRef{
    id: :tinkex,
    profile: :lora_finetune,
    options: %{base_model: "meta-llama/Llama-3.2-1B"}
  },
  reliability: %ReliabilityConfig{
    ensemble: %EnsembleConfig{strategy: :none},
    hedging: %HedgingConfig{strategy: :off},
    guardrails: %GuardrailConfig{profiles: [:default]},
    stats: %StatsConfig{tests: [:bootstrap]},
    fairness: %FairnessConfig{enabled: false}
  },
  outputs: [
    %OutputSpec{
      name: :report,
      formats: [:markdown, :json],
      sink: :file,
      options: %{path: "reports/demo.md"}
    }
  ]
}
```
### Pipeline Engine
`Crucible.Pipeline.Runner` walks the `pipeline` list and calls each stage module. Built-in stages:
- `DataLoad`: streams and batches data (in-memory/JSONL helpers included)
- `DataChecks`: basic schema checks or pluggable validators
- `Guardrails`: adapter-based safety scanning (default no-op)
- `BackendCall`: training/sampling against a configured backend
- `CNSMetrics`: optional CNS adapter hook
- `Bench`: placeholder for statistical testing hooks (crucible_bench integration point)
- `Report`: renders Markdown/JSON, writes artifacts, and attaches to the run record
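The stage-walking idea can be pictured with a small, self-contained sketch. Everything in it is hypothetical: the `SketchRunner` and `SketchStage.*` modules, the `run/2` stage callback, the name-to-module lookup, and the plain-map context are assumptions made for illustration, and the real `Crucible.Pipeline.Runner` and its stage contract may differ.

```elixir
defmodule SketchStage.Load do
  # Hypothetical stage: puts up to `limit` toy examples into the context.
  def run(ctx, opts) do
    limit = Map.get(opts, :limit, 2)
    examples = Enum.take([%{input: "a", output: "b"}, %{input: "c", output: "d"}], limit)
    {:ok, Map.put(ctx, :examples, examples)}
  end
end

defmodule SketchStage.Check do
  # Hypothetical stage: fails if any example lacks a required field.
  def run(ctx, opts) do
    required = Map.get(opts, :required_fields, [])
    examples = Map.get(ctx, :examples, [])

    if Enum.all?(examples, fn ex -> Enum.all?(required, &Map.has_key?(ex, &1)) end) do
      {:ok, Map.put(ctx, :checked?, true)}
    else
      {:error, :missing_required_field}
    end
  end
end

defmodule SketchRunner do
  # Maps stage names to modules (an assumption about how lookup might work).
  @stages %{data_load: SketchStage.Load, data_checks: SketchStage.Check}

  # Walks the pipeline in order, threading the context through each stage
  # and halting on the first error or unknown stage name.
  def run(pipeline, ctx \\ %{}) do
    Enum.reduce_while(pipeline, {:ok, ctx}, fn {name, opts}, {:ok, acc} ->
      case Map.fetch(@stages, name) do
        {:ok, mod} ->
          case mod.run(acc, opts) do
            {:ok, next} -> {:cont, {:ok, next}}
            {:error, reason} -> {:halt, {:error, {name, reason}}}
          end

        :error ->
          {:halt, {:error, {name, :unknown_stage}}}
      end
    end)
  end
end
```

Calling `SketchRunner.run(data_load: %{limit: 2}, data_checks: %{required_fields: [:input, :output]})` threads one context map through both stages. Tagging errors with the stage name makes it clear where a pipeline stopped.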
### Backends
- Behaviour: `Crucible.Backend` defines the callbacks `init`, `start_session`, `train_step`, `save_checkpoint`, `create_sampler`, and `sample`.
- Implementation: `Crucible.Backend.Tinkex` delegates to the `tinkex` SDK via a mockable client (`LiveClient` for production, `ClientMock` for tests).
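To illustrate the shape of such a behaviour, here is a minimal sketch with an in-memory implementation. The callback names are taken from the list above, but the arities, argument types, and return shapes are assumptions, and `SketchBackend` and `SketchBackend.Echo` are hypothetical modules rather than part of the framework.

```elixir
defmodule SketchBackend do
  # Callback names follow the README; arities and return shapes are assumptions.
  @callback init(opts :: map()) :: {:ok, map()} | {:error, term()}
  @callback start_session(state :: map()) :: {:ok, map()} | {:error, term()}
  @callback train_step(session :: map(), batch :: [map()]) :: {:ok, map()} | {:error, term()}
  @callback save_checkpoint(session :: map(), name :: String.t()) ::
              {:ok, String.t()} | {:error, term()}
  @callback create_sampler(session :: map(), checkpoint :: String.t()) ::
              {:ok, map()} | {:error, term()}
  @callback sample(sampler :: map(), prompt :: String.t()) ::
              {:ok, String.t()} | {:error, term()}
end

defmodule SketchBackend.Echo do
  # In-memory stand-in that makes no network calls.
  @behaviour SketchBackend

  @impl true
  def init(opts), do: {:ok, %{base_model: Map.get(opts, :base_model, "stub")}}

  @impl true
  def start_session(state), do: {:ok, Map.put(state, :steps, 0)}

  @impl true
  def train_step(session, batch) do
    {:ok, %{batch_size: length(batch), step: session.steps + 1}}
  end

  @impl true
  def save_checkpoint(_session, name), do: {:ok, "memory://" <> name}

  @impl true
  def create_sampler(_session, checkpoint), do: {:ok, %{checkpoint: checkpoint}}

  @impl true
  def sample(sampler, prompt), do: {:ok, "[#{sampler.checkpoint}] " <> prompt}
end
```

A stand-in of this kind plays the same role as a mock client in tests: the pipeline code depends only on the behaviour, so implementations can be swapped without touching the stages.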
### Persistence
- Repo: `CrucibleFramework.Repo` (Postgres)
- Schemas: experiments, runs, artifacts (`lib/crucible_framework/persistence/*.ex`)
- Helpers: `start_run`, `finish_run`, and `record_artifact` in `CrucibleFramework.Persistence`
- Toggle persistence with the `:enable_repo` config key (enabled by default)
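As a sketch, disabling the repo for an environment might look like the fragment below. It assumes `:enable_repo` is read from the `:crucible_framework` application environment, which this README does not state explicitly; check the project's own config files for the exact key location.

```elixir
# config/test.exs (sketch)
import Config

# Assumption: :enable_repo lives under the :crucible_framework application.
config :crucible_framework, enable_repo: false
```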
### Safety & Evaluation Adapters
- Guardrails: plug your adapter via `config :crucible_framework, :guardrail_adapter, YourModule` (default no-op).
- CNS: plug your adapter via `config :crucible_framework, :cns_adapter, YourModule` (default no-op).
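A guardrail adapter could look roughly like the sketch below. The module name `MyApp.KeywordGuardrail`, the `scan/2` callback, its options, and its return shape are all hypothetical; the adapter contract shipped with the framework is the authority on what a real adapter must implement.

```elixir
defmodule MyApp.KeywordGuardrail do
  # Hypothetical adapter: `scan/2` and its return shape are assumptions.
  @default_blocked ["password", "ssn"]

  # Rejects the batch if any example's :input contains a blocked keyword.
  def scan(examples, opts \\ %{}) do
    blocked = Map.get(opts, :blocked, @default_blocked)

    flagged =
      Enum.filter(examples, fn ex ->
        text = ex |> Map.get(:input, "") |> to_string() |> String.downcase()
        Enum.any?(blocked, &String.contains?(text, &1))
      end)

    case flagged do
      [] -> {:ok, examples}
      _ -> {:error, {:blocked_content, length(flagged)}}
    end
  end
end

# Wiring it up uses the config key described above:
#   config :crucible_framework, :guardrail_adapter, MyApp.KeywordGuardrail
```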
---
## Running Your Own Pipeline
```elixir
{:ok, ctx} = CrucibleFramework.run(experiment, persist: true)
IO.inspect(ctx.metrics, label: "metrics")
```
With `persist: true`, the persistence layer stores the experiment, the run record, and any file artifacts emitted by the `Report` stage.
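The snippet above pattern-matches on `{:ok, ctx}` and would raise on any other result. A small helper can make both outcomes explicit. In this sketch, `RunSummary` is a hypothetical module, `ctx.metrics` is assumed to be a map, and the `{:error, reason}` failure shape is an assumption about what `CrucibleFramework.run/2` returns.

```elixir
defmodule RunSummary do
  # Turns a run result into a one-line summary.
  # `metrics` on the context appears in the README; the error shape is assumed.
  def describe({:ok, %{metrics: metrics}}) when is_map(metrics) do
    keys = metrics |> Map.keys() |> Enum.sort() |> Enum.map_join(", ", &inspect/1)
    "run succeeded; metrics: " <> keys
  end

  def describe({:error, reason}), do: "run failed: " <> inspect(reason)
end
```

It could then be used as `experiment |> CrucibleFramework.run(persist: true) |> RunSummary.describe() |> IO.puts()`.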
---
## Development Notes
- DB credentials are baked into `config/dev.exs` and `config/test.exs` for frictionless local work (`crucible_dev` / `crucible_dev_pw`).
- `scripts/setup_db.sh` is idempotent, so it is safe to rerun at any time.
- For live Tinkex calls, set `TINKER_API_KEY` (see `config/runtime.exs`).
---
## License
MIT. See [LICENSE](LICENSE).