README.md

# CrucibleFeedback

<p align="center">
  <img src="assets/crucible_feedback.svg" alt="CrucibleFeedback Logo" width="200"/>
</p>

<p align="center">
  <strong>Production feedback loop for ML systems: telemetry ingestion, quality assessment, drift detection, curation, retraining triggers, and export</strong>
</p>

<p align="center">
  <a href="https://hex.pm/packages/crucible_feedback"><img src="https://img.shields.io/hexpm/v/crucible_feedback.svg" alt="Hex Version"/></a>
  <a href="https://hexdocs.pm/crucible_feedback"><img src="https://img.shields.io/badge/hex-docs-blue.svg" alt="Hex Docs"/></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License"/></a>
</p>

---

## Features

- Batch-buffered ingestion with PII sanitization and telemetry events
- User signal capture (thumbs up/down, regenerate, edit, copy, share, report)
- Multi-check quality assessment (format, length, refusal, repetition, optional LLM-judge)
- Drift detection (statistical, embedding, output)
- Data curation strategies (high-quality, hard examples, diverse)
- Retraining triggers (drift, quality drop, count, schedule)
- Export to JSONL, HuggingFace dataset dir, Parquet placeholder, preference pairs
- Crucible stage integration for pipelines

## Installation

```elixir
# mix.exs
{:crucible_feedback, "~> 0.1.0"}
```

## Configuration

```elixir
# config/config.exs
config :crucible_feedback,
  ecto_repos: [CrucibleFeedback.Repo],
  storage: CrucibleFeedback.Storage.Ecto,
  embedding_client: CrucibleFeedback.EmbeddingClient.Noop,
  start_repo: true,
  start_ingestion: true,
  ingestion: [
    flush_interval: :timer.seconds(5),
    max_batch_size: 1_000
  ],
  sanitizer: [
    pii_patterns: []
  ],
  quality: [
    min_length: 20,
    max_length: 4_000
  ],
  triggers: [
    drift_threshold: 0.2,
    quality_threshold: 0.7,
    data_count_threshold: 1_000,
    schedule_interval_seconds: 86_400
  ],
  export: [
    output_dir: "exports"
  ],
  curation: [
    limit: 1_000,
    min_quality_score: 0.8,
    max_hard_quality: 0.5
  ]
```

## Usage

### Ingest Events

```elixir
CrucibleFeedback.log_inference(%{
  deployment_id: "deploy-1",
  model_version_id: "model-1",
  user_id: "user-123",
  prompt: "What is 2+2?",
  response: "4",
  latency_ms: 120,
  token_count: 42,
  metadata: %{source: "api"}
})
```

### Record Signals

```elixir
CrucibleFeedback.record_signal("inference-id", :thumbs_up, %{source: "ui"})
CrucibleFeedback.record_user_edit("inference-id", "User edited response")
```

### Quality Assessment

```elixir
CrucibleFeedback.assess_quality(%{response: "{\"ok\": true}"})
CrucibleFeedback.get_quality_stats("deploy-1")
```

### Drift Detection

```elixir
CrucibleFeedback.detect_drift("deploy-1", window_size: 1_000)
```

### Curation and Export

```elixir
CrucibleFeedback.curate("deploy-1", persist: true)
CrucibleFeedback.export("deploy-1", format: :jsonl, output_path: "exports/feedback.jsonl")
CrucibleFeedback.export_preference_pairs("deploy-1")
```

### Retraining Triggers

```elixir
CrucibleFeedback.check_triggers("deploy-1")
```

### Crucible Stages

```elixir
CrucibleFeedback.Stages.ExportFeedback.run(context, format: :jsonl)
CrucibleFeedback.Stages.CheckTriggers.run(context, [])
```

## Storage Backends

- `CrucibleFeedback.Storage.Ecto` (Postgres)
- `CrucibleFeedback.Storage.Memory` (tests/dev)
- `CrucibleFeedback.Storage.Clickhouse` (placeholder)

## Migrations

```bash
mix ecto.create
mix ecto.migrate
```

## Testing

```bash
mix test
```

## Notes

- Parquet export currently writes JSONL content to the specified `.parquet` path as a placeholder.
- To enable embedding drift and diversity selection, configure `embedding_client` with a real provider.