# LabelingIR
<p align="center"><img src="assets/labeling_ir.svg" alt="LabelingIR Logo" width="392" /></p>
[](https://hex.pm/packages/labeling_ir)
[](https://hexdocs.pm/labeling_ir)
[](https://hex.pm/packages/labeling_ir)
[](LICENSE)
[](https://github.com/North-Shore-AI/labeling_ir)
Shared intermediate representation (IR) structs for the North Shore labeling stack. LabelingIR provides typed, JSON-serializable data structures used across Forge, Anvil, Ingot, and external clients for human-in-the-loop ML workflows.
## Highlights
- **Typed structs** with enforced keys and sensible defaults
- **JSON-serializable** via Jason for API transport and storage
- **Multi-tenant** support with `tenant_id` and optional `namespace` on all entities
- **Lineage tracking** via optional `lineage_ref` for provenance
- **Zero dependencies** beyond Jason
## Installation
Add to your `mix.exs`:
```elixir
def deps do
[
{:labeling_ir, "~> 0.1.0"}
]
end
```
Or from GitHub:
```elixir
def deps do
[
{:labeling_ir, git: "https://github.com/North-Shore-AI/labeling_ir.git"}
]
end
```
## Structs
### Core Entities
| Struct | Purpose |
|--------|---------|
| `LabelingIR.Sample` | UI-friendly sample representation with payload and artifacts |
| `LabelingIR.Dataset` | Versioned dataset with slices for labeling/eval workloads |
| `LabelingIR.Schema` | Declarative label schema definition |
| `LabelingIR.Schema.Field` | Individual field in a schema (`:text`, `:select`, `:scale`, etc.) |
### Labeling Workflow
| Struct | Purpose |
|--------|---------|
| `LabelingIR.Assignment` | Unit of labeling work binding a sample to a schema |
| `LabelingIR.Label` | Human-provided label values with timing and metadata |
| `LabelingIR.EvalRun` | Evaluation run (human or model) over a dataset slice |
### Artifacts
| Struct | Purpose |
|--------|---------|
| `LabelingIR.Artifact` | Artifact attached to a sample (image, JSON, text, etc.) |
| `LabelingIR.ArtifactRef` | Lightweight reference to an artifact by ID |
## Usage
### Creating a Sample
```elixir
alias LabelingIR.{Sample, Artifact}
sample = %Sample{
id: "sample_001",
tenant_id: "acme_corp",
pipeline_id: "sentiment_v2",
payload: %{"text" => "Great product, highly recommend!"},
artifacts: [
%Artifact{
id: "img_001",
url: "https://storage.example.com/screenshot.png",
filename: "screenshot.png",
artifact_type: :image,
mime: "image/png"
}
],
metadata: %{"source" => "support_tickets"},
created_at: DateTime.utc_now()
}
```
### Defining a Schema
```elixir
alias LabelingIR.{Schema, Schema.Field}
schema = %Schema{
id: "sentiment_schema_v1",
tenant_id: "acme_corp",
fields: [
%Field{
name: "sentiment",
type: :select,
required: true,
options: ["positive", "negative", "neutral"]
},
%Field{
name: "confidence",
type: :scale,
required: true,
min: 1,
max: 5,
help: "How confident are you in this label?"
},
%Field{
name: "notes",
type: :text,
required: false
}
]
}
```
### Creating an Assignment
```elixir
alias LabelingIR.Assignment
assignment = %Assignment{
id: "assign_001",
queue_id: "sentiment_queue",
tenant_id: "acme_corp",
sample: sample,
schema: schema,
existing_labels: [],
expires_at: DateTime.add(DateTime.utc_now(), 3600, :second)
}
```
### Submitting a Label
```elixir
alias LabelingIR.Label
label = %Label{
id: "label_001",
assignment_id: "assign_001",
sample_id: "sample_001",
queue_id: "sentiment_queue",
tenant_id: "acme_corp",
user_id: "labeler_alice",
values: %{
"sentiment" => "positive",
"confidence" => 4,
"notes" => "Clear positive sentiment"
},
time_spent_ms: 12500,
created_at: DateTime.utc_now()
}
```
### Datasets and Eval Runs
```elixir
alias LabelingIR.{Dataset, EvalRun}
dataset = %Dataset{
id: "sentiment_dataset_v1",
tenant_id: "acme_corp",
version: "1.0.0",
slices: [
%{name: "train", sample_ids: ["s1", "s2", "s3"], filter: %{}},
%{name: "test", sample_ids: ["s4", "s5"], filter: %{}}
],
created_at: DateTime.utc_now()
}
eval_run = %EvalRun{
id: "eval_001",
tenant_id: "acme_corp",
dataset_id: "sentiment_dataset_v1",
slice: "test",
run_type: :model,
model_ref: "gpt-4-turbo",
metrics: %{
"accuracy" => 0.92,
"f1" => 0.89,
"cohens_kappa" => 0.85
},
created_at: DateTime.utc_now()
}
```
### JSON Serialization
All structs derive `Jason.Encoder` for seamless JSON serialization:
```elixir
Jason.encode!(sample)
# => {"id":"sample_001","tenant_id":"acme_corp",...}
```
## Field Types
The `Schema.Field` struct supports the following types:
| Type | Description |
|------|-------------|
| `:text` | Free-form text input |
| `:boolean` | True/false toggle |
| `:select` | Single selection from `options` list |
| `:multiselect` | Multiple selections from `options` list |
| `:scale` | Numeric scale with `min`/`max` bounds |
Custom types can be used as atoms (e.g., `:date`, `:datetime`, `:number`).
## Architecture
LabelingIR serves as the shared contract between:
- **Forge** - Sample ingestion and preprocessing pipelines
- **Anvil** - Labeling queue management and human workflows
- **Ingot** - UI frontend for labelers and reviewers
- **External clients** - API consumers and integrations
```
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Forge │────▶│ Anvil │────▶│ Ingot │
└─────────┘ └─────────┘ └─────────┘
│ │ │
└───────────────┼───────────────┘
│
┌──────┴──────┐
│ LabelingIR │
│ (structs) │
└─────────────┘
```
## Development
```bash
# Install dependencies
mix deps.get
# Run tests
mix test
# Generate docs
mix docs
# Type checking
mix dialyzer
```
## License
MIT License - see [LICENSE](LICENSE) for details.
## Acknowledgments
Built by the North Shore AI team for the machine learning community.