docs/ZERO_SHOT.md

# Zero-shot Classification Guide

Complete guide to zero-shot text classification in Nasty using Natural Language Inference models.

## Overview

Zero-shot classification allows you to classify text into **arbitrary categories without any training data**. It works by framing classification as a Natural Language Inference (NLI) problem.

**Key Benefits:**
- No training data required
- Works with any label set you define
- Add new categories instantly
- Multi-label classification support
- 70-85% accuracy on many tasks

## How It Works

The model treats classification as textual entailment:

1. **Premise**: Your input text
2. **Hypothesis**: "This text is about {label}"
3. **Prediction**: Probability that the premise entails the hypothesis

For each candidate label, the model predicts an entailment probability. The label with the highest probability wins.
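
Conceptually, each candidate label is slotted into a hypothesis template before the NLI model scores it. A minimal sketch of that step in plain Elixir (the template and `{}` placeholder mirror the `--hypothesis-template` option described later):

```elixir
# Build one NLI hypothesis per candidate label.
template = "This text is about {}"
labels = ["positive", "negative", "neutral"]

hypotheses = Enum.map(labels, &String.replace(template, "{}", &1))
# => ["This text is about positive",
#     "This text is about negative",
#     "This text is about neutral"]
```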

### Example

**Text**: "I love this product!"

**Labels**: positive, negative, neutral

**Process**:
- "I love this product!" entails "This text is about positive" → 95%
- "I love this product!" entails "This text is about negative" → 2%
- "I love this product!" entails "This text is about neutral" → 3%

**Result**: positive (95% confidence)

## Quick Start

### CLI Usage

```bash
# Single text classification
mix nasty.zero_shot \
  --text "I love this product!" \
  --labels positive,negative,neutral

# Output:
# Text: I love this product!
#   Predicted: positive
#   Confidence: 95.3%
#
#   All scores:
#     positive: 95.3% ████████████████████
#     neutral:   3.2% █
#     negative:  1.5%
```

### Programmatic Usage

```elixir
alias Nasty.Statistics.Neural.Transformers.ZeroShot

{:ok, result} = ZeroShot.classify("I love this product!",
  candidate_labels: ["positive", "negative", "neutral"]
)

# result = %{
#   label: "positive",
#   scores: %{
#     "positive" => 0.953,
#     "neutral" => 0.032,
#     "negative" => 0.015
#   },
#   sequence: "I love this product!"
# }
```

## Common Use Cases

### 1. Sentiment Analysis

```bash
mix nasty.zero_shot \
  --text "The movie was boring and predictable" \
  --labels positive,negative,neutral
```

**Why it works**: Clear emotional content maps well to sentiment labels.

### 2. Topic Classification

```bash
mix nasty.zero_shot \
  --text "Bitcoin reaches new all-time high" \
  --labels technology,finance,sports,politics,business
```

**Why it works**: Topics have distinct semantic spaces.

### 3. Intent Detection

```bash
mix nasty.zero_shot \
  --text "Can you help me reset my password?" \
  --labels question,request,complaint,praise
```

**Why it works**: Intents have characteristic linguistic patterns.

### 4. Content Moderation

```bash
mix nasty.zero_shot \
  --text "This is the worst service ever!!!" \
  --labels spam,offensive,normal,promotional
```

**Why it works**: Moderation categories have clear signals.

### 5. Email Routing

```bash
mix nasty.zero_shot \
  --text "Urgent: Server down in production" \
  --labels urgent,normal,low_priority,informational
```

**Why it works**: Urgency and importance have lexical markers.

## Multi-label Classification

Assign multiple labels when appropriate:

```bash
mix nasty.zero_shot \
  --text "Urgent: Please review the attached technical document" \
  --labels urgent,action_required,informational,technical \
  --multi-label \
  --threshold 0.5
```

**Output**:
```
Predicted labels: urgent, action_required, technical

All scores:
  [✓] urgent:          0.89
  [✓] action_required: 0.76
  [✓] technical:       0.68
  [ ] informational:   0.34
```

Only labels above threshold (0.5) are selected.
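
The same selection can be reproduced in code by filtering a score map against the threshold. A minimal sketch, assuming you already have a per-label score map like the one above (in multi-label mode each label is scored independently, so scores need not sum to 1):

```elixir
threshold = 0.5

scores = %{
  "urgent" => 0.89,
  "action_required" => 0.76,
  "technical" => 0.68,
  "informational" => 0.34
}

selected =
  scores
  |> Enum.filter(fn {_label, score} -> score >= threshold end)
  |> Enum.sort_by(fn {_label, score} -> score end, :desc)
  |> Enum.map(fn {label, _score} -> label end)

# => ["urgent", "action_required", "technical"]
```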

### Multi-label Use Cases

- **Document tagging**: Tag with multiple topics
- **Email categorization**: Both "urgent" AND "technical"
- **Content flags**: Multiple moderation issues
- **Skill extraction**: Multiple skills from job description

## Batch Classification

Process multiple texts efficiently:

```bash
# Create input file
cat > texts.txt << EOF
I love this product!
The service was terrible
It's okay, nothing special
EOF

# Classify batch
mix nasty.zero_shot \
  --input texts.txt \
  --labels positive,negative,neutral \
  --output results.json
```

Result saved to `results.json`:
```json
[
  {
    "text": "I love this product!",
    "result": {
      "label": "positive",
      "scores": {"positive": 0.95, "neutral": 0.03, "negative": 0.02}
    },
    "success": true
  },
  ...
]
```
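
The same pipeline can be scripted with `ZeroShot.classify_batch/2` (shown under Advanced Usage below). A minimal sketch, assuming the Jason library is available for JSON encoding and that results come back in input order:

```elixir
alias Nasty.Statistics.Neural.Transformers.ZeroShot

# Read one text per line.
texts =
  "texts.txt"
  |> File.read!()
  |> String.split("\n", trim: true)

{:ok, results} =
  ZeroShot.classify_batch(texts, candidate_labels: ["positive", "negative", "neutral"])

# Pair each input with its result and write JSON, mirroring the CLI output format.
json =
  texts
  |> Enum.zip(results)
  |> Enum.map(fn {text, result} -> %{text: text, result: result, success: true} end)
  |> Jason.encode!(pretty: true)

File.write!("results.json", json)
```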

## Supported Models

### RoBERTa-MNLI (Default)

**Best for**: English text, highest accuracy

```bash
--model roberta_large_mnli
```

**Specs**:
- Parameters: 355M
- Languages: English only
- Accuracy: 85-90% on many tasks
- Speed: Medium

### BART-MNLI

**Best for**: Alternative to RoBERTa, slightly different strengths

```bash
--model bart_large_mnli
```

**Specs**:
- Parameters: 400M
- Languages: English only
- Accuracy: 83-88%
- Speed: Slower than RoBERTa

### XLM-RoBERTa

**Best for**: Multilingual (Spanish, Catalan, etc.)

```bash
--model xlm_roberta_base
```

**Specs**:
- Parameters: 270M
- Languages: 100 languages
- Accuracy: 75-85% (varies by language)
- Speed: Fast
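
Programmatic model selection is assumed here to mirror the CLI's `--model` flag; the `model:` option name is hypothetical, so check the `ZeroShot` module docs for the actual key:

```elixir
# :model is an assumed option mirroring the --model CLI flag.
{:ok, result} =
  ZeroShot.classify("El producto es excelente",
    candidate_labels: ["positivo", "negativo", "neutral"],
    model: :xlm_roberta_base
  )
```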

## Custom Hypothesis Templates

Change how classification is framed:

```bash
# Default template
--hypothesis-template "This text is about {}"

# Custom templates
--hypothesis-template "This message is {}"
--hypothesis-template "The sentiment is {}"
--hypothesis-template "The topic of this text is {}"
--hypothesis-template "This document contains {}"
```

**Example**:

```bash
mix nasty.zero_shot \
  --text "Please call me back ASAP" \
  --labels urgent,normal,low_priority \
  --hypothesis-template "This message is {}"
```

Generates hypotheses:
- "This message is urgent"
- "This message is normal"
- "This message is low_priority"

## Best Practices

### 1. Choose Clear, Distinct Labels

**Good**:
```bash
--labels positive,negative,neutral
--labels urgent,normal,low_priority
--labels technical,business,personal
```

**Bad** (too similar):
```bash
--labels happy,joyful,cheerful  # Too similar!
--labels important,critical,essential  # Overlapping!
```

### 2. Use Descriptive Label Names

**Good**:
```bash
--labels positive_sentiment,negative_sentiment,neutral_sentiment
```

**Better**:
```bash
--labels positive,negative,neutral  # Simpler, but clear
```

**Bad**:
```bash
--labels pos,neg,neu  # Too cryptic
--labels 1,2,3  # Meaningless
```

### 3. Provide 2-6 Labels

- **Too few** (1 label): Not a classification problem
- **Sweet spot** (2-6 labels): Best accuracy
- **Too many** (10+ labels): Accuracy degrades

### 4. Use Multi-label for Overlapping Concepts

**Single-label** (mutually exclusive):
```bash
--labels positive,negative,neutral
```

**Multi-label** (can overlap):
```bash
--labels urgent,technical,action_required,informational \
--multi-label
```

### 5. Adjust Threshold for Multi-label

```bash
# Conservative (fewer labels)
--threshold 0.7

# Balanced (default)
--threshold 0.5

# Liberal (more labels)
--threshold 0.3
```

## Performance Tips

### When Zero-shot Works Best

✓ Clear semantic categories  
✓ 2-6 distinct labels  
✓ Labels have characteristic language patterns  
✓ English text (for RoBERTa-MNLI)  
✓ Medium-length text (10-200 words)

### When to Use Fine-tuning Instead

✗ Need >90% accuracy  
✗ Domain-specific jargon  
✗ Subtle distinctions between labels  
✗ Have 1000+ labeled examples  
✗ Production-critical system

Zero-shot is great for prototyping and low-stakes classification. For production, consider fine-tuning.

## Limitations

### 1. Language Dependence

RoBERTa-MNLI only works well for English. For other languages:

```bash
# Spanish/Catalan
--model xlm_roberta_base
```

Expect 10-15% lower accuracy than English.

### 2. Accuracy Ceiling

Zero-shot typically achieves 70-85% accuracy. Fine-tuning can reach 95-99%.

### 3. Context Window

Models have a maximum input length (~512 tokens). Long documents need truncation:

```bash
# Truncate to first 512 tokens automatically
--max-length 512
```
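
If you would rather truncate before calling the classifier, a rough word-based cut works as a pre-processing step (tokens are not the same as words, so this is only an approximation):

```elixir
# Keep roughly the first 400 words as a crude stand-in for a 512-token limit;
# long_text and labels are assumed to be bound already.
truncated =
  long_text
  |> String.split(~r/\s+/, trim: true)
  |> Enum.take(400)
  |> Enum.join(" ")

{:ok, result} = ZeroShot.classify(truncated, candidate_labels: labels)
```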

### 4. Label Sensitivity

Results can vary with label phrasing:

```bash
# These may give different results:
--labels positive,negative
--labels good,bad
--labels happy,sad
```

Test different phrasings to find what works best.

## Troubleshooting

### All Scores Are Similar

**Problem**: Scores like 0.33, 0.34, 0.33 (no clear winner)

**Causes**:
- Labels are too similar
- Text is ambiguous
- Poor hypothesis template

**Solutions**:
1. Use more distinct labels
2. Try different hypothesis template
3. Add more context to text
4. Consider if text is truly ambiguous
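
One way to detect this case automatically is to check the margin between the top two scores and route narrow wins to human review. A minimal sketch over the documented result shape:

```elixir
{:ok, result} = ZeroShot.classify(text, candidate_labels: labels)

[top, second | _] =
  result.scores
  |> Map.values()
  |> Enum.sort(:desc)

if top - second < 0.1 do
  # No clear winner -- flag for review.
  {:ambiguous, result}
else
  {:ok, result}
end
```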

### Wrong Label Predicted

**Problem**: Clearly wrong prediction

**Causes**:
- Label phrasing doesn't match text semantics
- Need different hypothesis template
- Text is out-of-domain for model

**Solutions**:
1. Rephrase labels
2. Change hypothesis template
3. Try different model
4. Consider fine-tuning for your domain

### Slow Performance

**Problem**: Classification takes too long

**Solutions**:
1. Use smaller model (xlm_roberta_base vs roberta_large)
2. Enable GPU (set XLA_TARGET=cuda)
3. Reduce number of labels
4. Use batch processing for multiple texts

## Advanced Usage

### Programmatic Batch Processing

```elixir
alias Nasty.Statistics.Neural.Transformers.ZeroShot

texts = [
  "I love this!",
  "Terrible service",
  "It's okay"
]

{:ok, results} = ZeroShot.classify_batch(texts,
  candidate_labels: ["positive", "negative", "neutral"]
)

# results = [
#   %{label: "positive", scores: %{...}, sequence: "I love this!"},
#   %{label: "negative", scores: %{...}, sequence: "Terrible service"},
#   %{label: "neutral", scores: %{...}, sequence: "It's okay"}
# ]
```

### Confidence Thresholding

Reject low-confidence predictions:

```elixir
{:ok, result} = ZeroShot.classify(text,
  candidate_labels: ["positive", "negative", "neutral"]
)

max_score = result.scores[result.label]

if max_score < 0.6 do
  # Too uncertain, flag for human review
  {:uncertain, result}
else
  {:confident, result}
end
```

### Hierarchical Classification

First classify broadly, then refine:

```elixir
# Step 1: Broad category
{:ok, broad} = ZeroShot.classify(text,
  candidate_labels: ["product", "service", "support"]
)

# Step 2: Specific subcategory
specific_labels = case broad.label do
  "product" -> ["quality", "price", "features"]
  "service" -> ["delivery", "installation", "maintenance"]
  "support" -> ["technical", "billing", "general"]
end

{:ok, specific} = ZeroShot.classify(text,
  candidate_labels: specific_labels
)
```

## Comparison with Other Methods

| Method | Training Data | Accuracy | Setup Time | Flexibility |
|--------|---------------|----------|------------|-------------|
| Zero-shot | 0 examples | 70-85% | Instant | Very high |
| Few-shot | 10-100 examples | 80-90% | Minutes | High |
| Fine-tuning | 1000+ examples | 95-99% | Hours | Medium |
| Rule-based | N/A | 60-80% | Days | Low |

**Recommendation**: Start with zero-shot, move to fine-tuning if accuracy is insufficient.

## Production Deployment

### Caching Results

```elixir
defmodule ClassificationCache do
  use GenServer

  alias Nasty.Statistics.Neural.Transformers.ZeroShot

  # Simple in-memory cache keyed by a hash of the text and label set.
  def start_link(_opts) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  @impl true
  def init(state), do: {:ok, state}

  def classify_cached(text, labels) do
    cache_key = :crypto.hash(:md5, text <> Enum.join(labels)) |> Base.encode16()

    case GenServer.call(__MODULE__, {:get, cache_key}) do
      nil ->
        {:ok, result} = ZeroShot.classify(text, candidate_labels: labels)
        GenServer.call(__MODULE__, {:put, cache_key, result})
        result

      cached ->
        cached
    end
  end

  @impl true
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}
  def handle_call({:put, key, value}, _from, state), do: {:reply, :ok, Map.put(state, key, value)}
end
```

### Rate Limiting

```elixir
defmodule RateLimiter do
  alias Nasty.Statistics.Neural.Transformers.ZeroShot

  def classify_with_limit(text, labels) do
    # check_rate_limit/0 is an application-specific placeholder,
    # e.g. a token bucket or a rate-limiting library.
    case check_rate_limit() do
      :ok ->
        ZeroShot.classify(text, candidate_labels: labels)

      {:error, :rate_limited} ->
        {:error, "Too many requests, please retry later"}
    end
  end
end
```

### Fallback Strategies

```elixir
def classify_robust(text, labels) do
  case ZeroShot.classify(text, candidate_labels: labels) do
    {:ok, result} ->
      if result.scores[result.label] > 0.6 do
        {:ok, result}
      else
        # Low confidence: fall back to a simpler classifier
        # (naive_bayes_classify/2 is an application-specific placeholder)
        naive_bayes_classify(text, labels)
      end

    {:error, _} ->
      # Model unavailable: fall back to a rule-based classifier
      # (rule_based_classify/2 is an application-specific placeholder)
      rule_based_classify(text, labels)
  end
end
```

## See Also

- [FINE_TUNING.md](FINE_TUNING.md) - Train models for higher accuracy
- [CROSS_LINGUAL.md](CROSS_LINGUAL.md) - Multilingual classification
- [PRETRAINED_MODELS.md](PRETRAINED_MODELS.md) - Available transformer models