# Zero-shot Classification Guide
Complete guide to zero-shot text classification in Nasty using Natural Language Inference models.
## Overview
Zero-shot classification allows you to classify text into **arbitrary categories without any training data**. It works by framing classification as a Natural Language Inference (NLI) problem.
**Key Benefits:**
- No training data required
- Works with any label set you define
- Add new categories instantly
- Multi-label classification support
- 70-85% accuracy on many tasks
## How It Works
The model treats classification as textual entailment:
1. **Premise**: Your input text
2. **Hypothesis**: "This text is about {label}"
3. **Prediction**: Probability that the premise entails the hypothesis
For each candidate label, the model predicts an entailment probability. The label with the highest probability wins.
### Example
**Text**: "I love this product!"
**Labels**: positive, negative, neutral
**Process**:
- "I love this product!" entails "This text is about positive" → 95%
- "I love this product!" entails "This text is about negative" → 2%
- "I love this product!" entails "This text is about neutral" → 3%
**Result**: positive (95% confidence)
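Conceptually, the pipeline just pairs your text with one hypothesis per label; a minimal sketch of that framing in plain Elixir (string manipulation only, not Nasty's internal API):
```elixir
# A plain-Elixir illustration of the NLI framing (not Nasty's internal API)
premise = "I love this product!"
template = "This text is about {}"
labels = ["positive", "negative", "neutral"]

pairs =
  Enum.map(labels, fn label ->
    {premise, String.replace(template, "{}", label)}
  end)

IO.inspect(pairs)
# [
#   {"I love this product!", "This text is about positive"},
#   {"I love this product!", "This text is about negative"},
#   {"I love this product!", "This text is about neutral"}
# ]
#
# The NLI model scores each {premise, hypothesis} pair for entailment, and
# the per-label scores are normalized to produce the final distribution.
```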
## Quick Start
### CLI Usage
```bash
# Single text classification
mix nasty.zero_shot \
--text "I love this product!" \
--labels positive,negative,neutral
# Output:
# Text: I love this product!
# Predicted: positive
# Confidence: 95.3%
#
# All scores:
# positive: 95.3% ████████████████████
# neutral: 3.2% █
# negative: 1.5%
```
### Programmatic Usage
```elixir
alias Nasty.Statistics.Neural.Transformers.ZeroShot
{:ok, result} =
  ZeroShot.classify("I love this product!",
    candidate_labels: ["positive", "negative", "neutral"]
  )

# result = %{
#   label: "positive",
#   scores: %{
#     "positive" => 0.953,
#     "neutral" => 0.032,
#     "negative" => 0.015
#   },
#   sequence: "I love this product!"
# }
```
## Common Use Cases
### 1. Sentiment Analysis
```bash
mix nasty.zero_shot \
--text "The movie was boring and predictable" \
--labels positive,negative,neutral
```
**Why it works**: Clear emotional content maps well to sentiment labels.
### 2. Topic Classification
```bash
mix nasty.zero_shot \
--text "Bitcoin reaches new all-time high" \
--labels technology,finance,sports,politics,business
```
**Why it works**: Topics have distinct semantic spaces.
### 3. Intent Detection
```bash
mix nasty.zero_shot \
--text "Can you help me reset my password?" \
--labels question,request,complaint,praise
```
**Why it works**: Intents have characteristic linguistic patterns.
### 4. Content Moderation
```bash
mix nasty.zero_shot \
--text "This is the worst service ever!!!" \
--labels spam,offensive,normal,promotional
```
**Why it works**: Moderation categories have clear signals.
### 5. Email Routing
```bash
mix nasty.zero_shot \
--text "Urgent: Server down in production" \
--labels urgent,normal,low_priority,informational
```
**Why it works**: Urgency and importance have lexical markers.
## Multi-label Classification
Assign multiple labels when appropriate:
```bash
mix nasty.zero_shot \
--text "Urgent: Please review the attached technical document" \
--labels urgent,action_required,informational,technical \
--multi-label \
--threshold 0.5
```
**Output**:
```
Predicted labels: urgent, action_required, technical
All scores:
[✓] urgent: 0.89
[✓] action_required: 0.76
[✓] technical: 0.68
[ ] informational: 0.34
```
Only labels scoring above the threshold (0.5) are selected.
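The same multi-label behaviour should be reachable from the programmatic API; a minimal sketch, assuming `multi_label:` and `threshold:` option names that mirror the CLI flags (check the `ZeroShot` module docs for the exact keys):
```elixir
alias Nasty.Statistics.Neural.Transformers.ZeroShot

# `multi_label:` and `threshold:` are assumed option names mirroring the CLI flags
{:ok, result} =
  ZeroShot.classify("Urgent: Please review the attached technical document",
    candidate_labels: ["urgent", "action_required", "informational", "technical"],
    multi_label: true,
    threshold: 0.5
  )
```
With multi-label scoring, each label is judged independently, so every label that clears the threshold is kept rather than picking a single winner.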
### Multi-label Use Cases
- **Document tagging**: Tag with multiple topics
- **Email categorization**: Both "urgent" AND "technical"
- **Content flags**: Multiple moderation issues
- **Skill extraction**: Multiple skills from job description
## Batch Classification
Process multiple texts efficiently:
```bash
# Create input file
cat > texts.txt << EOF
I love this product!
The service was terrible
It's okay, nothing special
EOF
# Classify batch
mix nasty.zero_shot \
--input texts.txt \
--labels positive,negative,neutral \
--output results.json
```
Results are saved to `results.json`:
```json
[
  {
    "text": "I love this product!",
    "result": {
      "label": "positive",
      "scores": {"positive": 0.95, "neutral": 0.03, "negative": 0.02}
    },
    "success": true
  },
  ...
]
```
## Supported Models
### RoBERTa-MNLI (Default)
**Best for**: English text, highest accuracy
```bash
--model roberta_large_mnli
```
**Specs**:
- Parameters: 355M
- Languages: English only
- Accuracy: 85-90% on many tasks
- Speed: Medium
### BART-MNLI
**Best for**: Alternative to RoBERTa, slightly different strengths
```bash
--model bart_large_mnli
```
**Specs**:
- Parameters: 400M
- Languages: English only
- Accuracy: 83-88%
- Speed: Slower than RoBERTa
### XLM-RoBERTa
**Best for**: Multilingual (Spanish, Catalan, etc.)
```bash
--model xlm_roberta_base
```
**Specs**:
- Parameters: 270M
- Languages: 100 languages
- Accuracy: 75-85% (varies by language)
- Speed: Fast
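Programmatically, the model can presumably be chosen per call; a minimal sketch, assuming a `model:` option that mirrors the CLI's `--model` flag:
```elixir
alias Nasty.Statistics.Neural.Transformers.ZeroShot

# `model:` is an assumed option name mirroring the CLI's --model flag
{:ok, result} =
  ZeroShot.classify("El servicio fue excelente",
    candidate_labels: ["positive", "negative", "neutral"],
    model: :xlm_roberta_base
  )
```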
## Custom Hypothesis Templates
Change how classification is framed:
```bash
# Default template
--hypothesis-template "This text is about {}"
# Custom templates
--hypothesis-template "This message is {}"
--hypothesis-template "The sentiment is {}"
--hypothesis-template "The topic of this text is {}"
--hypothesis-template "This document contains {}"
```
**Example**:
```bash
mix nasty.zero_shot \
--text "Please call me back ASAP" \
--labels urgent,normal,low_priority \
--hypothesis-template "This message is {}"
```
Generates hypotheses:
- "This message is urgent"
- "This message is normal"
- "This message is low_priority"
## Best Practices
### 1. Choose Clear, Distinct Labels
**Good**:
```bash
--labels positive,negative,neutral
--labels urgent,normal,low_priority
--labels technical,business,personal
```
**Bad** (too similar):
```bash
--labels happy,joyful,cheerful # Too similar!
--labels important,critical,essential # Overlapping!
```
### 2. Use Descriptive Label Names
**Good**:
```bash
--labels positive_sentiment,negative_sentiment,neutral_sentiment
```
**Better**:
```bash
--labels positive,negative,neutral # Simpler, but clear
```
**Bad**:
```bash
--labels pos,neg,neu # Too cryptic
--labels 1,2,3 # Meaningless
```
### 3. Provide 2-6 Labels
- **Too few** (1 label): Not classification
- **Sweet spot** (2-6 labels): Best accuracy
- **Too many** (10+ labels): Accuracy degrades
### 4. Use Multi-label for Overlapping Concepts
**Single-label** (mutually exclusive):
```bash
--labels positive,negative,neutral
```
**Multi-label** (can overlap):
```bash
--labels urgent,technical,action_required,informational \
--multi-label
```
### 5. Adjust Threshold for Multi-label
```bash
# Conservative (fewer labels)
--threshold 0.7
# Balanced (default)
--threshold 0.5
# Liberal (more labels)
--threshold 0.3
```
## Performance Tips
### When Zero-shot Works Best
✓ Clear semantic categories
✓ 2-6 distinct labels
✓ Labels have characteristic language patterns
✓ English text (for RoBERTa-MNLI)
✓ Medium-length text (10-200 words)
### When to Use Fine-tuning Instead
✗ Need >90% accuracy
✗ Domain-specific jargon
✗ Subtle distinctions between labels
✗ Have 1000+ labeled examples
✗ Production-critical systems
Zero-shot is great for prototyping and low-stakes classification. For production, consider fine-tuning.
## Limitations
### 1. Language Dependence
RoBERTa-MNLI only works well for English. For other languages:
```bash
# Spanish/Catalan
--model xlm_roberta_base
```
Expect 10-15% lower accuracy than English.
### 2. Accuracy Ceiling
Zero-shot typically achieves 70-85% accuracy. Fine-tuning can reach 95-99%.
### 3. Context Window
Models have a maximum input length (~512 tokens), so longer documents need truncation:
```bash
# Truncate to first 512 tokens automatically
--max-length 512
```
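If you are calling the API directly on long documents, a rough pre-truncation guard can help; a minimal sketch (the file name and the 2,000-character cutoff are illustrative, and characters are only a loose proxy for tokens):
```elixir
alias Nasty.Statistics.Neural.Transformers.ZeroShot

# "report.txt" and the 2,000-character cutoff are illustrative; characters
# are only a rough proxy for the ~512-token limit
text =
  "report.txt"
  |> File.read!()
  |> String.slice(0, 2_000)

{:ok, result} =
  ZeroShot.classify(text, candidate_labels: ["technical", "business", "personal"])
```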
### 4. Label Sensitivity
Results can vary with label phrasing:
```bash
# These may give different results:
--labels positive,negative
--labels good,bad
--labels happy,sad
```
Test different phrasings to find what works best.
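A quick way to probe this sensitivity is to classify the same text against each candidate phrasing and compare how cleanly the scores separate:
```elixir
alias Nasty.Statistics.Neural.Transformers.ZeroShot

text = "The checkout page keeps timing out"

# Alternative phrasings of the same underlying distinction
label_sets = [
  ["positive", "negative"],
  ["good", "bad"],
  ["working", "broken"]
]

for labels <- label_sets do
  {:ok, result} = ZeroShot.classify(text, candidate_labels: labels)
  score = Float.round(result.scores[result.label], 3)
  IO.puts("#{inspect(labels)} -> #{result.label} (#{score})")
end
```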
## Troubleshooting
### All Scores Are Similar
**Problem**: Scores like 0.33, 0.34, 0.33 (no clear winner)
**Causes**:
- Labels are too similar
- Text is ambiguous
- Poor hypothesis template
**Solutions**:
1. Use more distinct labels
2. Try different hypothesis template
3. Add more context to text
4. Consider if text is truly ambiguous
### Wrong Label Predicted
**Problem**: Clearly wrong prediction
**Causes**:
- Label phrasing doesn't match text semantics
- Need different hypothesis template
- Text is out-of-domain for model
**Solutions**:
1. Rephrase labels
2. Change hypothesis template
3. Try different model
4. Consider fine-tuning for your domain
### Slow Performance
**Problem**: Classification takes too long
**Solutions**:
1. Use a smaller model (e.g. xlm_roberta_base instead of roberta_large_mnli)
2. Enable GPU (set XLA_TARGET=cuda)
3. Reduce number of labels
4. Use batch processing for multiple texts
## Advanced Usage
### Programmatic Batch Processing
```elixir
alias Nasty.Statistics.Neural.Transformers.ZeroShot
texts = [
  "I love this!",
  "Terrible service",
  "It's okay"
]

{:ok, results} =
  ZeroShot.classify_batch(texts,
    candidate_labels: ["positive", "negative", "neutral"]
  )

# results = [
#   %{label: "positive", scores: %{...}, sequence: "I love this!"},
#   %{label: "negative", scores: %{...}, sequence: "Terrible service"},
#   %{label: "neutral", scores: %{...}, sequence: "It's okay"}
# ]
```
### Confidence Thresholding
Reject low-confidence predictions:
```elixir
{:ok, result} =
  ZeroShot.classify(text,
    candidate_labels: ["positive", "negative", "neutral"]
  )

max_score = result.scores[result.label]

if max_score < 0.6 do
  # Too uncertain, flag for human review
  {:uncertain, result}
else
  {:confident, result}
end
```
### Hierarchical Classification
First classify broadly, then refine:
```elixir
# Step 1: Broad category
{:ok, broad} =
  ZeroShot.classify(text,
    candidate_labels: ["product", "service", "support"]
  )

# Step 2: Specific subcategory
specific_labels =
  case broad.label do
    "product" -> ["quality", "price", "features"]
    "service" -> ["delivery", "installation", "maintenance"]
    "support" -> ["technical", "billing", "general"]
  end

{:ok, specific} =
  ZeroShot.classify(text,
    candidate_labels: specific_labels
  )
```
## Comparison with Other Methods
| Method | Training Data | Accuracy | Setup Time | Flexibility |
|--------|---------------|----------|------------|-------------|
| Zero-shot | 0 examples | 70-85% | Instant | Very high |
| Few-shot | 10-100 examples | 80-90% | Minutes | High |
| Fine-tuning | 1000+ examples | 95-99% | Hours | Medium |
| Rule-based | N/A | 60-80% | Days | Low |
**Recommendation**: Start with zero-shot, move to fine-tuning if accuracy is insufficient.
## Production Deployment
### Caching Results
```elixir
defmodule ClassificationCache do
  use GenServer

  alias Nasty.Statistics.Neural.Transformers.ZeroShot

  def classify_cached(text, labels) do
    cache_key = :crypto.hash(:md5, text <> Enum.join(labels)) |> Base.encode16()

    # get_cache/1 and put_cache/2 are application-specific placeholders
    # (e.g. backed by ETS or the GenServer state)
    case get_cache(cache_key) do
      nil ->
        {:ok, result} = ZeroShot.classify(text, candidate_labels: labels)
        put_cache(cache_key, result)
        result

      cached ->
        cached
    end
  end
end
```
### Rate Limiting
```elixir
defmodule RateLimiter do
  alias Nasty.Statistics.Neural.Transformers.ZeroShot

  def classify_with_limit(text, labels) do
    # check_rate_limit/0 is a placeholder for your rate-limiting logic
    case check_rate_limit() do
      :ok ->
        ZeroShot.classify(text, candidate_labels: labels)

      {:error, :rate_limited} ->
        {:error, "Too many requests, please retry later"}
    end
  end
end
```
### Fallback Strategies
```elixir
def classify_robust(text, labels) do
  case ZeroShot.classify(text, candidate_labels: labels) do
    {:ok, result} ->
      if result.scores[result.label] > 0.6 do
        {:ok, result}
      else
        # Fall back to simpler method
        naive_bayes_classify(text, labels)
      end

    {:error, _} ->
      # Model unavailable, use rule-based
      rule_based_classify(text, labels)
  end
end
```
## See Also
- [FINE_TUNING.md](FINE_TUNING.md) - Train models for higher accuracy
- [CROSS_LINGUAL.md](CROSS_LINGUAL.md) - Multilingual classification
- [PRETRAINED_MODELS.md](PRETRAINED_MODELS.md) - Available transformer models