docs/NEURAL_COREFERENCE.md

# Neural Coreference Resolution

Neural coreference resolution using a BiLSTM mention encoder with a feedforward mention-pair scorer.

## Overview

This implementation provides neural coreference resolution, improving accuracy from ~70% CoNLL F1 (rule-based baseline) to an expected 75-80% CoNLL F1 (neural mention-pair model).

## Architecture

### Phase 1: Neural Pair Model (Implemented)

**Components**:
1. **Mention Encoder** - BiLSTM with attention over context
2. **Pair Scorer** - Feedforward network with 20 hand-crafted features
3. **Neural Resolver** - Integration with existing mention detection
4. **Evaluator** - MUC, B³, CEAF metrics

**Workflow**:
```
Document → Mention Detection → Neural Encoding → Pairwise Scoring → Clustering → Coreference Chains
```

## Quick Start

### Training

```bash
mix nasty.train.coref \
  --corpus data/ontonotes/train \
  --dev data/ontonotes/dev \
  --output priv/models/en/coref \
  --epochs 20 \
  --batch-size 32
```

### Evaluation

```bash
mix nasty.eval.coref \
  --model priv/models/en/coref \
  --test data/ontonotes/test
```

### Using in Code

```elixir
alias Nasty.Semantic.Coreference.Neural.{Resolver, Trainer}

# Load models
{:ok, models, params, vocab} = Trainer.load_models("priv/models/en/coref")

# Resolve coreferences
{:ok, document} = Resolver.resolve(document, models, params, vocab)

# Access chains
document.coref_chains
|> Enum.each(fn chain ->
  IO.puts("Chain #{chain.id}: #{chain.representative}")
  IO.puts("  Mentions: #{length(chain.mentions)}")
end)
```

## Data Format

### OntoNotes CoNLL-2012

The system expects CoNLL-2012 format with coreference annotations:

```
doc1  0  0  John   NNP  ...  (0)
doc1  0  1  works  VBZ  ...  -
doc1  0  2  at     IN   ...  -
doc1  0  3  Google NNP  ...  (1)
...
doc1  0  10 He     PRP  ...  (0)
```
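
In this format, a single-token mention in chain *n* is tagged `(n)`, while a multi-token mention opens with `(n` on its first token and closes with `n)` on its last. The `Nasty.Data.OntoNotes` module (see Modules below) parses these files. A minimal usage sketch, assuming a `load/1` entry point (the function name is illustrative; check the module docs for the actual API):

```elixir
alias Nasty.Data.OntoNotes

# Hypothetical entry point: parse a CoNLL-2012 file into documents carrying
# gold mentions and coreference chains. `load/1` is an assumed name.
{:ok, documents} = OntoNotes.load("data/ontonotes/train/doc1.conll")

Enum.each(documents, fn doc ->
  IO.puts("#{doc.id}: #{length(doc.coref_chains)} gold chains")
end)
```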

## Modules

### Core Neural Components

- **`Nasty.Data.OntoNotes`** - CoNLL-2012 data loader
- **`Nasty.Semantic.Coreference.Neural.MentionEncoder`** - BiLSTM mention encoder
- **`Nasty.Semantic.Coreference.Neural.PairScorer`** - Neural pair scoring
- **`Nasty.Semantic.Coreference.Neural.Trainer`** - Training pipeline
- **`Nasty.Semantic.Coreference.Neural.Resolver`** - Integration layer

### Evaluation

- **`Nasty.Semantic.Coreference.Evaluator`** - Standard coreference metrics

### Mix Tasks

- **`mix nasty.train.coref`** - Train models
- **`mix nasty.eval.coref`** - Evaluate models

## Model Architecture Details

### Mention Encoder

- Input: Token IDs + mention mask
- Embedding: 100d (GloVe compatible)
- BiLSTM: 128 hidden units
- Attention: Over mention span
- Output: 256d mention representation
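
The attention step pools the BiLSTM states over the mention span into a single vector. A minimal `Nx` sketch of that pooling, assuming per-token attention logits come from a learned scoring layer (not shown):

```elixir
defmodule AttentionPoolSketch do
  import Nx.Defn

  # Softmax-normalize the span's attention logits, then take the weighted
  # sum of the BiLSTM states. `states` is {span_len, hidden} (hidden = 256
  # for a 128-unit BiLSTM); `logits` is {span_len}.
  defn pool(states, logits) do
    weights = Nx.exp(logits - Nx.reduce_max(logits))
    weights = weights / Nx.sum(weights)

    states
    |> Nx.multiply(Nx.new_axis(weights, 1))
    |> Nx.sum(axes: [0])
  end
end
```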

### Pair Scorer

- Input: [m1_encoding (256d), m2_encoding (256d), features (20d)]
- Hidden layers: [512, 256] with ReLU + dropout
- Output: Sigmoid probability
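
Concatenating the two 256d mention encodings with the 20d feature vector gives a 532d input. A sketch of this stack as an Axon model, assuming the project builds its networks with Axon (illustrative, not the `PairScorer` module's actual definition):

```elixir
# 256 + 256 + 20 = 532-dimensional concatenated pair representation
pair_scorer =
  Axon.input("pair", shape: {nil, 532})
  |> Axon.dense(512, activation: :relu)
  |> Axon.dropout(rate: 0.3)
  |> Axon.dense(256, activation: :relu)
  |> Axon.dropout(rate: 0.3)
  |> Axon.dense(1, activation: :sigmoid)
```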

### Features (20 total)

- **1-3:** Distance features (sentence, token, mention)
- **4-6:** String match (exact, partial, head)
- **7-12:** Mention types (pronoun, name, definite NP for each)
- **13-15:** Agreement (gender, number, entity type)
- **16-20:** Positional (same sentence, first mentions, pronoun-name pair)
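
A sketch of how a few of these features might be computed, assuming mention structs with `sentence_index` and `text` fields (field names are illustrative, not the project's actual structs):

```elixir
defmodule PairFeatureSketch do
  # Feature 1: sentence distance between the two mentions
  def sentence_distance(m1, m2), do: abs(m1.sentence_index - m2.sentence_index)

  # Feature 4: case-insensitive exact string match
  def exact_match?(m1, m2),
    do: String.downcase(m1.text) == String.downcase(m2.text)

  # Feature 16: both mentions fall in the same sentence
  def same_sentence?(m1, m2), do: m1.sentence_index == m2.sentence_index
end
```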

## Training

### Hyperparameters

- Epochs: 20 (with early stopping)
- Batch size: 32
- Learning rate: 0.001 (Adam)
- Dropout: 0.3
- Patience: 3 epochs
- Max distance: 3 sentences
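
As a keyword list (option names here are assumptions mirroring the CLI flags, not necessarily the `Trainer`'s actual API):

```elixir
# Illustrative defaults; names are assumed, not the Trainer's documented options.
opts = [
  epochs: 20,
  batch_size: 32,
  learning_rate: 1.0e-3,  # Adam
  dropout: 0.3,
  patience: 3,            # early stopping
  max_distance: 3         # sentences
]
```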

### Data Preparation

- Positive pairs: Mentions in same chain
- Negative pairs: Mentions in different chains
- Ratio: 1:1 (configurable)
- Shuffling: Enabled
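
A sketch of this pair construction from gold chains (illustrative only; the `Trainer`'s internals may differ):

```elixir
defmodule PairSamplingSketch do
  # Positive pairs: every unordered mention pair within one chain.
  defp pairs_within(mentions) do
    for {m1, i} <- Enum.with_index(mentions),
        {m2, j} <- Enum.with_index(mentions),
        i < j,
        do: {m1, m2}
  end

  # Negatives pair mentions from different chains, downsampled to a 1:1
  # ratio against the positives, then everything is shuffled together.
  def training_pairs(chains) do
    positives = Enum.flat_map(chains, &pairs_within(&1.mentions))

    negatives =
      for {c1, i} <- Enum.with_index(chains),
          {c2, j} <- Enum.with_index(chains),
          i < j,
          m1 <- c1.mentions,
          m2 <- c2.mentions,
          do: {m1, m2}

    negatives = Enum.take_random(negatives, length(positives))

    Enum.shuffle(Enum.map(positives, &{&1, 1}) ++ Enum.map(negatives, &{&1, 0}))
  end
end
```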

## Evaluation Metrics

### MUC (Link-based)
Measures the minimum number of links needed to connect the mentions of each chain.
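
Following Vilain et al. (1995), with gold chains $S_i$ and $p(S_i)$ the partition of $S_i$ induced by the predicted chains:

$$\text{Recall} = \frac{\sum_i \left( |S_i| - |p(S_i)| \right)}{\sum_i \left( |S_i| - 1 \right)}$$

Precision is the same computation with gold and predicted chains swapped.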

### B³ (Mention-based)
Computes precision and recall for each mention, then averages over all mentions.
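
For each mention $m$, with $C_m$ the predicted cluster and $G_m$ the gold cluster containing it:

$$P_m = \frac{|C_m \cap G_m|}{|C_m|}, \qquad R_m = \frac{|C_m \cap G_m|}{|G_m|}$$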

### CEAF (Entity-based)
Scores the optimal one-to-one alignment between gold and predicted chains.
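
The CoNLL scorer typically uses the entity-based CEAF$_e$ variant, which scores each aligned gold/predicted chain pair with

$$\phi_4(S_i, R_j) = \frac{2\,|S_i \cap R_j|}{|S_i| + |R_j|}$$

Recall sums $\phi_4$ over the optimal alignment and divides by the number of gold chains; precision divides by the number of predicted chains.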

### CoNLL F1
Average of MUC, B³, and CEAF F1 scores.
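
In symbols:

$$\text{CoNLL F1} = \tfrac{1}{3} \left( F_{\text{MUC}} + F_{B^3} + F_{\text{CEAF}} \right)$$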

## Performance

### Expected Results

- **Rule-based baseline**: ~70% CoNLL F1
- **Neural pair model**: 75-80% CoNLL F1
- **Improvement**: +5-10 F1 points

### Speed

- Encoding: ~100 mentions/sec
- Scoring: ~1000 pairs/sec
- End-to-end: ~50-100ms per document

## Future Enhancements

### Phase 2: Span-Based End-to-End (Planned)

- Joint mention detection + coreference
- Span enumeration with pruning
- End-to-end optimization
- Target: 82-85% CoNLL F1

### Phase 3: Transformer Fine-tuning (Planned)

- SpanBERT or Longformer
- Pre-trained contextual embeddings
- Target: 88-90% CoNLL F1

## Troubleshooting

### Out of Memory

- Reduce batch size: `--batch-size 16`
- Use smaller hidden dim: `--hidden-dim 64`
- Process fewer documents at once

### Low Accuracy

- Check data format (CoNLL-2012)
- Increase training epochs: `--epochs 30`
- Add more training data
- Tune hyperparameters

### Slow Training

- Use GPU acceleration (EXLA; see the config sketch below)
- Increase batch size: `--batch-size 64`
- Reduce max distance: `--max-distance 2`
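
For the EXLA bullet, the standard Nx backend configuration is (assuming the `:exla` dependency is already declared in `mix.exs`):

```elixir
# config/config.exs — route Nx tensor operations through EXLA, which
# compiles to GPU when one is available
import Config

config :nx, default_backend: EXLA.Backend
```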

## References

- Lee et al. (2017). "End-to-end Neural Coreference Resolution"
- Vilain et al. (1995). "A model-theoretic coreference scoring scheme"
- Pradhan et al. (2012). "CoNLL-2012 shared task"

## See Also

- [COREFERENCE_TRAINING.md](COREFERENCE_TRAINING.md) - Detailed training guide
- [Plan](../docs/plans/) - Complete implementation roadmap
- [API.md](API.md) - Full API reference