# Neural Models in Nasty
Complete guide to using neural network models in Nasty for state-of-the-art NLP performance.
## Overview
Nasty integrates neural network models using **Axon**, Elixir's neural network library, providing:
- **BiLSTM-CRF architecture** for sequence tagging (POS, NER)
- **97-98% accuracy** on standard POS tagging benchmarks
- **EXLA JIT compilation** for 10-100x speedup
- **Seamless integration** with existing pipeline
- **Pre-trained embedding support** (GloVe, FastText)
- **Model persistence** and loading
- **Graceful fallbacks** to HMM and rule-based models
## Quick Start
### Installation
Neural dependencies are already included in `mix.exs`:
```elixir
# Already added
{:axon, "~> 0.7"}, # Neural networks
{:nx, "~> 0.9"}, # Numerical computing
{:exla, "~> 0.9"}, # XLA compiler (GPU/CPU acceleration)
{:bumblebee, "~> 0.6"}, # Pre-trained models
{:tokenizers, "~> 0.5"} # Fast tokenization
```
### Basic Usage
```elixir
# Parse text with neural POS tagger
{:ok, ast} = Nasty.parse("The cat sat on the mat.",
  language: :en,
  model: :neural
)
# Tokens will have POS tags predicted by the neural model
```
### Training Your Own Model
```bash
# Download Universal Dependencies corpus
# https://universaldependencies.org/
# Train neural POS tagger
mix nasty.train.neural_pos \
  --corpus data/en_ewt-ud-train.conllu \
  --test-corpus data/en_ewt-ud-test.conllu \
  --epochs 10 \
  --hidden-size 256
# Model saved to priv/models/en/pos_neural_v1.axon
```
### Using Trained Models
```elixir
alias Nasty.Statistics.POSTagging.NeuralTagger
# Load model
{:ok, model} = NeuralTagger.load("priv/models/en/pos_neural_v1.axon")
# Predict
words = ["The", "cat", "sat"]
{:ok, tags} = NeuralTagger.predict(model, words, [])
# => {:ok, [:det, :noun, :verb]}
```
## Architecture
### BiLSTM-CRF
The default architecture is **Bidirectional LSTM with CRF** (Conditional Random Field):
```mermaid
flowchart TD
A[Input Words]
B["Word Embeddings (300d)"]
C["BiLSTM Layer 1 (256 hidden units)"]
D["Dropout (0.3)"]
E["BiLSTM Layer 2 (256 hidden units)"]
F["Dense Projection → POS Tags"]
G[Softmax/CRF]
H[Output Tags]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
```
**Key Features:**
- Bidirectional context (forward + backward)
- Optional character-level CNN for out-of-vocabulary (OOV) word handling
- Dropout regularization
- 2-3 LSTM layers (configurable)
- 256-512 hidden units (configurable)
### Performance
**Accuracy:**
- POS Tagging: 97-98% (vs 95% HMM, 85% rule-based)
- NER: 88-92% F1 (future)
- Dependency Parsing: 94-96% UAS (future)
**Speed (on UD English, 12k sentences):**
- CPU: ~30-60 minutes training
- GPU (EXLA): ~5-10 minutes training
- Inference: ~1000-5000 tokens/second (CPU)
- Inference: ~10000+ tokens/second (GPU)
## Model Integration Modes
Nasty provides multiple integration modes:
### 1. Neural Only (`:neural`)
Uses only the neural model:
```elixir
{:ok, ast} = Nasty.parse(text, language: :en, model: :neural)
```
**Fallback:** If the neural model is unavailable, Nasty falls back to the HMM tagger, then to rule-based tagging.
### 2. Neural Ensemble (`:neural_ensemble`)
Combines neural + HMM + rule-based:
```elixir
{:ok, ast} = Nasty.parse(text, language: :en, model: :neural_ensemble)
```
**Strategy:**
- Use rule-based for punctuation and numbers (high confidence)
- Use neural predictions for content words
- Best accuracy overall
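Conceptually, the per-token decision behind this strategy looks something like the sketch below. This is a simplified illustration under assumed helpers, not Nasty's actual implementation:

```elixir
defmodule EnsembleSketch do
  # Simplified illustration of the ensemble strategy above, not Nasty's internals.
  # `rule_based_tag/1` is a stand-in for the real rule-based tagger.
  def combine(words, neural_tags) do
    Enum.zip_with(words, neural_tags, fn word, neural_tag ->
      case rule_based_tag(word) do
        # Punctuation and numbers: rule-based tags are high confidence, keep them
        tag when tag in [:punct, :num] -> tag
        # Content words: defer to the neural prediction
        _ -> neural_tag
      end
    end)
  end

  defp rule_based_tag(word) do
    cond do
      String.match?(word, ~r/^\p{P}+$/u) -> :punct
      String.match?(word, ~r/^\d+([.,]\d+)?$/) -> :num
      true -> :other
    end
  end
end
```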
### 3. Traditional Modes
Still available:
- `:rule_based` - Fast, 85% accuracy
- `:hmm` - 95% accuracy
- `:ensemble` - HMM + rules
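These are selected the same way as the neural modes, via the `model:` option to `Nasty.parse/2`:

```elixir
# Same call as above, just a different model choice
{:ok, ast} = Nasty.parse(text, language: :en, model: :hmm)
```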
## Training Guide
### 1. Prepare Data
Download a Universal Dependencies corpus:
```bash
# English
wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu
# Or other languages
# Spanish, Catalan, etc.
```
### 2. Train Model
```bash
mix nasty.train.neural_pos \
  --corpus en_ewt-ud-train.conllu \
  --test-corpus en_ewt-ud-test.conllu \
  --output priv/models/en/pos_neural_v1.axon \
  --epochs 10 \
  --batch-size 32 \
  --learning-rate 0.001 \
  --hidden-size 256 \
  --num-layers 2 \
  --dropout 0.3 \
  --use-char-cnn false
```
### 3. Evaluate
The training task automatically evaluates on the test set and reports:
- Overall accuracy
- Per-tag precision, recall, F1
- Confusion matrix (if requested)
### 4. Deploy
Models are automatically saved with:
- Model weights (`.axon` file)
- Metadata (`.meta.json` file)
- Vocabulary and tag mappings
Load via `ModelLoader.load_latest(:en, :pos_tagging_neural)` or directly with `NeuralTagger.load/1`.
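For example (assuming `ModelLoader` and `NeuralTagger` are aliased, and that both loaders return an `{:ok, model}` tuple as in the earlier examples):

```elixir
# Load the most recent deployed neural POS model for English
{:ok, model} = ModelLoader.load_latest(:en, :pos_tagging_neural)

# Or load a specific checkpoint directly
{:ok, model} = NeuralTagger.load("priv/models/en/pos_neural_v1.axon")

# Tag a sentence with whichever model was loaded
{:ok, tags} = NeuralTagger.predict(model, ["The", "cat", "sat"], [])
```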
## Programmatic Training
```elixir
alias Nasty.Statistics.POSTagging.NeuralTagger
alias Nasty.Statistics.Neural.DataLoader
# Load corpus
{:ok, sentences} = DataLoader.load_conllu("train.conllu")
# Split data
{train, valid} = DataLoader.split(sentences, [0.9, 0.1])
# Build vocabularies
{:ok, vocab, tag_vocab} = DataLoader.build_vocabularies(train, min_freq: 2)
# Create model
tagger = NeuralTagger.new(
  vocab: vocab,
  tag_vocab: tag_vocab,
  embedding_dim: 300,
  hidden_size: 256,
  num_layers: 2,
  dropout: 0.3
)
# Train
{:ok, trained} = NeuralTagger.train(tagger, train,
  epochs: 10,
  batch_size: 32,
  learning_rate: 0.001,
  validation_split: 0.1
)
# Save
NeuralTagger.save(trained, "my_model.axon")
```
## Pre-trained Embeddings
### Using GloVe
```elixir
alias Nasty.Statistics.Neural.Embeddings
# Load GloVe embeddings
{:ok, embeddings} = Embeddings.load_glove("glove.6B.300d.txt", vocab)
# Use during training
tagger = NeuralTagger.new(
  vocab: vocab,
  tag_vocab: tag_vocab,
  pretrained_embeddings: embeddings
)
```
Download GloVe:
```bash
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip
```
## Advanced Features
### Character-Level CNN
For better OOV handling:
```bash
mix nasty.train.neural_pos \
  --corpus train.conllu \
  --use-char-cnn \
  --char-filters 3,4,5 \
  --char-num-filters 30
```
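When training programmatically, the same switches can presumably be passed to `NeuralTagger.new/1`; the option names below (`use_char_cnn`, `char_filters`, `char_num_filters`) are assumed to mirror the CLI flags and may differ in the actual API:

```elixir
# Hypothetical programmatic equivalent of the CLI flags above
tagger = NeuralTagger.new(
  vocab: vocab,
  tag_vocab: tag_vocab,
  use_char_cnn: true,
  char_filters: [3, 4, 5],
  char_num_filters: 30
)
```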
### Custom Architectures
Extend `Nasty.Statistics.Neural.Architectures.BiLSTMCRF`:
```elixir
defmodule MyArchitecture do
  def build(opts) do
    # Custom Axon model. One possible completion of the skeleton:
    # embeddings -> LSTM -> dropout -> per-token tag scores.
    # Axon.lstm/2 returns {output_sequence, hidden_state}; only the sequence is needed here.
    {sequence, _state} =
      Axon.input("tokens")
      |> Axon.embedding(opts[:vocab_size], opts[:embedding_dim])
      |> Axon.lstm(opts[:hidden_size])

    sequence
    |> Axon.dropout(rate: 0.3)
    |> Axon.dense(opts[:num_tags], activation: :softmax)
  end
end
```
### Streaming Training
For large datasets:
```elixir
DataLoader.stream_batches("huge_corpus.conllu", vocab, tag_vocab, batch_size: 64)
|> Stream.take(1000) # Process in chunks
|> Enum.each(&train_batch/1)
```
## Troubleshooting
### EXLA Compilation Issues
If EXLA fails to compile:
```bash
# Install XLA dependencies
# Ubuntu/Debian:
sudo apt-get install build-essential
# Set compiler flags
export ELIXIR_ERL_OPTIONS="+fnu"
mix deps.clean exla --build
mix deps.get
```
### Out of Memory
Reduce batch size:
```bash
mix nasty.train.neural_pos --batch-size 16 # Instead of 32
```
Or use gradient accumulation:
```elixir
# In training opts, for example:
{:ok, trained} = NeuralTagger.train(tagger, train,
  batch_size: 16,
  accumulation_steps: 4
)
```
### Slow Training
Enable EXLA:
```elixir
# Should be automatic, but verify that Nx is running on EXLA,
# e.g. by setting the default backend in config/config.exs:
config :nx, default_backend: EXLA.Backend

# or by passing the defn compiler in training options:
compiler: EXLA
```
Use GPU if available:
```bash
export XLA_TARGET=cuda
```
## Future Enhancements
- **Transformers**: BERT, RoBERTa via Bumblebee
- **NER models**: BiLSTM-CRF for named entity recognition
- **Dependency parsing**: Biaffine attention parser
- **Multilingual**: mBERT, XLM-R support
- **Model quantization**: INT8 for faster inference
- **Knowledge distillation**: Compress large models
## See Also
- [TRAINING_NEURAL.md](TRAINING_NEURAL.md) - Detailed training guide
- [PRETRAINED_MODELS.md](PRETRAINED_MODELS.md) - Using transformers
- [API.md](API.md) - Full API documentation
- [BiLSTM-CRF paper](https://arxiv.org/abs/1508.01991)
- [Axon documentation](https://hexdocs.pm/axon)