# Nasty Examples Catalog
Comprehensive catalog of all example scripts demonstrating Nasty's capabilities.
## Quick Start
All examples can be run directly:
```bash
elixir examples/example_name.exs
```
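Or make them executable and run them directly (this relies on each script starting with a `#!/usr/bin/env elixir` shebang, as in the template under "Creating Your Own Examples"):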
```bash
chmod +x examples/example_name.exs
./examples/example_name.exs
```
## Basic Examples
### tokenizer_example.exs
**Purpose**: Introduction to tokenization
**What it demonstrates**:
- Basic tokenization with NimbleParsec
- Position tracking (line, column, byte offsets)
- Handling contractions (don't, it's)
- Punctuation as separate tokens
- Sentence boundary detection
**Run**:
```bash
elixir examples/tokenizer_example.exs
```
**Best for**: Understanding the first step in the NLP pipeline
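
To make position tracking concrete, here is a minimal standalone sketch in plain Elixir. It uses `Regex.scan/3` rather than NimbleParsec, and the module name `TokenizerSketch`, the regex, and the token map shape are all assumptions made for illustration, not Nasty's actual tokenizer or API.

```elixir
defmodule TokenizerSketch do
  # Splits text into word, number, and punctuation tokens and records where
  # each token starts. The regex is an illustrative assumption, not Nasty's grammar.
  def tokenize(text) do
    # A word may contain one internal apostrophe, so "don't" stays a single token.
    pattern = ~r/\p{L}+(?:'\p{L}+)?|\p{N}+|[^\s\p{L}\p{N}]/u

    pattern
    |> Regex.scan(text, return: :index)
    |> Enum.map(fn [{offset, size}] ->
      {line, column} = line_and_column(text, offset)
      %{text: binary_part(text, offset, size), line: line, column: column, byte_offset: offset}
    end)
  end

  # Derives a 1-based {line, column} pair from a byte offset.
  defp line_and_column(text, offset) do
    lines = text |> binary_part(0, offset) |> String.split("\n")
    {length(lines), String.length(List.last(lines)) + 1}
  end
end

for token <- TokenizerSketch.tokenize("El gato duerme en el sofá.") do
  IO.puts("#{token.text} (#{token.line}:#{token.column})")
end
```

The loop prints `El (1:1)`, `gato (1:4)`, `duerme (1:9)`, and so on. Columns count characters while `byte_offset` counts bytes, so the two diverge after any multi-byte character such as `á`.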
---
### hmm_pos_tagger_example.exs
**Purpose**: Statistical POS tagging with Hidden Markov Models
**What it demonstrates**:
- Training HMM POS taggers from CoNLL-U data
- Viterbi algorithm for sequence tagging
- Model evaluation and accuracy metrics
- Comparison with rule-based tagging
- Model persistence (save/load)
**Run**:
```bash
elixir examples/hmm_pos_tagger_example.exs
```
**Best for**: Learning about statistical NLP models
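
The Viterbi step can be sketched in a few lines of plain Elixir. This is a toy illustration under stated assumptions: the module name `ViterbiSketch`, the model layout, and every probability in `toy_model/0` are invented for the example and do not come from Nasty or from any trained model.

```elixir
defmodule ViterbiSketch do
  # A tiny hand-written HMM; every probability here is made up for illustration.
  def toy_model do
    %{
      tags: [:det, :noun, :verb],
      start: %{det: 0.6, noun: 0.3, verb: 0.1},
      trans: %{
        {:det, :noun} => 0.9, {:det, :det} => 0.05, {:det, :verb} => 0.05,
        {:noun, :verb} => 0.6, {:noun, :noun} => 0.3, {:noun, :det} => 0.1,
        {:verb, :det} => 0.5, {:verb, :noun} => 0.4, {:verb, :verb} => 0.1
      },
      emit: %{
        {:det, "the"} => 0.7,
        {:noun, "dog"} => 0.4, {:noun, "barks"} => 0.05,
        {:verb, "barks"} => 0.5, {:verb, "dog"} => 0.01
      }
    }
  end

  # Returns the most probable tag sequence for a non-empty list of words.
  # Each table cell keeps the best log score and the (reversed) path that reached it.
  def decode([first | rest], model) do
    initial =
      Map.new(model.tags, fn tag ->
        {tag, {logp(model.start, tag) + logp(model.emit, {tag, first}), [tag]}}
      end)

    final =
      Enum.reduce(rest, initial, fn word, previous ->
        Map.new(model.tags, fn tag ->
          # Best predecessor for reaching `tag` at this position.
          {score, path} =
            previous
            |> Enum.map(fn {prev_tag, {s, p}} -> {s + logp(model.trans, {prev_tag, tag}), p} end)
            |> Enum.max_by(fn {s, _p} -> s end)

          {tag, {score + logp(model.emit, {tag, word}), [tag | path]}}
        end)
      end)

    {_score, path} = final |> Map.values() |> Enum.max_by(fn {s, _p} -> s end)
    Enum.reverse(path)
  end

  # Log-probability lookup; unseen events get a small floor instead of zero.
  defp logp(table, key), do: :math.log(Map.get(table, key, 1.0e-6))
end

IO.inspect(ViterbiSketch.decode(~w(the dog barks), ViterbiSketch.toy_model()))
# [:det, :noun, :verb]
```

Storing the whole path in each cell replaces the usual backpointer table. It is simpler to read, at the cost of extra list allocation on long sentences.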
---
### neural_pos_tagger_example.exs
**Purpose**: Neural POS tagging with BiLSTM-CRF
**What it demonstrates**:
- BiLSTM-CRF architecture with Axon/EXLA
- Training neural models on UD corpora
- Character-level embeddings for OOV handling
- GPU acceleration with EXLA
- 97-98% accuracy on benchmark datasets
**Run**:
```bash
elixir examples/neural_pos_tagger_example.exs
```
**Best for**: Understanding deep learning for NLP
---
## Language-Specific Examples
### spanish_example.exs
**Purpose**: Spanish language processing
**What it demonstrates**:
- Spanish tokenization: inverted punctuation (¿, ¡) and the contractions del and al
- Spanish POS tagging with morphology
- Gender/number agreement
- Parsing Spanish sentence structure
- Entity recognition with Spanish lexicons
**Run**:
```bash
elixir examples/spanish_example.exs
```
**Best for**: Working with Romance languages
---
### catalan_example.exs
**Purpose**: Catalan language processing
**What it demonstrates**:
- Catalan-specific tokenization (interpunct l·l, apostrophes)
- All 10 Catalan letters that carry diacritics (à, è, é, í, ï, ò, ó, ú, ü, ç)
- Article contractions (del, al, pel, cal)
- Catalan morphology and POS tagging
- Entity recognition with Catalan lexicons
- Translation between Catalan and English
**Run**:
```bash
elixir examples/catalan_example.exs
```
**Best for**: Catalan NLP applications
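
As a rough illustration of what handling article contractions involves, the sketch below expands them with a lookup table. The module name `ContractionSketch` and the `expand/2` function are hypothetical, and whether Nasty splits contractions or keeps them as single tokens is not stated in this catalog.

```elixir
defmodule ContractionSketch do
  # Preposition + article contractions for Spanish (:es) and Catalan (:ca).
  @contractions %{
    es: %{"del" => ["de", "el"], "al" => ["a", "el"]},
    ca: %{
      "del" => ["de", "el"],
      "al" => ["a", "el"],
      "pel" => ["per", "el"],
      "cal" => ["ca", "el"]
    }
  }

  # Replaces each contracted token with its two parts; other tokens pass through.
  # Caveat: a pure lookup cannot tell the Catalan contraction "cal" from the
  # verb form "cal" ("it is necessary"), so a real system needs context here.
  def expand(tokens, lang) do
    table = Map.fetch!(@contractions, lang)
    Enum.flat_map(tokens, fn token -> Map.get(table, String.downcase(token), [token]) end)
  end
end

IO.inspect(ContractionSketch.expand(~w(Vaig al mercat pel camí), :ca))
# ["Vaig", "a", "el", "mercat", "per", "el", "camí"]
```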
---
## Translation Examples
### translation_example.exs
**Purpose**: Basic AST-based translation
**What it demonstrates**:
- English ↔ Spanish translation
- AST-level translation preserving grammar
- Morphological agreement enforcement
- Word order transformations
- Rendering translated AST to text
**Run**:
```bash
elixir examples/translation_example.exs
```
**Best for**: Getting started with translation
---
### roundtrip_translation.exs
**Purpose**: Translation quality analysis
**What it demonstrates**:
- English → Spanish → English roundtrips
- English → Catalan → English roundtrips
- Spanish → English → Spanish roundtrips
- Similarity metrics and quality assessment
- Challenging translation cases
- Performance across complexity levels
**Run**:
```bash
elixir examples/roundtrip_translation.exs
```
**Best for**: Evaluating translation quality
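
One simple way to score a roundtrip is word-set overlap between the original and the back-translated text. The sketch below uses Jaccard similarity; that choice, and the module name `RoundtripSketch`, are assumptions for illustration, since this catalog does not say which similarity metrics the script computes.

```elixir
defmodule RoundtripSketch do
  # Jaccard similarity over lowercase word sets: |A ∩ B| / |A ∪ B|.
  # Returns 1.0 for identical word sets and 0.0 when no word is shared.
  def similarity(original, roundtrip) do
    a = words(original)
    b = words(roundtrip)
    union = MapSet.size(MapSet.union(a, b))

    if union == 0, do: 1.0, else: MapSet.size(MapSet.intersection(a, b)) / union
  end

  defp words(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)
    |> MapSet.new()
  end
end

IO.inspect(
  RoundtripSketch.similarity("The cat sleeps on the sofa.", "The cat is sleeping on the couch.")
)
# 0.375
```

The example shares three words (`the`, `cat`, `on`) out of eight distinct ones. A set-based score like this ignores word order and synonyms, so it penalizes `sofa` versus `couch` even though the meaning survived the roundtrip.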
---
### multilingual_pipeline.exs
**Purpose**: Side-by-side multilingual comparison
**What it demonstrates**:
- Processing same content in English, Spanish, Catalan
- Token-level comparison across languages
- POS tagging differences
- Morphological feature comparison
- Translation matrix (all language pairs)
- Performance benchmarking
- Language-specific features summary
**Run**:
```bash
elixir examples/multilingual_pipeline.exs
```
**Best for**: Understanding cross-language differences
---
## Advanced NLP Tasks
### summarization.exs
**Purpose**: Extractive text summarization
**What it demonstrates**:
- Position-weighted sentence scoring
- Entity density calculation
- Discourse marker detection
- Keyword frequency (TF)
- MMR (Maximal Marginal Relevance) for diversity
- Compression ratio vs. fixed sentence count
**Run**:
```bash
elixir examples/summarization.exs
```
**Best for**: Document summarization applications
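
The MMR selection step can be sketched as a greedy loop. This is a minimal illustration, assuming each candidate sentence already has a relevance score. The module name `MMRSketch`, the candidate map shape, the default `lambda` of 0.7, and the word-overlap similarity are all assumptions, not Nasty's implementation.

```elixir
defmodule MMRSketch do
  # Greedy MMR: repeatedly pick the candidate that maximizes
  #   lambda * relevance - (1 - lambda) * (max similarity to anything already picked)
  # Candidates are maps with :text and a precomputed :score (assumed shape).
  def select(candidates, k, lambda \\ 0.7), do: pick(candidates, [], k, lambda)

  defp pick([], chosen, _k, _lambda), do: Enum.reverse(chosen)
  defp pick(_remaining, chosen, 0, _lambda), do: Enum.reverse(chosen)

  defp pick(remaining, chosen, k, lambda) do
    best =
      Enum.max_by(remaining, fn candidate ->
        redundancy =
          Enum.reduce(chosen, 0.0, fn c, acc -> max(acc, similarity(candidate.text, c.text)) end)

        lambda * candidate.score - (1 - lambda) * redundancy
      end)

    pick(List.delete(remaining, best), [best | chosen], k - 1, lambda)
  end

  # Jaccard overlap of lowercase word sets, standing in for a real similarity measure.
  defp similarity(a, b) do
    {x, y} = {words(a), words(b)}
    union = MapSet.size(MapSet.union(x, y))
    if union == 0, do: 0.0, else: MapSet.size(MapSet.intersection(x, y)) / union
  end

  defp words(text) do
    text |> String.downcase() |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true) |> MapSet.new()
  end
end

sentences = [
  %{text: "Elixir runs on the BEAM virtual machine.", score: 0.9},
  %{text: "Elixir runs on the BEAM.", score: 0.85},
  %{text: "Pattern matching simplifies control flow.", score: 0.6}
]

sentences |> MMRSketch.select(2) |> Enum.each(&IO.puts(&1.text))
```

With the default `lambda`, the second pick is the pattern-matching sentence rather than the near-duplicate BEAM sentence, even though the duplicate has the higher relevance score. Setting `lambda` to 1.0 disables the diversity penalty and falls back to ranking by relevance alone.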
---
### question_answering.exs
**Purpose**: Extractive question answering
**What it demonstrates**:
- Question classification (WHO, WHAT, WHEN, WHERE, WHY, HOW)
- Answer extraction strategies
- Entity type filtering
- Keyword matching with lemmatization
- Confidence scoring
- Multiple answer support
**Run**:
```bash
elixir examples/question_answering.exs
```
**Best for**: Building Q&A systems
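
Question classification and entity type filtering fit together roughly as follows. The sketch is deliberately naive: it looks only at the first word of the question, and the module name `QuestionSketch`, the mapping from question types to entity types, and the entity map shape are hypothetical choices for illustration.

```elixir
defmodule QuestionSketch do
  # Question word => {question type, entity types an answer is expected to have}.
  # An empty list means "no entity filter". The mapping is an assumption.
  @types %{
    "who" => {:who, [:person, :organization]},
    "when" => {:when, [:date]},
    "where" => {:where, [:location]},
    "what" => {:what, []},
    "why" => {:why, []},
    "how" => {:how, []}
  }

  # Classifies a question by its first word only.
  def classify(question) do
    first_word =
      question
      |> String.downcase()
      |> String.split(~r/[^\p{L}]+/u, trim: true)
      |> List.first()

    Map.get(@types, first_word, {:unknown, []})
  end

  # Keeps only candidate entities whose type fits the question type.
  def filter_entities(entities, question) do
    case classify(question) do
      {_type, []} -> entities
      {_type, wanted} -> Enum.filter(entities, &(&1.type in wanted))
    end
  end
end

entities = [
  %{text: "Acme", type: :organization},
  %{text: "Paris", type: :location},
  %{text: "2019", type: :date}
]

IO.inspect(QuestionSketch.filter_entities(entities, "Where is Acme based?"))
# [%{text: "Paris", type: :location}]
```

A first-word rule misclassifies questions such as "In what year was Acme founded?", which is one reason a practical classifier inspects more of the question than its opening word.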
---
### text_classification.exs
**Purpose**: Document classification
**What it demonstrates**:
- Multinomial Naive Bayes classifier
- Feature extraction (BOW, n-grams, POS patterns, entities, lexical)
- Training on labeled data
- Multi-class classification
- Model evaluation (accuracy, precision, recall, F1)
- Sentiment analysis example
**Run**:
```bash
elixir examples/text_classification.exs
```
**Best for**: Text categorization tasks
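
For readers new to Multinomial Naive Bayes, the following plain-Elixir sketch shows training on bag-of-words counts and predicting with add-one (Laplace) smoothing. The module name `NaiveBayesSketch`, its function names, the smoothing choice, and the four-sentence training set are assumptions for illustration. Nasty's classifier and its richer feature extraction are not reproduced here.

```elixir
defmodule NaiveBayesSketch do
  # Trains on {label, text} pairs, collecting per-label document counts,
  # per-label word counts, and the overall vocabulary.
  def train(examples) do
    empty = %{docs: %{}, words: %{}, vocab: MapSet.new()}

    Enum.reduce(examples, empty, fn {label, text}, model ->
      tokens = tokenize(text)

      word_counts =
        Enum.reduce(tokens, Map.get(model.words, label, %{}), fn token, acc ->
          Map.update(acc, token, 1, &(&1 + 1))
        end)

      %{
        docs: Map.update(model.docs, label, 1, &(&1 + 1)),
        words: Map.put(model.words, label, word_counts),
        vocab: MapSet.union(model.vocab, MapSet.new(tokens))
      }
    end)
  end

  # Picks the label maximizing log P(label) + sum of log P(word | label),
  # with add-one smoothing so unseen words do not zero out a class.
  def predict(model, text) do
    total_docs = model.docs |> Map.values() |> Enum.sum()
    vocab_size = MapSet.size(model.vocab)
    tokens = tokenize(text)

    {best_label, _score} =
      model.docs
      |> Enum.map(fn {label, doc_count} ->
        counts = Map.fetch!(model.words, label)
        total_words = counts |> Map.values() |> Enum.sum()

        likelihood =
          tokens
          |> Enum.map(&:math.log((Map.get(counts, &1, 0) + 1) / (total_words + vocab_size)))
          |> Enum.sum()

        {label, :math.log(doc_count / total_docs) + likelihood}
      end)
      |> Enum.max_by(fn {_label, score} -> score end)

    best_label
  end

  defp tokenize(text) do
    text |> String.downcase() |> String.split(~r/[^\p{L}]+/u, trim: true)
  end
end

model =
  NaiveBayesSketch.train([
    {:positive, "great movie, loved it"},
    {:positive, "wonderful acting and a great story"},
    {:negative, "boring plot, terrible acting"},
    {:negative, "awful movie, hated it"}
  ])

IO.inspect(NaiveBayesSketch.predict(model, "a great story"))
# :positive
```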
---
### information_extraction.exs
**Purpose**: Structured information extraction
**What it demonstrates**:
- Relation extraction (employment, organization, location)
- Event extraction (acquisitions, foundings, announcements)
- Template-based extraction
- Pattern matching with verb patterns
- Confidence scoring
- Integration with NER and dependencies
**Run**:
```bash
elixir examples/information_extraction.exs
```
**Best for**: Knowledge base construction
---
## Code Interoperability
### code_generation.exs
**Purpose**: Natural language to code
**What it demonstrates**:
- Intent recognition from natural language
- Constraint extraction (comparison, property, range)
- Elixir code generation
- List operations (sort, filter, map, reduce)
- Arithmetic expressions
- Conditional statements
**Run**:
```bash
elixir examples/code_generation.exs
```
**Best for**: Natural language programming interfaces
---
### code_explanation.exs
**Purpose**: Code to natural language
**What it demonstrates**:
- Elixir AST parsing
- Code explanation generation
- Pipeline explanation
- Function call description
- Variable usage analysis
**Run**:
```bash
elixir examples/code_explanation.exs
```
**Best for**: Code documentation and understanding
---
## Neural Network Examples
### pretrained_model_usage.exs
**Purpose**: Using pre-trained transformers
**What it demonstrates**:
- BERT and RoBERTa via Bumblebee
- Fine-tuning for POS tagging and NER
- Zero-shot classification
- Model quantization (INT8)
- Multilingual models (XLM-RoBERTa)
**Run**:
```bash
elixir examples/pretrained_model_usage.exs
```
**Best for**: Leveraging pre-trained models
---
### transformer_pos_example.exs
**Purpose**: Transformer-based POS tagging
**What it demonstrates**:
- RoBERTa for POS tagging
- Fine-tuning transformers
- 98-99% accuracy
- Cross-lingual transfer
- Model comparison
**Run**:
```bash
elixir examples/transformer_pos_example.exs
```
**Best for**: State-of-the-art accuracy
---
### advanced_neural_features.exs
**Purpose**: Advanced neural NLP features
**What it demonstrates**:
- Multiple neural architectures
- Ensemble methods
- Model quantization
- Zero-shot learning
- Cross-lingual transfer
- Performance optimization
**Run**:
```bash
elixir examples/advanced_neural_features.exs
```
**Best for**: Production neural NLP systems
---
## Comprehensive Demos
### comprehensive_demo.exs
**Purpose**: Complete NLP pipeline walkthrough
**What it demonstrates**:
- Full pipeline from tokenization to summarization
- All major NLP tasks
- Entity recognition
- Dependency extraction
- Semantic role labeling
- Coreference resolution
- Information extraction
**Run**:
```bash
elixir examples/comprehensive_demo.exs
```
**Best for**: Overview of all capabilities
---
## Example Selection Guide
### By Use Case
**Text Analysis**:
- tokenizer_example.exs
- hmm_pos_tagger_example.exs
- comprehensive_demo.exs
**Machine Learning**:
- neural_pos_tagger_example.exs
- transformer_pos_example.exs
- pretrained_model_usage.exs
- text_classification.exs
- advanced_neural_features.exs
**Multilingual**:
- spanish_example.exs
- catalan_example.exs
- translation_example.exs
- roundtrip_translation.exs
- multilingual_pipeline.exs
**Information Extraction**:
- question_answering.exs
- information_extraction.exs
- summarization.exs
**Code Integration**:
- code_generation.exs
- code_explanation.exs
### By Difficulty Level
**Beginner**:
1. tokenizer_example.exs
2. spanish_example.exs
3. translation_example.exs
4. summarization.exs
**Intermediate**:
1. hmm_pos_tagger_example.exs
2. catalan_example.exs
3. question_answering.exs
4. text_classification.exs
5. multilingual_pipeline.exs
**Advanced**:
1. neural_pos_tagger_example.exs
2. information_extraction.exs
3. transformer_pos_example.exs
4. advanced_neural_features.exs
5. roundtrip_translation.exs
### By Processing Time
**Fast (<1 second)**:
- tokenizer_example.exs
- translation_example.exs
- spanish_example.exs
**Medium (1-10 seconds)**:
- catalan_example.exs
- multilingual_pipeline.exs
- summarization.exs
- question_answering.exs
**Slow (>10 seconds)**:
- hmm_pos_tagger_example.exs (if training)
- neural_pos_tagger_example.exs
- transformer_pos_example.exs
- roundtrip_translation.exs
## Running Multiple Examples
### Run the fast examples
```bash
for example in tokenizer_example spanish_example translation_example; do
  echo "Running ${example}..."
  elixir "examples/${example}.exs"
  echo "---"
done
```
### Run all translation examples
```bash
for example in translation_example roundtrip_translation multilingual_pipeline; do
  elixir "examples/${example}.exs"
done
```
### Run all language-specific examples
```bash
elixir examples/spanish_example.exs
elixir examples/catalan_example.exs
elixir examples/multilingual_pipeline.exs
```
## Expected Output
### Typical Output Format
Most examples output:
1. **Section headers**: Clearly marked sections
2. **Input text**: What's being processed
3. **Results**: Parsed output, tags, entities, etc.
4. **Statistics**: Counts, accuracy, timing
5. **Summary**: Key takeaways
### Example Output Snippet
```
========================================
Spanish Language Processing Demo
========================================
1. Tokenization
---------------
Input: El gato duerme en el sofá.
Tokens:
El (1:1)
gato (1:4)
duerme (1:9)
...
2. POS Tagging
--------------
Tagged tokens:
El → det
gato → noun
duerme → verb
...
```
## Troubleshooting
### Common Issues
**Example won't run**:
```bash
# Make sure dependencies are installed
mix deps.get
mix compile
# Only needed when running the script directly (./examples/example_name.exs)
chmod +x examples/example_name.exs
```
**Missing models**:
Some examples (neural, transformer) require trained models. See [TRAINING_NEURAL.md](TRAINING_NEURAL.md) for training instructions.
**Out of memory**:
Neural and transformer examples can exhaust available memory. Reduce the batch size or switch to a smaller model.
## Creating Your Own Examples
Template for new examples:
```elixir
#!/usr/bin/env elixir

# Your Example Name
#
# Brief description of what this example demonstrates

Mix.install([
  {:nasty, path: Path.expand("..", __DIR__)}
])

alias Nasty.Language.English

IO.puts("\n========================================")
IO.puts("Your Example Title")
IO.puts("========================================\n")

# Example 1: First concept
IO.puts("1. First Section")
IO.puts("----------------")
# Your code here

# Example 2: Second concept
IO.puts("\n2. Second Section")
IO.puts("-----------------")
# Your code here

IO.puts("\n========================================")
IO.puts("Example Complete!")
IO.puts("========================================\n")
```
## See Also
- [GETTING_STARTED.md](GETTING_STARTED.md) - Tutorial for beginners
- [USER_GUIDE.md](USER_GUIDE.md) - Comprehensive usage guide
- [API.md](API.md) - API reference
- [TRANSLATION.md](TRANSLATION.md) - Translation system guide