docs/API.md

Select File:
# Nasty Public API Reference

This document describes the public API of Nasty, the Natural Abstract Syntax Tree library for Elixir.

## Core Functions

### Parsing

#### `Nasty.parse/2`

Parses natural language text into an Abstract Syntax Tree (AST).

**Parameters:**
- `text` (String.t()) - The text to parse
- `opts` (keyword()) - Options:
  - `:language` - Language code (`:en`, `:es`, `:ca`, etc.) **Required**
  - `:tokenize` - Enable tokenization (default: `true`)
  - `:pos_tag` - Enable POS tagging (default: `true`)
  - `:parse_dependencies` - Parse dependency relationships (default: `true`)
  - `:extract_entities` - Extract named entities (default: `false`)
  - `:resolve_coreferences` - Resolve coreferences (default: `false`)

**Returns:**
- `{:ok, %Nasty.AST.Document{}}` - Parsed AST document
- `{:error, reason}` - Parse error

**Examples:**

```elixir
# Basic parsing
{:ok, ast} = Nasty.parse("The cat sat on the mat.", language: :en)

# With entity recognition
{:ok, ast} = Nasty.parse("John lives in Paris.", 
  language: :en, 
  extract_entities: true
)

# With coreference resolution
{:ok, ast} = Nasty.parse("Mary loves her cat. She feeds it daily.", 
  language: :en, 
  resolve_coreferences: true
)
```

#### `Nasty.render/2`

Renders an AST back to natural language text.

**Parameters:**
- `ast` (struct()) - AST node to render (Document, Sentence, etc.)
- `opts` (keyword()) - Options (language determined from AST)

**Returns:**
- `{:ok, text}` - Rendered text string
- `{:error, reason}` - Render error

**Examples:**

```elixir
{:ok, ast} = Nasty.parse("The cat sat.", language: :en)
{:ok, text} = Nasty.render(ast)
# => "The cat sat."
```

### Translation

#### `Nasty.Translation.Translator.translate_document/2`

Translates an AST document from one language to another.

**Parameters:**
- `document` - AST Document to translate
- `target_language` - Target language code (`:en`, `:es`, `:ca`, etc.)

**Returns:**
- `{:ok, %Nasty.AST.Document{}}` - Translated AST document
- `{:error, reason}` - Translation error

**Examples:**

```elixir
alias Nasty.Translation.Translator

# Translate English to Spanish
{:ok, doc_en} = Nasty.parse("The cat runs.", language: :en)
{:ok, doc_es} = Translator.translate_document(doc_en, :es)
{:ok, text_es} = Nasty.render(doc_es)
# => "El gato corre."

# Translate Spanish to English  
{:ok, doc_es} = Nasty.parse("La casa grande.", language: :es)
{:ok, doc_en} = Translator.translate_document(doc_es, :en)
{:ok, text_en} = Nasty.render(doc_en)
# => "The big house."

# Or translate text directly
{:ok, text_es} = Translator.translate("The cat runs.", :en, :es)
# => "El gato corre."
```

### Summarization

#### `Nasty.summarize/2`

Summarizes a document by extracting important sentences.

**Parameters:**
- `text_or_ast` - Text string or AST Document to summarize
- `opts` (keyword()) - Options:
  - `:language` - Language code (required if text)
  - `:ratio` - Compression ratio (0.0 to 1.0), default `0.3`
  - `:max_sentences` - Maximum number of sentences in summary
  - `:method` - Selection method: `:greedy` or `:mmr` (default: `:greedy`)
  - `:min_sentence_length` - Minimum sentence length in tokens (default: `3`)
  - `:mmr_lambda` - MMR diversity parameter, 0-1 (default: `0.5`)

**Returns:**
- `{:ok, [%Sentence{}]}` - List of extracted sentences
- `{:error, reason}` - Error

**Examples:**

```elixir
# From text
{:ok, summary} = Nasty.summarize(long_text, 
  language: :en, 
  ratio: 0.3
)

# From AST
{:ok, ast} = Nasty.parse(long_text, language: :en)
{:ok, summary} = Nasty.summarize(ast, max_sentences: 3)

# Using MMR for diversity
{:ok, summary} = Nasty.summarize(text, 
  language: :en, 
  method: :mmr, 
  mmr_lambda: 0.7
)
```

### Code Interoperability

#### `Nasty.to_code/2`

Converts natural language text to code.

**Parameters:**
- `text` (String.t()) - Natural language description
- `opts` (keyword()) - Options:
  - `:source_language` - Source natural language (`:en`, etc.) **Required**
  - `:target_language` - Target programming language (`:elixir`, etc.) **Required**

**Returns:**
- `{:ok, code_string}` - Generated code
- `{:error, reason}` - Error

**Supported Language Pairs:**
- English → Elixir (`:en` → `:elixir`)

**Examples:**

```elixir
# List operations
{:ok, code} = Nasty.to_code("Sort the list", 
  source_language: :en, 
  target_language: :elixir
)
# => "Enum.sort(list)"

# Filter with constraints
{:ok, code} = Nasty.to_code("Filter users where age is greater than 18",
  source_language: :en,
  target_language: :elixir
)
# => "Enum.filter(users, fn item -> item > 18 end)"

# Arithmetic
{:ok, code} = Nasty.to_code("Add x and y",
  source_language: :en,
  target_language: :elixir
)
# => "x + y"
```

#### `Nasty.explain_code/2`

Generates natural language explanation from code.

**Parameters:**
- `code` - Code string or AST to explain
- `opts` (keyword()) - Options:
  - `:source_language` - Programming language (`:elixir`, etc.) **Required**
  - `:target_language` - Target natural language (`:en`, etc.) **Required**
  - `:style` - Explanation style: `:concise` or `:verbose` (default: `:concise`)

**Returns:**
- `{:ok, explanation_string}` - Natural language explanation
- `{:error, reason}` - Error

**Supported Language Pairs:**
- Elixir → English (`:elixir` → `:en`)

**Examples:**

```elixir
{:ok, explanation} = Nasty.explain_code("Enum.sort(list)",
  source_language: :elixir,
  target_language: :en
)
# => "Sort list"

{:ok, explanation} = Nasty.explain_code(
  "list |> Enum.map(&(&1 * 2)) |> Enum.sum()",
  source_language: :elixir,
  target_language: :en
)
# => "Map list to double each element, then sum the results"

# Verbose style
{:ok, explanation} = Nasty.explain_code("x = 5",
  source_language: :elixir,
  target_language: :en,
  style: :verbose
)
```

## Language Registry

### `Nasty.Language.Registry`

Manages language implementations.

#### `Nasty.Language.Registry.register/1`

Registers a language implementation module.

```elixir
Nasty.Language.Registry.register(Nasty.Language.English)
# => :ok
```

#### `Nasty.Language.Registry.get/1`

Gets the implementation module for a language code.

```elixir
{:ok, module} = Nasty.Language.Registry.get(:en)
# => {:ok, Nasty.Language.English}
```

#### `Nasty.Language.Registry.detect_language/1`

Detects the language of the given text.

```elixir
{:ok, language} = Nasty.Language.Registry.detect_language("Hello world")
# => {:ok, :en}

{:ok, language} = Nasty.Language.Registry.detect_language("Hola mundo")
# => {:ok, :es}
```

#### `Nasty.Language.Registry.registered_languages/0`

Returns all registered language codes.

```elixir
Nasty.Language.Registry.registered_languages()
# => [:en, :es, :ca]
```

#### `Nasty.Language.Registry.registered?/1`

Checks if a language is registered.

```elixir
Nasty.Language.Registry.registered?(:en)
# => true
```

## AST Utilities

### Query

#### `Nasty.Utils.Query`

Query and traverse AST structures.

```elixir
alias Nasty.Utils.Query

# Find subject in a sentence
subject = Query.find_subject(sentence)

# Find all noun phrases
noun_phrases = Query.find_all(document, :noun_phrase)

# Find by POS tag
nouns = Query.find_by_pos(document, :noun)
verbs = Query.find_by_pos(document, :verb)

# Count nodes
token_count = Query.count(document, :token)
```

### Validation

#### `Nasty.Utils.Validator`

Validate AST structure.

```elixir
alias Nasty.Utils.Validator

case Validator.validate(document) do
  {:ok, _doc} -> IO.puts("Valid AST")
  {:error, reason} -> IO.puts("Invalid: #{reason}")
end

# Check if valid (boolean)
if Validator.valid?(document) do
  IO.puts("Document is valid")
end
```

### Transformation

#### `Nasty.Utils.Transform`

Transform AST nodes.

```elixir
alias Nasty.Utils.Transform

# Case normalization
lowercased = Transform.normalize_case(document, :lower)

# Remove punctuation
no_punct = Transform.remove_punctuation(document)

# Remove stop words
no_stops = Transform.remove_stop_words(document)

# Lemmatize all tokens
lemmatized = Transform.lemmatize(document)
```

### Traversal

#### `Nasty.Utils.Traversal`

Traverse AST structure.

```elixir
alias Nasty.Utils.Traversal

# Reduce over all nodes
token_count = Traversal.reduce(document, 0, fn
  %Nasty.AST.Token{}, acc -> acc + 1
  _, acc -> acc
end)

# Collect matching nodes
verbs = Traversal.collect(document, fn
  %Nasty.AST.Token{pos_tag: :verb} -> true
  _ -> false
end)

# Map over all nodes
transformed = Traversal.map(document, fn
  %Nasty.AST.Token{} = token ->
    %{token | text: String.downcase(token.text)}
  node -> node
end)
```

## Rendering

### Pretty Print

#### `Nasty.Rendering.PrettyPrint`

Format AST for human-readable inspection.

```elixir
# Pretty print to stdout
Nasty.Rendering.PrettyPrint.inspect(ast)

# Get formatted string
formatted = Nasty.Rendering.PrettyPrint.format(ast)
```

### Visualization

#### `Nasty.Rendering.Visualization`

Generate visualizations of AST structures.

```elixir
# Generate DOT format for Graphviz
{:ok, dot} = Nasty.Rendering.Visualization.to_dot(ast)
File.write("ast.dot", dot)

# Generate JSON representation
{:ok, json} = Nasty.Rendering.Visualization.to_json(ast)
```

### Text Rendering

#### `Nasty.Rendering.Text`

Render AST to text.

```elixir
{:ok, text} = Nasty.Rendering.Text.render(document)
```

## Statistical & Neural Models

### Model Registry

#### `Nasty.Statistics.ModelRegistry`

Manage statistical and neural models.

```elixir
# Register a model
Nasty.Statistics.ModelRegistry.register(:hmm_pos_tagger, model)
Nasty.Statistics.ModelRegistry.register(:neural_pos_tagger, neural_model)

# Get a model
{:ok, model} = Nasty.Statistics.ModelRegistry.get(:hmm_pos_tagger)
{:ok, neural} = Nasty.Statistics.ModelRegistry.get(:neural_pos_tagger)

# List models
models = Nasty.Statistics.ModelRegistry.list_models()
```

### Model Loader

#### `Nasty.Statistics.ModelLoader`

Load and save statistical and neural models.

```elixir
# Load HMM model from file
{:ok, model} = Nasty.Statistics.ModelLoader.load("path/to/model.model")

# Load neural model from file
{:ok, neural} = Nasty.Statistics.POSTagging.NeuralTagger.load("path/to/model.axon")

# Save model to file
:ok = Nasty.Statistics.ModelLoader.save(model, "path/to/model.model")
:ok = NeuralTagger.save(neural, "path/to/model.axon")

# Load from project
{:ok, model} = Nasty.Statistics.ModelLoader.load_from_priv("models/hmm.model")
```

### Neural Models

#### `Nasty.Statistics.POSTagging.NeuralTagger`

Train and use BiLSTM-CRF neural models for POS tagging.

```elixir
# Train a neural model
alias Nasty.Statistics.POSTagging.NeuralTagger

tagger = NeuralTagger.new(
  vocab: vocab,
  tag_vocab: tag_vocab,
  embedding_dim: 300,
  hidden_size: 256,
  num_layers: 2
)

{:ok, trained} = NeuralTagger.train(tagger, training_data,
  epochs: 10,
  batch_size: 32,
  learning_rate: 0.001
)

# Use neural model for prediction
{:ok, tags} = NeuralTagger.predict(trained, ["The", "cat", "sat"], [])

# Save/load neural models
NeuralTagger.save(trained, "model.axon")
{:ok, loaded} = NeuralTagger.load("model.axon")
```

## Data Layer

### CoNLL-U Parser

#### `Nasty.Data.CoNLLU`

Parse and generate CoNLL-U format data.

```elixir
# Parse CoNLL-U file
{:ok, sentences} = Nasty.Data.CoNLLU.parse_file("corpus.conllu")

# Parse CoNLL-U string
{:ok, sentences} = Nasty.Data.CoNLLU.parse(conllu_string)

# Convert AST to CoNLL-U
conllu_string = Nasty.Data.CoNLLU.format(sentence)
```

### Corpus Management

#### `Nasty.Data.Corpus`

Manage text corpora.

```elixir
# Load corpus
{:ok, corpus} = Nasty.Data.Corpus.load("path/to/corpus")

# Get sentences
sentences = Nasty.Data.Corpus.sentences(corpus)

# Statistics
stats = Nasty.Data.Corpus.statistics(corpus)
```

## NLP Operations (English)

These are language-specific operations available for English. Access through the English module.

### Question Answering

```elixir
alias Nasty.Language.English

# Analyze question
{:ok, analysis} = English.QuestionAnalyzer.analyze("What is the capital of France?")

# Extract answer
{:ok, answer} = English.AnswerExtractor.extract(document, analysis)
```

### Text Classification

```elixir
# Train classifier
classifier = English.TextClassifier.train(training_data)

# Classify text
{:ok, category} = English.TextClassifier.classify(classifier, text)
```

### Information Extraction

```elixir
# Extract relations
relations = English.RelationExtractor.extract(document)

# Extract events
events = English.EventExtractor.extract(document)

# Extract with templates
extracted = English.TemplateExtractor.extract(document, templates)
```

### Semantic Role Labeling

```elixir
# Label semantic roles
labeled = English.SemanticRoleLabeler.label(sentence)
```

### Coreference Resolution

```elixir
# Resolve coreferences
{:ok, resolved} = English.CoreferenceResolver.resolve(document)
```

### Translation

#### `Nasty.Translation.Translator`

Translate documents between languages.

```elixir
alias Nasty.Translation.Translator

# Translate document
{:ok, translated_doc} = Translator.translate(source_doc, :es)

# Translate with custom lexicons
{:ok, translated_doc} = Translator.translate(source_doc, :es, lexicon_path: "custom_lexicons/")
```

#### `Nasty.Translation.TokenTranslator`

Translate individual tokens with POS-aware lemma-to-lemma mapping.

```elixir
alias Nasty.Translation.TokenTranslator

# Translate token
translated_token = TokenTranslator.translate_token(token, :en, :es)

# Translate with morphology
translated_token = TokenTranslator.translate_with_morphology(token, :en, :es)
```

#### `Nasty.Translation.Agreement`

Enforce morphological agreement rules.

```elixir
alias Nasty.Translation.Agreement

# Apply gender/number agreement
adjusted_tokens = Agreement.apply_agreement(tokens, :es)

# Check agreement
valid? = Agreement.check_agreement(determiner, noun)
```

#### `Nasty.Translation.WordOrder`

Apply language-specific word order transformations.

```elixir
alias Nasty.Translation.WordOrder

# Transform word order
ordered_phrase = WordOrder.apply_order(phrase, :es)

# Apply adjective position rules  
ordered_np = WordOrder.apply_adjective_order(noun_phrase, :es)
```

#### `Nasty.AST.Renderer`

Render AST back to natural language text.

```elixir
alias Nasty.AST.Renderer

# Render document
{:ok, text} = Renderer.render_document(document)

# Render specific nodes
{:ok, text} = Renderer.render_sentence(sentence)
{:ok, text} = Renderer.render_phrase(phrase)
```

## Error Handling

All public API functions return result tuples:

- `{:ok, result}` on success
- `{:error, reason}` on failure

Common error reasons:

- `:language_required` - Language not specified
- `:language_not_found` - Language not registered
- `:language_not_registered` - Language code not in registry
- `:no_languages_registered` - No languages available
- `:no_match` - Language detection failed
- `:invalid_text` - Invalid input text
- `:parse_error` - Failed to parse text
- `:source_language_required` - Source language not specified
- `:target_language_required` - Target language not specified
- `:unsupported_language_pair` - Language pair not supported
- `:summarization_not_supported` - Summarization not available for language
- `:invalid_input` - Invalid input type

## See Also

- [AST Reference](AST_REFERENCE.md) - Complete AST node documentation
- [User Guide](USER_GUIDE.md) - Tutorial and examples
- [Architecture](ARCHITECTURE.md) - System architecture
- [Language Guide](LANGUAGE_GUIDE.md) - Adding new languages