lib/metastatic/analysis/ANALYZER_PLUGIN_SYSTEM.md

Select File:
lib/metastatic/analysis/ANALYZER_PLUGIN_SYSTEM.md

# Analyzer Plugin System

The Analyzer Plugin System provides a unified, extensible framework for code analysis in Metastatic. Write custom analysis rules once and apply them across all supported programming languages through the unified MetaAST representation.

## Quick Start

```elixir
alias Metastatic.{Document, Analysis.Runner}

# Create a document from code
ast = {:binary_op, :arithmetic, :+, {:variable, "x"}, {:literal, :integer, 5}}
doc = Document.new(ast, :python)

# Run analyzers
{:ok, report} = Runner.run(doc)

# Check results
IO.puts("Found #{report.summary.total} issues")
Enum.each(report.issues, fn issue ->
  IO.puts("[#{issue.severity}] #{issue.message}")
end)
```

## Core Concepts

### Analyzer Behaviour

An analyzer is a module that implements the `Metastatic.Analysis.Analyzer` behaviour:

- **`info/0`** - Returns metadata about the analyzer
- **`analyze/2`** - Called for each AST node during traversal
- **`run_before/1`** (optional) - Called before traversal starts
- **`run_after/2`** (optional) - Called after traversal completes

### Registry

The `Metastatic.Analysis.Registry` manages analyzer discovery and configuration:

```elixir
alias Metastatic.Analysis.Registry

# Register an analyzer
Registry.register(MyCustomAnalyzer)

# List all registered analyzers
Registry.list_all()

# List by category
Registry.list_by_category(:correctness)

# Configure an analyzer
Registry.configure(MyAnalyzer, %{threshold: 10})
```

### Runner

The `Metastatic.Analysis.Runner` executes analyzers on documents:

```elixir
alias Metastatic.Analysis.Runner

# Run all registered analyzers
{:ok, report} = Runner.run(doc)

# Run specific analyzers
{:ok, report} = Runner.run(doc, analyzers: [UnusedVariables, SimplifyConditional])

# With configuration
{:ok, report} = Runner.run(doc,
  analyzers: :all,
  config: %{
    nesting_depth: %{max_depth: 4},
    unused_variables: %{ignore_prefix: "_"}
  }
)
```

## Using Built-in Analyzers

### Business-logic Analyzers

#### 1. Pure MetaAST (9 analyzers)

Language-agnostic patterns using only M2.1/M2.2 constructs:

1. **CallbackHell** - Detects deeply nested conditionals (callback hell pattern)
2. **MissingErrorHandling** - Pattern matching without error case handling
3. **SilentErrorCase** - Conditionals with only success path
4. **SwallowingException** - Exception handling without logging
5. **HardcodedValue** - Hardcoded URLs/IPs in string literals
6. **NPlusOneQuery** - Database queries in collection operations
7. **InefficientFilter** - Fetch-all then filter anti-pattern
8. **UnmanagedTask** - Unsupervised async operations
9. **TelemetryInRecursiveFunction** - Metrics emission in recursive functions

#### 2. Function Name Heuristics (4 analyzers)

Detection based on function name patterns:

10. **MissingTelemetryForExternalHttp** - HTTP calls without telemetry/observability
11. **SyncOverAsync** - Blocking operations in async contexts
12. **DirectStructUpdate** - Struct/object updates bypassing validation
13. **MissingHandleAsync** - Fire-and-forget async operations without supervision

#### 3. Naming Conventions (4 analyzers)

Detection based on function/module naming conventions:

14. **BlockingInPlug** - Blocking I/O operations in HTTP middleware
15. **MissingTelemetryInAuthPlug** - Authentication/authorization without audit logging
16. **MissingTelemetryInLiveviewMount** - Component lifecycle methods without telemetry
17. **MissingTelemetryInObanWorker** - Background job processing without metrics

#### 4. Content Analysis (3 analyzers)

Pattern detection through string content analysis:

18. **MissingPreload** - Database queries without eager loading (N+1 risk)
19. **InlineJavascript** - Inline executable code in strings (XSS vulnerability)
20. **MissingThrottle** - Expensive operations without rate limiting

### Generic Analyzers

#### SimplifyConditional

Suggests simplification of redundant conditionals:

```elixir
# Detects patterns like:
# if x then true else false  →  x
# if x then false else true  →  not x

{:ok, report} = Runner.run(doc, analyzers: [SimplifyConditional])
```

**Configuration:** None (not configurable)

#### DeadCodeAnalyzer

Detects unreachable and dead code:

```elixir
# Detects:
# - Code after return statements
# - Branches in constant conditionals

{:ok, report} = Runner.run(doc,
  analyzers: [DeadCodeAnalyzer],
  config: %{dead_code: [min_confidence: :high]}
)
```

**Configuration:**
- `:min_confidence` - `:low` (all), `:medium`, `:high` (only definite)

#### NestingDepth

Detects excessive nesting depth:

```elixir
# Warns when nesting exceeds thresholds

{:ok, report} = Runner.run(doc,
  analyzers: [NestingDepth],
  config: %{nesting_depth: [max_depth: 4, warn_threshold: 3]}
)
```

**Configuration:**
- `:max_depth` - Maximum allowed depth (default: 5)
- `:warn_threshold` - Warning threshold (default: 4)

#### UnusedVariables

Detects unused variables:

```elixir
# Finds assigned but never used variables

{:ok, report} = Runner.run(doc,
  analyzers: [UnusedVariables],
  config: %{unused_variables: [ignore_prefix: "_"]}
)
```

**Configuration:**
- `:ignore_underscore` - Ignore variables starting with underscore (default: true)

## Understanding Reports

The runner returns a report with:

```elixir
{:ok, report} = Runner.run(doc)

# Report structure:
%{
  document: Document.t(),           # The analyzed document
  analyzers_run: [module()],        # Which analyzers ran
  issues: [Analyzer.issue()],       # All issues found
  summary: %{                       # Aggregated statistics
    total: integer(),
    by_severity: %{atom() => integer()},
    by_category: %{atom() => integer()},
    by_analyzer: %{atom() => integer()}
  },
  timing: %{total_ms: float()} | nil # Performance info
}
```

### Issue Structure

Each issue contains:

```elixir
%{
  analyzer: module(),               # Which analyzer found it
  category: atom(),                 # Category (:correctness, :style, etc.)
  severity: atom(),                 # :error, :warning, :info, :refactoring_opportunity
  message: String.t(),              # Human-readable message
  node: Metastatic.AST.meta_ast(),  # The problematic node
  location: %{                      # Location info
    line: non_neg_integer() | nil,
    column: non_neg_integer() | nil,
    path: Path.t() | nil
  },
  suggestion: %{                    # Optional refactoring suggestion
    type: :replace | :remove | :insert_before | :insert_after,
    replacement: meta_ast() | nil,
    message: String.t()
  } | nil,
  metadata: map()                   # Analyzer-specific data
}
```

## Common Patterns

### Filter by Severity

```elixir
errors = Enum.filter(report.issues, &(&1.severity == :error))
warnings = Enum.filter(report.issues, &(&1.severity == :warning))
refactoring = Enum.filter(report.issues, &(&1.severity == :refactoring_opportunity))
```

### Filter by Category

```elixir
correctness_issues = Enum.filter(report.issues, &(&1.category == :correctness))
style_issues = Enum.filter(report.issues, &(&1.category == :style))
```

### Group by Analyzer

```elixir
by_analyzer = Enum.group_by(report.issues, & &1.analyzer)

Enum.each(by_analyzer, fn {analyzer, issues} ->
  IO.puts("#{analyzer}: #{length(issues)} issues")
end)
```

### Get Refactoring Suggestions

```elixir
refactorings = 
  report.issues
  |> Enum.filter(&(&1.severity == :refactoring_opportunity))
  |> Enum.filter(&(&1.suggestion != nil))

Enum.each(refactorings, fn issue ->
  IO.puts("#{issue.message}")
  IO.puts("Suggestion: #{issue.suggestion.message}")
end)
```

### Track Timing

```elixir
{:ok, report} = Runner.run(doc, track_timing: true)

if report.timing do
  IO.puts("Analysis took #{report.timing.total_ms}ms")
end
```

## Application Configuration

Configure analyzers at the application level:

```elixir
# config/config.exs
config :metastatic, :analyzers,
  auto_register: [
    Metastatic.Analysis.UnusedVariables,
    Metastatic.Analysis.SimplifyConditional,
    Metastatic.Analysis.DeadCodeAnalyzer,
    Metastatic.Analysis.NestingDepth
  ],
  disabled: [],  # Disable specific analyzers
  config: %{
    unused_variables: %{ignore_prefix: "_"},
    nesting_depth: %{max_depth: 4},
    dead_code: %{min_confidence: :high}
  }
```

Then use:

```elixir
# Runs all registered analyzers with configured settings
{:ok, report} = Runner.run(doc)
```

## Best Practices

1. **Use Specific Analyzers When Possible** - Running fewer analyzers is faster
2. **Configure Thresholds** - Adjust defaults to match your project standards
3. **Process Issues by Severity** - Handle errors before warnings before info
4. **Cache Registry Lookups** - Don't repeatedly query the registry
5. **Use run_before/1 for Expensive Setup** - Heavy computation in lifecycle hooks
6. **Combine with Other Tools** - Use with formatter and type checker
7. **Review Suggestions** - Don't blindly apply refactoring suggestions
8. **Monitor Performance** - Use `track_timing: true` for performance-critical code

## Performance Considerations

- **Single-pass traversal:** Multiple analyzers run in a single AST traversal
- **Lazy evaluation:** Analysis only runs when explicitly called
- **Configurable depth:** Limit analysis with `max_issues` option
- **Language agnostic:** Same analyzers work across all languages

## Troubleshooting

### Analyzer Not Found

```elixir
# Register it first
Registry.register(MyAnalyzer)

# Or pass explicitly
Runner.run(doc, analyzers: [MyAnalyzer])
```

### No Issues Found

- Verify the AST contains the patterns the analyzer looks for
- Check analyzer configuration
- Run with specific analyzer to verify it's registered

### Performance Issues

- Reduce number of analyzers
- Use `max_issues` to stop early
- Profile with `track_timing: true`
- Consider language-specific analyzers if available

## Next Steps

- See CUSTOM_ANALYZER_GUIDE.md to create your own analyzers
- See BUILTIN_ANALYZERS.md for detailed reference
- Check examples/ directory for working code samples