# LlmGuard Architecture
## Overview
LlmGuard is a modular, extensible security framework for LLM-based applications. The architecture follows defense-in-depth principles, with multiple security layers that operate independently yet compose into a single pipeline.
## System Architecture
```mermaid
graph TB
    subgraph "Application Layer"
        App[LLM Application]
    end

    subgraph "LlmGuard Security Layer"
        API[LlmGuard API]
        Config[Configuration]
        Pipeline[Security Pipeline]

        subgraph "Input Guardrails"
            PI[Prompt Injection Detector]
            JB[Jailbreak Detector]
            Length[Length Validator]
            Policy[Policy Engine]
        end

        subgraph "Output Guardrails"
            DL[Data Leakage Scanner]
            CS[Content Safety]
            Valid[Output Validator]
        end

        subgraph "Supporting Services"
            RL[Rate Limiter]
            Audit[Audit Logger]
            Cache[Pattern Cache]
        end
    end

    subgraph "LLM Provider"
        LLM[Language Model API]
    end

    App --> API
    API --> Config
    API --> Pipeline
    Pipeline --> PI
    Pipeline --> JB
    Pipeline --> Length
    Pipeline --> Policy
    Pipeline --> LLM
    LLM --> DL
    DL --> CS
    CS --> Valid
    Valid --> App
    Pipeline --> RL
    Pipeline --> Audit
    PI --> Cache
    JB --> Cache
```
## Core Components
### 1. LlmGuard API
**Module**: `LlmGuard`
The main entry point providing high-level functions:
- `validate_input/2` - Validates and sanitizes user input
- `validate_output/2` - Validates LLM responses
- `validate_batch/2` - Batch processing for multiple inputs
- `async_validate_batch/2` - Asynchronous batch processing
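A typical call site (a minimal sketch; `handle_rejection/1` is a hypothetical application callback):

```elixir
config = LlmGuard.Config.new()

case LlmGuard.validate_input(user_prompt, config) do
  # Safe to forward to the LLM
  {:ok, sanitized} -> sanitized
  # Blocked by one of the guardrails
  {:error, reason} -> handle_rejection(reason)
end
```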
### 2. Configuration System
**Module**: `LlmGuard.Config`
Centralized configuration management:
```elixir
%LlmGuard.Config{
  # Detection toggles
  prompt_injection_detection: true,
  jailbreak_detection: true,
  data_leakage_prevention: true,
  content_moderation: true,

  # Thresholds
  confidence_threshold: 0.7,
  max_input_length: 10_000,

  # Custom detectors
  custom_detectors: [],

  # Rate limiting
  rate_limit_config: %{},

  # Audit logging
  audit_enabled: true
}
```
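Defaults can be overridden per call site with ordinary map-update syntax (a sketch, assuming `LlmGuard.Config.new/0` returns the defaults above):

```elixir
config = %{LlmGuard.Config.new() | confidence_threshold: 0.9, max_input_length: 4_000}
```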
### 3. Security Pipeline
**Module**: `LlmGuard.Pipeline`
Orchestrates execution of security checks in a defined order:
```elixir
alias LlmGuard.Pipeline

pipeline =
  Pipeline.new()
  |> Pipeline.add_stage(:length_check, LengthValidator)
  |> Pipeline.add_stage(:prompt_injection, PromptInjection)
  |> Pipeline.add_stage(:jailbreak, Jailbreak)
  |> Pipeline.add_stage(:policy, PolicyEngine)
```
**Features**:
- Sequential execution with early termination on failure
- Async execution for independent checks
- Error handling and recovery
- Performance monitoring
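Running a pipeline might look like the following; the exact `run` signature and error shape are assumptions based on the sequence diagram below:

```elixir
case LlmGuard.Pipeline.run(pipeline, prompt) do
  {:ok, sanitized} ->
    {:ok, sanitized}

  {:error, %{stage: stage, reason: reason}} ->
    # The failing stage short-circuits the rest of the pipeline
    {:blocked, stage, reason}
end
```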
### 4. Detector Framework
**Module**: `LlmGuard.Detector` (Behaviour)
All detectors implement the `Detector` behaviour:
```elixir
defmodule LlmGuard.Detector do
  @callback detect(input :: String.t(), opts :: keyword()) ::
              {:safe, map()} | {:detected, map()}
end
```
**Built-in Detectors**:
- `LlmGuard.PromptInjection` - Detects prompt injection attempts
- `LlmGuard.Jailbreak` - Detects jailbreak attempts
- `LlmGuard.DataLeakage` - Scans for PII and sensitive data
- `LlmGuard.ContentSafety` - Moderates harmful content
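Detectors can also be invoked directly; a sketch, assuming the option and metadata keys shown here are supported:

```elixir
require Logger

case LlmGuard.PromptInjection.detect(user_input, confidence_threshold: 0.7) do
  {:safe, _metadata} ->
    :ok

  {:detected, metadata} ->
    Logger.warning("prompt injection detected: #{inspect(metadata)}")
    :blocked
end
```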
## Detection Strategy
### Multi-Layer Detection
```mermaid
graph LR
Input[User Input] --> L1[Layer 1: Pattern Matching]
L1 --> L2[Layer 2: Heuristic Analysis]
L2 --> L3[Layer 3: ML Classification]
L3 --> Decision{Safe?}
Decision -->|Yes| Allow[Allow]
Decision -->|No| Block[Block]
```
### Pattern Matching (Layer 1)
Fast, rule-based detection using regex and string matching:
- Known malicious patterns
- Signature-based detection
- Low latency (~1ms)
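A minimal Layer 1 sketch; the patterns are illustrative, not the shipped rule set:

```elixir
defmodule Layer1 do
  # In LlmGuard, compiled patterns would come from the Pattern Cache.
  defp patterns do
    [
      ~r/ignore (all )?previous instructions/i,
      ~r/disregard your system prompt/i,
      ~r/you are now DAN/i
    ]
  end

  def scan(input) do
    case Enum.find(patterns(), &Regex.match?(&1, input)) do
      nil -> {:safe, %{}}
      pattern -> {:detected, %{layer: 1, pattern: inspect(pattern)}}
    end
  end
end
```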
### Heuristic Analysis (Layer 2)
Statistical and linguistic analysis:
- Entropy analysis
- Token frequency analysis
- Structural anomaly detection
- Medium latency (~10ms)
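Shannon entropy is one such signal: unusually high values suggest encoded payloads (Base64, hex) smuggled into a prompt. A self-contained sketch:

```elixir
defmodule Layer2 do
  # Shannon entropy in bits per grapheme; plain English prose
  # typically lands around 4, random Base64 noticeably higher.
  def entropy(string) do
    graphemes = String.graphemes(string)
    total = length(graphemes)

    graphemes
    |> Enum.frequencies()
    |> Enum.reduce(0.0, fn {_g, count}, acc ->
      p = count / total
      acc - p * :math.log2(p)
    end)
  end
end
```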
### ML Classification (Layer 3)
Machine learning-based detection:
- Transformer-based embeddings
- Fine-tuned classifiers
- Ensemble methods
- Higher latency (~50-100ms)
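In Elixir, one way to implement this layer is Bumblebee with a fine-tuned classifier; a sketch, assuming `bumblebee`, `nx`, and `exla` dependencies, with a placeholder model name:

```elixir
# "my-org/injection-classifier" is a placeholder, not a published model
{:ok, model_info} = Bumblebee.load_model({:hf, "my-org/injection-classifier"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "my-org/injection-classifier"})

serving = Bumblebee.Text.text_classification(model_info, tokenizer)

%{predictions: predictions} = Nx.Serving.run(serving, user_input)
# predictions: e.g. [%{label: "injection", score: 0.93}, %{label: "benign", score: 0.07}]
```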
## Data Flow
### Input Validation Flow
```mermaid
sequenceDiagram
    participant App
    participant LlmGuard
    participant Pipeline
    participant Detectors
    participant Audit
    participant LLM

    App->>LlmGuard: validate_input(prompt)
    LlmGuard->>Pipeline: run(prompt, config)

    loop For each detector
        Pipeline->>Detectors: detect(prompt)
        Detectors-->>Pipeline: result
    end

    Pipeline->>Audit: log_event(result)

    alt All checks pass
        Pipeline-->>LlmGuard: {:ok, sanitized}
        LlmGuard-->>App: {:ok, sanitized}
        App->>LLM: call(sanitized)
    else Any check fails
        Pipeline-->>LlmGuard: {:error, reason}
        LlmGuard-->>App: {:error, reason}
    end
```
### Output Validation Flow
```mermaid
sequenceDiagram
    participant LLM
    participant App
    participant LlmGuard
    participant Scanner
    participant Sanitizer
    participant Audit

    LLM->>App: response
    App->>LlmGuard: validate_output(response)
    LlmGuard->>Scanner: scan_for_pii(response)
    Scanner-->>LlmGuard: detected_entities

    alt PII detected
        LlmGuard->>Sanitizer: mask(response, entities)
        Sanitizer-->>LlmGuard: masked_response
    end

    LlmGuard->>Audit: log_scan(result)
    LlmGuard-->>App: {:ok, safe_response}
```
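End to end, the two flows compose naturally in a `with` expression (a sketch; `MyApp.LLM.complete/1` stands in for any LLM client):

```elixir
with {:ok, safe_prompt} <- LlmGuard.validate_input(prompt, config),
     {:ok, response} <- MyApp.LLM.complete(safe_prompt),
     {:ok, safe_response} <- LlmGuard.validate_output(response, config) do
  {:ok, safe_response}
else
  {:error, reason} -> {:blocked, reason}
end
```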
## Policy Engine
### Policy Structure
```elixir
%LlmGuard.Policy{
  name: "production_policy",
  rules: [
    %Rule{
      id: :no_system_prompts,
      type: :input,
      validator: fn input -> ... end,
      severity: :high
    },
    %Rule{
      id: :max_length,
      type: :input,
      validator: fn input -> ... end,
      severity: :medium
    }
  ],
  actions: %{
    high: :block,
    medium: :warn,
    low: :log
  }
}
```
### Policy Evaluation
```mermaid
graph TD
    Input[Input] --> Eval[Evaluate All Rules]
    Eval --> Check{All Pass?}
    Check -->|Yes| Allow[Allow]
    Check -->|No| Severity{Max Severity}
    Severity -->|High| Block[Block]
    Severity -->|Medium| Warn[Warn & Allow]
    Severity -->|Low| Log[Log & Allow]
```
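A sketch of this dispatch, assuming each rule's `validator` returns `true` on pass and `false` on violation:

```elixir
defmodule PolicyEval do
  def evaluate(%LlmGuard.Policy{rules: rules, actions: actions}, input) do
    violated =
      rules
      |> Enum.reject(fn rule -> rule.validator.(input) end)
      |> Enum.map(& &1.severity)

    case violated do
      [] ->
        :allow

      severities ->
        # Act on the most severe violation (:block, :warn, or :log)
        Map.fetch!(actions, Enum.max_by(severities, &rank/1))
    end
  end

  defp rank(:high), do: 3
  defp rank(:medium), do: 2
  defp rank(:low), do: 1
end
```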
## Rate Limiting
### Token Bucket Algorithm
```elixir
%RateLimiter{
  user_id: "user123",
  buckets: %{
    # refill_rate is tokens added per second
    requests: %{capacity: 60, tokens: 60, refill_rate: 1},
    tokens: %{capacity: 100_000, tokens: 100_000, refill_rate: 1667}
  },
  last_refill: ~U[2024-01-01 12:00:00Z]
}
```
**Features**:
- Per-user rate limiting
- Multiple bucket types (requests, tokens)
- Distributed rate limiting support (via Redis/ETS)
- Graceful degradation
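A minimal refill-then-consume sketch for a single bucket; field names follow the struct above, with `last_refill` carried per bucket for simplicity (not LlmGuard's actual internals):

```elixir
defmodule TokenBucket do
  def try_consume(bucket, now, cost \\ 1) do
    # Refill lazily based on elapsed time, capped at capacity
    elapsed_s = DateTime.diff(now, bucket.last_refill, :millisecond) / 1000
    tokens = min(bucket.capacity, bucket.tokens + elapsed_s * bucket.refill_rate)

    if tokens >= cost do
      {:ok, %{bucket | tokens: tokens - cost, last_refill: now}}
    else
      {:error, :rate_limited}
    end
  end
end
```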
## Audit Logging
### Event Structure
```elixir
%AuditEvent{
  id: UUID,
  timestamp: DateTime,
  event_type: :prompt_injection_detected,
  user_id: "user123",
  session_id: "session456",
  severity: :high,
  action: :blocked,
  metadata: %{
    input: "...",
    detector: LlmGuard.PromptInjection,
    confidence: 0.95,
    patterns_matched: ["ignore previous instructions"]
  }
}
```
### Storage Backends
- **ETS** - In-memory, fast (default)
- **Database** - PostgreSQL, MySQL (via Ecto)
- **External** - Elasticsearch, Splunk (via adapters)
## Performance Optimization
### Caching Strategy
```mermaid
graph LR
    Input[Input] --> Hash[Hash Input]
    Hash --> Cache{In Cache?}
    Cache -->|Hit| Return[Return Cached Result]
    Cache -->|Miss| Detect[Run Detection]
    Detect --> Store[Store in Cache]
    Store --> Return
```
**Cache Levels**:
1. **Pattern Cache** - Compiled regex patterns
2. **Result Cache** - Detection results (with TTL)
3. **Embedding Cache** - ML embeddings
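A sketch of level 2 over ETS, assuming a pre-created public table named `:llm_guard_results` (TTL handling omitted for brevity):

```elixir
defmodule ResultCache do
  def cached_detect(detector, input, opts \\ []) do
    key = {detector, :crypto.hash(:sha256, input)}

    case :ets.lookup(:llm_guard_results, key) do
      [{^key, result}] ->
        result

      [] ->
        result = detector.detect(input, opts)
        :ets.insert(:llm_guard_results, {key, result})
        result
    end
  end
end
```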
### Async Processing
```elixir
# Run independent detectors in parallel and collect all results
results =
  detectors
  |> Enum.map(fn detector ->
    Task.async(fn -> detector.detect(input, []) end)
  end)
  |> Task.await_many()
```
### Streaming Support
For large inputs, LlmGuard supports streaming validation:
```elixir
LlmGuard.stream_validate(input_stream, config)
|> Stream.map(&process_chunk/1)
|> Enum.to_list()
```
## Extensibility
### Custom Detectors
```elixir
defmodule MyApp.CustomDetector do
  @behaviour LlmGuard.Detector

  @impl true
  def detect(input, _opts) do
    # Custom detection logic; must return {:safe, metadata} or
    # {:detected, metadata}. The check below is purely illustrative.
    if String.contains?(input, "do anything now") do
      {:detected, %{confidence: 1.0, pattern: "do anything now"}}
    else
      {:safe, %{}}
    end
  end
end

config =
  LlmGuard.Config.new()
  |> LlmGuard.Config.add_detector(MyApp.CustomDetector)
```
### Plugin System
Future enhancement for third-party plugins:
```elixir
LlmGuard.Plugin.register(MyPlugin, %{
  detector: MyPlugin.Detector,
  config: %{},
  priority: 10
})
```
## Deployment Considerations
### Standalone Mode
LlmGuard runs inside the host application's supervision tree:
```elixir
# In application supervision tree
children = [
  {LlmGuard.Supervisor, config}
]
```
### Distributed Mode
LlmGuard can run as a separate service:
```mermaid
graph LR
    App1[App Instance 1] --> LG[LlmGuard Service]
    App2[App Instance 2] --> LG
    App3[App Instance 3] --> LG
    LG --> Cache[Shared Cache]
    LG --> DB[Audit DB]
```
### Scaling Strategy
- **Horizontal**: Multiple LlmGuard instances with shared cache
- **Vertical**: Increase detector parallelism
- **Edge**: Deploy detectors closer to users for lower latency
## Security Guarantees
1. **Defense in Depth**: Multiple independent detection layers
2. **Fail Secure**: Block on uncertainty
3. **Zero Trust**: Validate all inputs and outputs
4. **Audit Trail**: Complete logging for forensics
5. **Performance**: p95 latency under 50ms for most detections
## Future Enhancements
1. **Federated Learning**: Collaborative model training
2. **Real-time Updates**: Live threat intelligence integration
3. **Advanced Analytics**: ML-powered anomaly detection
4. **Multi-modal**: Support for image/audio inputs
5. **Privacy Preserving**: Homomorphic encryption for sensitive data