# Snakepit Configuration Guide
This guide covers all configuration options for Snakepit, from simple single-pool setups to advanced multi-pool deployments with different worker profiles.
---
## Table of Contents
1. [Configuration Formats](#configuration-formats)
2. [Global Options](#global-options)
3. [Pool Configuration](#pool-configuration)
4. [Heartbeat Configuration](#heartbeat-configuration)
5. [Logging Configuration](#logging-configuration)
6. [Python Runtime Configuration](#python-runtime-configuration)
7. [Optional Features](#optional-features)
8. [Complete Configuration Example](#complete-configuration-example)
9. [Validation](#validation)
---
## Configuration Formats
Snakepit supports two configuration formats: legacy (single-pool) and multi-pool (v0.6+).
### Simple (Legacy) Configuration
For backward compatibility with v0.5.x and single-pool deployments:
```elixir
# config/config.exs
config :snakepit,
  pooling_enabled: true,
  adapter_module: Snakepit.Adapters.GRPCPython,
  pool_size: 100,
  pool_config: %{
    startup_batch_size: 8,
    startup_batch_delay_ms: 750,
    max_workers: 1000
  }
```
This format creates a single pool named `:default` with the specified settings.
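Once the application starts, requests flow through that single `:default` pool. A minimal usage sketch, assuming a `Snakepit.execute/2` entry point and a `"ping"` command implemented by the configured adapter (adjust to your Snakepit version's API):

```elixir
# Hypothetical call against the :default pool; the "ping" command is
# assumed to be exposed by the configured Python adapter.
case Snakepit.execute("ping", %{}) do
  {:ok, result} -> IO.inspect(result, label: "pong")
  {:error, reason} -> IO.inspect(reason, label: "pool error")
end
```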
### Multi-Pool Configuration (v0.6+)
For advanced deployments with multiple pools, each with different profiles:
```elixir
# config/config.exs
config :snakepit,
  pools: [
    %{
      name: :default,
      worker_profile: :process,
      pool_size: 100,
      adapter_module: Snakepit.Adapters.GRPCPython
    },
    %{
      name: :ml_inference,
      worker_profile: :thread,
      pool_size: 4,
      threads_per_worker: 16,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.ml.InferenceAdapter"]
    }
  ]
```
This creates two pools: `:default` for general tasks and `:ml_inference` for CPU-bound ML workloads.
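How a request is routed to a named pool depends on your Snakepit version's API. The sketch below is illustrative only and assumes `Snakepit.execute/3` accepts an optional `pool:` keyword; check the API docs for the exact option name:

```elixir
# Assumption: a :pool option routes the call to a named pool.
# General work goes to :default; inference requests target :ml_inference.
{:ok, _} = Snakepit.execute("ping", %{})
{:ok, prediction} = Snakepit.execute("predict", %{"input" => [1, 2, 3]}, pool: :ml_inference)
```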
---
## Global Options
These options apply to all pools or the Snakepit application as a whole.
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `pooling_enabled` | `boolean()` | `false` | Enable or disable worker pooling. Set to `true` for normal operation. |
| `adapter_module` | `module()` | `nil` | Default adapter module for pools that do not specify one. |
| `pool_size` | `pos_integer()` | `System.schedulers_online() * 2` | Default pool size. Typically 2x CPU cores. |
| `capacity_strategy` | `:pool \| :profile \| :hybrid` | `:pool` | How worker capacity is managed across pools. |
| `pool_startup_timeout` | `pos_integer()` | `10000` | Maximum time (ms) to wait for a worker to start. |
| `pool_queue_timeout` | `pos_integer()` | `5000` | Maximum time (ms) a request waits in queue. |
| `pool_max_queue_size` | `pos_integer()` | `1000` | Maximum queued requests before rejecting new ones. |
| `grpc_port` | `pos_integer()` | `50051` | Port for the Elixir gRPC server (Python-to-Elixir calls). |
| `grpc_host` | `String.t()` | `"localhost"` | Host for gRPC connections. |
| `graceful_shutdown_timeout_ms` | `pos_integer()` | `6000` | Time (ms) to wait for Python to terminate gracefully before SIGKILL. |
### Capacity Strategies
| Strategy | Description |
|----------|-------------|
| `:pool` | Each pool manages its own capacity independently. Default and simplest option. |
| `:profile` | Workers of the same profile share capacity across pools. |
| `:hybrid` | Combination of pool and profile strategies for complex deployments. |
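For example, to let pools that use the same worker profile draw from a shared capacity budget, set the strategy globally:

```elixir
# Workers of the same profile share capacity across pools.
config :snakepit,
  capacity_strategy: :profile
```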
---
## Pool Configuration
Each pool can be configured independently with these options.
### Required Fields
| Option | Type | Description |
|--------|------|-------------|
| `name` | `atom()` | Unique pool identifier. Use `:default` for the primary pool. |
### Profile Selection
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `worker_profile` | `:process \| :thread` | `:process` | Worker execution model. See [Worker Profiles Guide](worker-profiles.md). |
### Common Pool Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `pool_size` | `pos_integer()` | Global setting | Number of workers in this pool. |
| `adapter_module` | `module()` | Global setting | Adapter module for this pool. |
| `adapter_args` | `list(String.t())` | `[]` | CLI arguments passed to the Python server. |
| `adapter_env` | `list({String.t(), String.t()})` | `[]` | Environment variables for Python processes. |
| `adapter_spec` | `String.t()` | `nil` | Python adapter module path (e.g., `"myapp.adapters.MyAdapter"`). |
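The table lists `adapter_spec` as a declarative way to point a pool at a Python adapter class. A sketch, assuming `adapter_spec` can stand in for passing `--adapter` through `adapter_args` (the `:reports` pool and `myapp.adapters.ReportsAdapter` module are placeholder names):

```elixir
%{
  name: :reports,
  adapter_module: Snakepit.Adapters.GRPCPython,
  # Assumption: adapter_spec selects the Python adapter class directly,
  # instead of passing it via adapter_args as ["--adapter", "..."].
  adapter_spec: "myapp.adapters.ReportsAdapter",
  adapter_env: [{"PYTHONUNBUFFERED", "1"}]
}
```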
### Process Profile Options
These options apply when `worker_profile: :process`:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `startup_batch_size` | `pos_integer()` | `8` | Workers started per batch during pool initialization. |
| `startup_batch_delay_ms` | `non_neg_integer()` | `750` | Delay between startup batches (ms). Reduces system load during startup. |
### Thread Profile Options
These options apply when `worker_profile: :thread`:
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `threads_per_worker` | `pos_integer()` | `10` | Thread pool size per Python process. Total capacity = `pool_size * threads_per_worker`. |
| `thread_safety_checks` | `boolean()` | `false` | Enable runtime thread safety validation. Useful for development. |
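A thread-profile pool sketch combining both options; the `:etl` pool name is a placeholder, and the capacity arithmetic follows the formula above (6 workers x 12 threads = 72 concurrent requests):

```elixir
%{
  name: :etl,
  worker_profile: :thread,
  pool_size: 6,
  threads_per_worker: 12,      # total capacity: 6 * 12 = 72
  # Runtime validation adds overhead; enable it in dev, disable in prod.
  thread_safety_checks: true,
  adapter_module: Snakepit.Adapters.GRPCPython
}
```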
### Worker Lifecycle Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `worker_ttl` | `:infinity \| {value, unit}` | `:infinity` | Maximum worker lifetime before recycling. |
| `worker_max_requests` | `:infinity \| pos_integer()` | `:infinity` | Maximum requests before recycling a worker. |
**TTL Units:**
| Unit | Example |
|------|---------|
| `:seconds` | `{3600, :seconds}` - 1 hour |
| `:minutes` | `{60, :minutes}` - 1 hour |
| `:hours` | `{1, :hours}` - 1 hour |
Worker recycling helps prevent memory leaks and ensures fresh worker state.
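For example, a process-profile pool that recycles each worker after one hour or 5,000 requests (the two limits are assumed to apply independently, whichever is reached first):

```elixir
%{
  name: :default,
  worker_profile: :process,
  pool_size: 50,
  adapter_module: Snakepit.Adapters.GRPCPython,
  # Recycle after 1 hour of lifetime or 5_000 served requests.
  worker_ttl: {1, :hours},
  worker_max_requests: 5_000
}
```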
---
## Heartbeat Configuration
Heartbeats detect unresponsive workers and trigger automatic restarts.
### Global Heartbeat Config
```elixir
config :snakepit,
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 2000,
    timeout_ms: 10000,
    max_missed_heartbeats: 3,
    initial_delay_ms: 0,
    dependent: true
  }
```
### Per-Pool Heartbeat Config
```elixir
%{
  name: :ml_pool,
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 10000,
    timeout_ms: 30000,
    max_missed_heartbeats: 2
  }
}
```
### Heartbeat Options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `enabled` | `boolean()` | `true` | Enable heartbeat monitoring. |
| `ping_interval_ms` | `pos_integer()` | `2000` | Interval between heartbeat pings. |
| `timeout_ms` | `pos_integer()` | `10000` | Maximum time to wait for heartbeat response. |
| `max_missed_heartbeats` | `pos_integer()` | `3` | Missed heartbeats before declaring worker dead. |
| `initial_delay_ms` | `non_neg_integer()` | `0` | Delay before first heartbeat ping. |
| `dependent` | `boolean()` | `true` | Whether worker terminates if heartbeat monitor dies. |
### Tuning Guidelines
- **Fast detection**: Lower `ping_interval_ms` and `max_missed_heartbeats` (see the sketch after this list)
- **Reduce overhead**: Higher `ping_interval_ms` for stable workloads
- **Long operations**: Increase `timeout_ms` if workers run long computations
- **ML workloads**: Use `ping_interval_ms: 10000` or higher since inference can block
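Applying the fast-detection guideline, a development-oriented global heartbeat might look like this (values are illustrative, not recommendations):

```elixir
config :snakepit,
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 1000,    # ping more often for quicker detection
    timeout_ms: 5000,
    max_missed_heartbeats: 2   # declare a worker dead sooner
  }
```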
---
## Logging Configuration
Snakepit uses its own logger for internal operations.
### Log Level
```elixir
config :snakepit,
  log_level: :info # :debug | :info | :warning | :error | :none
```
| Level | Description |
|-------|-------------|
| `:debug` | Verbose output including worker lifecycle, gRPC calls, heartbeats |
| `:info` | Normal operation messages |
| `:warning` | Potential issues that do not stop operation |
| `:error` | Errors that affect functionality |
| `:none` | Disable all Snakepit logging |
### Log Categories
Fine-grained control over logging categories:
```elixir
config :snakepit,
  log_level: :info,
  log_categories: %{
    pool: :debug,      # Pool operations
    worker: :debug,    # Worker lifecycle
    heartbeat: :info,  # Heartbeat monitoring
    grpc: :warning     # gRPC communication
  }
```
### Python-Side Logging
The Python bridge respects the `SNAKEPIT_LOG_LEVEL` environment variable:
```elixir
%{
  name: :default,
  adapter_env: [{"SNAKEPIT_LOG_LEVEL", "info"}]
}
```
---
## Python Runtime Configuration
Configure how Python interpreters are discovered and managed.
### Interpreter Selection
```elixir
config :snakepit,
  python_executable: "/path/to/python3"
```
Alternatively, set the `SNAKEPIT_PYTHON` environment variable, which takes precedence over the application config:
```bash
export SNAKEPIT_PYTHON="/path/to/python3"
```
### Runtime Strategy
```elixir
config :snakepit,
  python_runtime: %{
    strategy: :venv, # :system | :venv | :managed
    managed: false,
    version: "3.12"
  }
```
| Strategy | Description |
|----------|-------------|
| `:system` | Use system Python interpreter |
| `:venv` | Use project virtual environment (`.venv/bin/python3`) |
| `:managed` | Let Snakepit manage Python version (experimental) |
### Environment Variables per Pool
```elixir
%{
  name: :ml_pool,
  adapter_env: [
    # Control threading in numerical libraries
    {"OPENBLAS_NUM_THREADS", "1"},
    {"MKL_NUM_THREADS", "1"},
    {"OMP_NUM_THREADS", "1"},
    {"NUMEXPR_NUM_THREADS", "1"},
    # GPU configuration
    {"CUDA_VISIBLE_DEVICES", "0"},
    # Python settings
    {"PYTHONUNBUFFERED", "1"},
    {"SNAKEPIT_LOG_LEVEL", "warning"}
  ]
}
```
---
## Optional Features
### Zero-Copy Data Transfer
Enable zero-copy for large binary data:
```elixir
config :snakepit,
  zero_copy: %{
    enabled: true,
    threshold_bytes: 1_048_576 # 1 MB
  }
```
Zero-copy is beneficial for ML workloads with large tensors.
### Crash Barrier
Limit restart attempts for frequently crashing workers:
```elixir
config :snakepit,
  crash_barrier: %{
    enabled: true,
    max_restarts: 5,
    window_seconds: 60
  }
```
If a worker restarts more than `max_restarts` times within `window_seconds`, it is permanently removed from the pool.
### Circuit Breaker
Prevent cascading failures:
```elixir
config :snakepit,
  circuit_breaker: %{
    enabled: true,
    failure_threshold: 5,
    reset_timeout_ms: 30000
  }
```
After `failure_threshold` consecutive failures, the circuit opens and requests fail fast for `reset_timeout_ms`.
---
## Complete Configuration Example
Here is a production-ready configuration demonstrating all major options:
```elixir
# config/config.exs
config :snakepit,
  # Global settings
  pooling_enabled: true,
  pool_startup_timeout: 30_000,
  pool_queue_timeout: 10_000,
  pool_max_queue_size: 5000,
  grpc_port: 50051,

  # Logging
  log_level: :info,
  log_categories: %{
    pool: :info,
    worker: :warning,
    heartbeat: :warning,
    grpc: :warning
  },

  # Global heartbeat defaults
  heartbeat: %{
    enabled: true,
    ping_interval_ms: 5000,
    timeout_ms: 15000,
    max_missed_heartbeats: 3
  },

  # Multiple pools
  pools: [
    # Default pool for I/O-bound tasks
    %{
      name: :default,
      worker_profile: :process,
      pool_size: 50,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.adapters.GeneralAdapter"],
      adapter_env: [
        {"OPENBLAS_NUM_THREADS", "1"},
        {"OMP_NUM_THREADS", "1"}
      ],
      startup_batch_size: 10,
      startup_batch_delay_ms: 500
    },

    # ML inference pool (CPU-bound, thread profile)
    %{
      name: :ml_inference,
      worker_profile: :thread,
      pool_size: 4,
      threads_per_worker: 8, # 32 total capacity
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.ml.InferenceAdapter"],
      adapter_env: [
        {"OPENBLAS_NUM_THREADS", "8"},
        {"OMP_NUM_THREADS", "8"},
        {"CUDA_VISIBLE_DEVICES", "0"},
        {"PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512"}
      ],
      thread_safety_checks: false,
      worker_ttl: {1800, :seconds},
      worker_max_requests: 10000,
      heartbeat: %{
        enabled: true,
        ping_interval_ms: 10000,
        timeout_ms: 60000,
        max_missed_heartbeats: 2
      }
    },

    # Background processing pool
    %{
      name: :background,
      worker_profile: :process,
      pool_size: 10,
      adapter_module: Snakepit.Adapters.GRPCPython,
      adapter_args: ["--adapter", "myapp.adapters.BackgroundAdapter"],
      adapter_env: [
        {"SNAKEPIT_LOG_LEVEL", "warning"}
      ],
      worker_ttl: {3600, :seconds}
    }
  ],

  # Optional features
  crash_barrier: %{
    enabled: true,
    max_restarts: 10,
    window_seconds: 300
  }
```
### Environment-Specific Overrides
```elixir
# config/prod.exs
config :snakepit,
  log_level: :warning,
  pool_max_queue_size: 10000

# config/dev.exs
config :snakepit,
  log_level: :debug,
  pool_size: 4

# config/test.exs
config :snakepit,
  pooling_enabled: false
```
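For values that should come from the deployment environment rather than compile-time config, a `config/runtime.exs` sketch works well. The `SNAKEPIT_GRPC_PORT` variable name below is an example of your own choosing; Snakepit only sees the resulting `:grpc_port` option:

```elixir
# config/runtime.exs
import Config

# Read deployment-specific values at boot time.
config :snakepit,
  grpc_port: String.to_integer(System.get_env("SNAKEPIT_GRPC_PORT", "50051"))
```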
---
## Validation
Verify your configuration with the doctor task:
```bash
mix snakepit.doctor
```
At runtime, check pool status:
```elixir
iex> Snakepit.get_stats()
%{
  requests: 15432,
  queued: 5,
  errors: 12,
  queue_timeouts: 3,
  pool_saturated: 0,
  workers: 54,
  available: 49,
  busy: 5
}
```
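The returned map can feed a simple health check. A sketch using only the fields shown above (the thresholds are arbitrary examples):

```elixir
# Flag queue pressure or saturation from the stats map shown above.
stats = Snakepit.get_stats()

cond do
  stats.pool_saturated > 0 -> {:error, :saturated}
  stats.queued > 100 -> {:warn, :queue_pressure}
  true -> :ok
end
```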
---
## Related Guides
- [Getting Started](getting-started.md) - Installation and first steps
- [Worker Profiles](worker-profiles.md) - Process vs Thread profiles
- [Production](production.md) - Performance tuning and deployment checklist