CHANGELOG.md

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.6.10] - 2025-11-13

### Added
- **Canonical worker metadata** – `Snakepit.Pool.Registry.metadata_keys/0` exposes the authoritative metadata keys (`:worker_module`, `:pool_name`, `:pool_identifier`, `:adapter_module`) and the surrounding docs call out how pool helpers, diagnostics, and worker profiles should treat that map as the single source of truth.
- **Telemetry catalog + filters** – `Snakepit.Telemetry.Naming.python_event_catalog/0` now documents the full event/measurement schema emitted by `snakepit_bridge`, while the Python telemetry stream implements glob-style allow/deny filters pushed from Elixir so noisy adapters can be muted without redeploying workers.
- **Async adapter registration** – `snakepit_bridge.base_adapter.BaseAdapter` adds `register_with_session_async/2` (plus regression coverage) so asyncio/aio stubs can advertise tool surfaces without blocking while the synchronous helper stays intact for classic stubs.
- **Self-managing Python tests** – `test_python.sh` now creates/updates `.venv`, fingerprints `priv/python/requirements.txt`, installs deps, regenerates protobuf stubs, and exports quiet OTEL defaults so `./test_python.sh` is a one-command pytest runner on any Linux/WSL host.

### Changed
- **Queue timeout enforcement** – Queued requests now carry their timer reference, the pool cancels those timers as soon as the request is dequeued or dropped, and statistics/logging happen in one place, preventing runaway timers when pools churn.
- **Threaded adapter guardrails** – `priv/python/grpc_server_threaded.py` refuses to boot adapters that don’t set `__thread_safe__ = True`, logging a clear remediation path and forcing unsafe adapters back to process mode.
- **Tool registration resilience** – `snakepit_bridge.base_adapter.BaseAdapter` wraps gRPC stub responses in `_coerce_stub_response/1`, unwrapping awaitables, `UnaryUnaryCall` structs, or lazy callables before checking `response.success`, which stabilizes adapters that mix sync and async gRPC stubs.
- **Heartbeat/schema documentation** – `Snakepit.Config` now ships typedocs for the normalized pool/heartbeat map shared with Python, and the architecture plus gRPC guides emphasize that BEAM is the authoritative heartbeat monitor with `SNAKEPIT_HEARTBEAT_CONFIG` kept in sync across languages.

### Fixed
- **Stale queue timeouts** – Queue timeout messages that arrive after a request has already been serviced are ignored, and clients now receive `{:error, :queue_timeout}` exactly once when their request is actually dropped.

## [0.6.9] - 2025-11-13

### Added
- **Registry helpers**: Introduced `Snakepit.Pool.Registry.fetch_worker/1` plus metadata helpers used throughout the pool, bridge server, worker profiles, and diagnostics so `worker_module`, `pool_identifier`, and `pool_name` are always looked up in a single, tested place.
- **Binary parameter validation**: `Snakepit.GRPC.BridgeServer` now rejects non-binary entries in `ExecuteToolRequest.binary_parameters`, guaranteeing local tools only ever see `{:binary, payload}` tuples while remote workers still receive the untouched proto map.
- **Slow-test workflow**: Tagged the long-running suites with `@tag :slow`, defaulted `mix test` to skip them, and documented the opt-in commands plus the 2025-11-13 slow-test inventory in `README_TESTING` and `docs/20251113/slow-test-report.md`.
- **Lifecycle observability**: Memory-based recycling now logs a warning whenever a worker cannot answer the `:get_memory_usage` probe, preventing silent configuration drift.
- **Rogue cleanup controls**: Operators can configure the exact script names and run-id markers that qualify Python processes for startup cleanup, with defaults matching `grpc_server.py`/`grpc_server_threaded.py`.
- **Memory recycle telemetry & diagnostics**: `[:snakepit, :worker, :recycled]` now emits `memory_mb`/`memory_threshold_mb`, Prometheus metrics expose `snakepit.worker.recycled` counters, and both `Snakepit.Diagnostics.ProfileInspector` plus `mix snakepit.profile_inspector` show per-pool “Memory Recycles” totals for operators.

### Changed
- **GRPC worker lookups**: GRPCWorker, ToolRegistry clients, pool helpers, and worker profiles call the new Registry helpers instead of `Registry.lookup/2`, ensuring metadata stays normalized and reverse lookups never crash when metadata is missing.
- **Bridge test coverage**: Added binary-parameter regression tests that prove malformed payloads are rejected before reaching Elixir tools, plus lifecycle tests that simulate failing memory probes.
- **Process killer tests**: Rogue cleanup unit tests now cover the customizable scripts/markers path so changes to the configuration surface immediately.
- **Heartbeat contract clarity**: Documented what `dependent: true|false` means, exported `SNAKEPIT_HEARTBEAT_CONFIG` expectations, and added both HeartbeatMonitor- and GRPCWorker-level regression tests so fail-fast vs independent behavior stays well defined.
- **Telemetry stream shutdown noise**: gRPC telemetry stream shutdowns that report `:normal` or `:shutdown` now log at debug level, eliminating the warning spam that buried actionable failures during slow-test runs.

### Fixed
- **Registry metadata race**: `Pool.Registry.put_metadata/2` now reports `{:error, :not_registered}` when clients attempt to attach metadata before the worker is registered and downgrades those expected attempts to debug logs, eliminating silent successes that previously returned `:ok`.
- **Heartbeat metrics stability**: The `snakepit.worker.memory_mb` summary now pulls values via `Map.get/2` and non-dependent monitors retain timeout/missed-heartbeat counters, so Telemetry/Prometheus exporters stop crashing when measurements arrive as maps and status checks reflect the real failure budget.
- **Docs parity**: README, README_GRPC, README_PROCESS_MANAGEMENT, and ARCHITECTURE now describe the binary parameter contract, registry helper usage, lifecycle behavior, and rogue cleanup assumptions introduced in this release.

## [0.6.8] - 2025-11-12

This release also rolls up the previously undocumented fail-fast docs/tests work from 074f2260f703d16ccfecf937c10af905165419f0 (heartbeat fail-fast suites, orphan cleanup stress tests, queue probe adapter, and config fail-fast coverage).

### Added
- **Bootstrap automation**: Introduced `Snakepit.Bootstrap`, `mix snakepit.setup`, and a `make bootstrap` target to install Mix deps, provision `.venv`/`.venv-py313`, install Python requirements, run `scripts/setup_test_pythons.sh`, and regenerate gRPC stubs with fully instrumented logging.
- **Environment doctor**: New `Snakepit.EnvDoctor` module plus `mix snakepit.doctor` task verify interpreter availability, `grpc` import, `.venv`/`.venv-py313`, `priv/python/grpc_server.py --health-check`, and worker port availability with actionable remediation messages.
- **Runtime guardrails**: `Snakepit.Application` now invokes `Snakepit.EnvDoctor.ensure_python!/0` before pools start, failing fast when Python prerequisites are missing. Test helpers (`test/support/fake_doctor.ex`, `test/support/bootstrap_runner.ex`, `test/support/command_runner.ex`) enable deterministic unit coverage for the bootstrap/doctor path.
- **Python-aware CI**: GitHub Actions workflow now runs bootstrap, doctor, the default suite, and `mix test --only python_integration` so bridge coverage is validated when the doctor passes.
- **New documentation**: README + README_TESTING describe the `make bootstrap → mix snakepit.doctor → mix test` workflow, explain how to run python integration tests, and highlight the new Mix tasks.
- **Lifecycle config & memory recycling**: Added `%Snakepit.Worker.LifecycleConfig{}` to capture adapter/profile/env data for every worker, wired `Snakepit.GRPCWorker` to answer `:get_memory_usage`, and extended lifecycle tests so TTL/request/memory recycling use the same canonical config.
- **Binary tool parameters**: `Snakepit.GRPC.BridgeServer`, `Snakepit.GRPC.Client`, and `Snakepit.GRPC.ClientImpl` now decode/forward `ExecuteToolRequest.binary_parameters`, exposing binaries to local tools as `{:binary, payload}` while sending the untouched map to Python workers. README.md and README_GRPC.md document the contract.
- **Worker-flow integration test**: New `Snakepit.Pool.WorkerFlowIntegrationTest` exercises the WorkerSupervisor → MockGRPCWorker path, ensuring registry/process tracking stays consistent after execution and crash/restart flows.
- **Randomized worker stress test**: `Snakepit.Pool.RandomWorkerFlowTest` throws randomized execute/kill sequences at pools to ensure Registry ↔ ProcessRegistry invariants hold under churn.

### Changed
- **Test gating**: Default `mix test` excludes `:python_integration` while Python-heavy suites (thread profile, session affinity, streaming regression, etc.) carry the tag; `test/unit/exunit_configuration_test.exs` locks the config in place.
- **Thread-profile test harness**: `Snakepit.ThreadProfilePython313Test` now uses `Snakepit.Test.PythonEnv.skip_unless_python_313/1` to skip cleanly when `.venv-py313` is unavailable.
- **Process killer regression**: Ports spawned during `kill_by_run_id/1` tests close via `safe_close_port/1`, eliminating `:port_close` race exceptions.
- **Queue saturation regression**: `Snakepit.Pool.QueueSaturationRuntimeTest` focuses on stats + agent tracking instead of brittle global ETS assertions, removing a common source of flaky failures.
- **gRPC generation script**: `priv/python/generate_grpc.sh` now prefers `.venv/bin/python3`, falling back to system `python3/python` only when the virtualenv is missing, and emits helpful logs when no interpreter is found.
- **Registry metadata semantics**: `Snakepit.GRPCWorker` now writes canonical metadata (`worker_module`, `pool_name`, `pool_identifier`) via `Snakepit.Pool.Registry.put_metadata/2`, unblocking pool-name extraction and worker-module discovery without parsing IDs. Tests cover PID→worker lookups.
- **LifecycleManager internals**: Tracking records store lifecycle structs instead of ad-hoc maps so replacement workers inherit adapter args/env, and memory thresholds now exercise the worker call path in tests.
- **Process cleanup safety**: Rogue process cleanup only targets commands containing `grpc_server.py`/`grpc_server_threaded.py` with `--snakepit-run-id/--run-id` flags, and operators can disable the sweep with `config :snakepit, :rogue_cleanup, enabled: false`. Docs explain the ownership contract.
- **Pool integration coverage**: Replaced the unstable `test/snakepit/pool/high_risk_flow_test.exs` harness with targeted unit-level integration coverage (WorkerSupervisor + MockGRPCWorker), keeping the suite reliable while still covering the critical registry/ProcessRegistry chain.
- **Worker profile metadata lookup**: Process/thread profiles now resolve worker modules via `Pool.Registry.get_worker_id_by_pid/1` + metadata lookup, so non-GRPC workers can be supported and Dialyzer warnings are gone.

### Fixed
- Shell instrumentation around bootstrap (reporting command start/finish and verbose pip output) prevents "silent hangs" and surfaced the root causes of previous provisioning confusion.
- `scripts/setup_test_pythons.sh` now runs under `set -x`, streaming its progress during bootstrap.
- Rogue cleanup tests verify we no longer kill unrelated Python processes, and docs call out the run-id requirements so multi-tenant hosts stay safe.


## [0.6.7] - 2025-10-28

### Added

#### Phase 1: Type System MVP + Performance
- **6x JSON performance boost**: Integrated `orjson` for Python serialization, delivering 4-6x speedup for raw JSON operations and 1.5x improvement for large payloads (`priv/python/snakepit_bridge/serialization.py`, `priv/python/tests/test_orjson_integration.py`).
- **Structured error type**: New `Snakepit.Error` struct provides detailed context for debugging with fields including `category`, `message`, `details`, `python_traceback`, and `grpc_status` (`lib/snakepit/error.ex`, `test/unit/error_test.exs`).
- **Complete type specifications**: All public API functions in `Snakepit` module now have `@spec` annotations with structured error return types for better IDE support and Dialyzer analysis.
- **Performance benchmarks**: Comprehensive benchmark suite validates 4-6x raw JSON speedup and verifies no regression on small payloads (`priv/python/tests/test_orjson_integration.py`).

#### Phase 2: Distributed Telemetry System
- **Bidirectional telemetry streaming**: Python workers can now emit telemetry events via gRPC that are re-emitted as Elixir `:telemetry` events for unified observability (`lib/snakepit/telemetry/grpc_stream.ex`, `priv/python/snakepit_bridge/telemetry/`).
- **Complete event catalog**: 43 telemetry events across 3 layers (Infrastructure, Python Execution, gRPC Bridge) with atom-safe event names to prevent atom table exhaustion (`lib/snakepit/telemetry/naming.ex`, `docs/20251028/telemetry/01_EVENT_CATALOG.md`).
- **Python telemetry API**: High-level Python API with `telemetry.emit()` for events and `telemetry.span()` for automatic timing, plus correlation ID propagation across the Elixir/Python boundary (`priv/python/snakepit_bridge/telemetry/__init__.py`).
- **Runtime telemetry control**: Adjust sampling rates, enable/disable telemetry, and filter events for individual workers without restarts (`lib/snakepit/telemetry/control.ex`).
- **Metadata safety**: Automatic sanitization of Python metadata to prevent atom table exhaustion from untrusted string keys (`lib/snakepit/telemetry/safe_metadata.ex`).
- **Multiple backend support**: Python telemetry supports gRPC streaming (default) and stderr backends, with extensible backend architecture (`priv/python/snakepit_bridge/telemetry/backends/`).
- **Worker lifecycle hooks**: Automatic telemetry stream registration/unregistration integrated into worker lifecycle (`lib/snakepit/grpc_worker.ex:479`, `lib/snakepit/grpc_worker.ex:783`).
- **Integration tests**: Comprehensive test suite covering event catalog, validation, sanitization, and control messages (`test/integration/telemetry_flow_test.exs`).

### Changed
- Python serialization now uses `orjson` with graceful fallback to stdlib `json` if orjson is unavailable, maintaining full backward compatibility.
- Error returns in `Snakepit.Pool` and `Snakepit` modules now use structured `Snakepit.Error` types with detailed context instead of atoms.
- `Snakepit.Pool.await_ready/2` now returns `{:error, %Snakepit.Error{category: :timeout}}` instead of `{:error, :timeout}`.
- Streaming validation errors now include adapter context in error details.
- Old `telemetry.span()` (OpenTelemetry) renamed to `telemetry.otel_span()` to avoid naming conflict with new telemetry streaming span.
- `Snakepit.Application` supervision tree now includes `Snakepit.Telemetry.GrpcStream` for managing bidirectional telemetry streams.

### Fixed
- Updated Dialyzer type specifications to match new structured error returns, reducing type warnings.
- Corrected `grpc_worker.ex` metadata fields for telemetry events (`state.stats.start_time`, `state.stats.requests`).

### Documentation
- **New `TELEMETRY.md`**: Complete user guide for the distributed telemetry system with usage examples, integration patterns for Prometheus/StatsD/OpenTelemetry, and troubleshooting guidance (320 lines).
- **Telemetry design docs**: 9 comprehensive design documents covering architecture, event catalog, Python integration, client guide, gRPC implementation, and backend architecture (`docs/20251028/telemetry/`).
- **New examples**: 5 comprehensive examples demonstrating v0.6.7 features with ~50KB of production-ready code:
  - `examples/telemetry_basic.exs` - Introduction to telemetry handlers and Python telemetry API
  - `examples/telemetry_advanced.exs` - Correlation tracking, performance monitoring, runtime control
  - `examples/telemetry_monitoring.exs` - Production monitoring patterns with real-time dashboard
  - `examples/telemetry_metrics_integration.exs` - Prometheus/StatsD integration patterns
  - `examples/structured_errors.exs` - New `Snakepit.Error` struct usage and pattern matching
- **Updated `examples/README.md`**: Comprehensive guide to all examples with clear learning paths and troubleshooting.
- Updated README.md with v0.6.7 release notes highlighting type system improvements, performance gains, and telemetry system.
- Updated mix.exs version to 0.6.7 with `TELEMETRY.md` in package files and docs extras.
- Added comprehensive test coverage for structured error types (12 new tests in `test/unit/error_test.exs`).

### Performance
- **Telemetry overhead**: <10μs per event, <1% CPU impact at 100% sampling, <0.1% CPU at 10% sampling.
- **Bounded resources**: Python telemetry queue limited to 1024 events (~100KB), with graceful degradation (drops events vs blocking).
- **Zero regression**: All 235+ existing tests pass with full backward compatibility maintained.

**Zero breaking changes**: All existing code continues to work. Telemetry is fully opt-in via standard `:telemetry.attach()` patterns.

## [0.6.6] - 2025-10-27

### Added
- Configurable session/program quotas now surface tagged errors when limits are exceeded, with regression coverage in `test/unit/bridge/session_store_test.exs`.
- Introduced a logger redaction helper so adapters and bridge code can log sensitive inputs safely (`test/unit/logger/redaction_test.exs`).

### Changed
- `Snakepit.GRPC.BridgeServer` reuses worker-owned gRPC channels and only dials a disposable connection when the worker has not yet published one; fallbacks are closed after each invocation.
- gRPC streaming helpers document and enforce the JSON-plus-metadata chunk envelope, clarifying `_metadata` and `raw_data_base64` handling.
- Worker startup handshake waits for the negotiated gRPC port before publishing worker metadata, eliminating transient routing failures during boot.
- `Snakepit.GRPC.ClientImpl` now returns structured `{:error, {:invalid_parameter, :json_encode_failed, message}}` tuples when parameters cannot be JSON-encoded, preventing calling processes from crashing (`test/unit/grpc/client_impl_test.exs`).
- `Snakepit.GRPC.BridgeServer.execute_streaming_tool/2` raises `UNIMPLEMENTED` with remediation guidance so callers can fall back gracefully when streaming is disabled (`test/snakepit/grpc/bridge_server_test.exs`).

### Fixed
- `Snakepit.GRPCWorker` persists the OS-assigned port discovered during startup so BridgeServer never receives `0` when routing requests (`test/unit/grpc/grpc_worker_ephemeral_port_test.exs`).
- Parameter decoding now rejects malformed protobuf payloads with descriptive `{:invalid_parameter, key, reason}` errors, preventing unexpected crashes (`test/snakepit/grpc/bridge_server_test.exs`).
- Process registry ETS tables are `:protected` and DETS handles remain private, guarding against external mutation attempts (`test/unit/pool/process_registry_security_test.exs`).
- Pool name inference prefers registry metadata and logs once when falling back to worker-id parsing, eliminating silent misroutes (`test/unit/pool/pool_registry_lookup_test.exs`).

### Documentation
- Refreshed README, gRPC guides (including the streaming and quick reference docs), and testing notes to cover port persistence, channel reuse, quota enforcement, DETS/ETS protections, streaming payload envelopes and fallbacks, metadata-driven pool routing, logging redaction guardrails, and the expanded regression suite.

## [0.6.5] - 2025-10-26

### Added
- Regression suites covering worker supervisor stop/restart flows and profile-level shutdown helpers (`test/unit/pool/worker_supervisor_test.exs`, `test/unit/worker_profile/worker_profile_stop_worker_test.exs`).

### Changed
- `Snakepit.Application` now reads the current environment from compile-time configuration instead of calling `Mix.env/0`, keeping OTP releases Mix-free.
- Introduced `Snakepit.PythonThreadLimits.resolve/1` to merge partial thread-limit overrides with defaults before applying environment variables.

### Fixed
- `Snakepit.Pool.WorkerSupervisor.stop_worker/1` targets worker starter supervisors and accepts either worker ids or pids, ensuring restarts actually decommission the old worker.
- `Snakepit.WorkerProfile.Process` and `Snakepit.WorkerProfile.Thread` resolve worker ids through the pool registry so lifecycle manager shutdowns succeed for pid handles.

## [0.6.4] - 2025-10-30

### Added
- Streaming regression guard in `test/snakepit/streaming_regression_test.exs` covering both success and adapter capability failures
- `examples/stream_progress_demo.exs` showcasing five timed streaming updates with rich progress output
- `test_python.sh` helper that regenerates protobuf stubs, activates the project virtualenv, wires `PYTHONPATH`, and forwards arguments to `pytest`

### Changed
- Python gRPC servers now bridge streaming iterators through an `asyncio.Queue`, yielding chunks as soon as they are produced and removing ad-hoc log files
- `Snakepit.Adapters.GRPCPython` consumes streaming chunks incrementally, decoding JSON payloads, surfacing metadata, and safeguarding callback failures
- Showcase `stream_progress` tool accepts `delay_ms` and reports elapsed timing so demos and diagnostics show meaningful pacing

### Fixed
- Eliminated burst delivery of streaming responses by ensuring each chunk is forwarded to Elixir immediately, restoring real-time feedback for `execute_stream/4`

---

## [0.6.3] - 2025-10-19

### Added
- **Dependent/Independent Heartbeat Mode** - New `dependent` configuration flag allows workers to optionally continue running when Elixir heartbeats fail, enabling debugging scenarios where Python workers should remain alive
- Environment variable-based heartbeat configuration via `SNAKEPIT_HEARTBEAT_CONFIG` for passing settings from Elixir to Python workers
- Python unit test coverage for dependent heartbeat termination behavior (`priv/python/tests/test_heartbeat_client.py`)
- CLI flags `--heartbeat-dependent` and `--heartbeat-independent` for Python gRPC server configuration

### Changed
- Default heartbeat enabled state changed from `false` to `true` for better production reliability
- `HeartbeatMonitor` now suppresses worker termination when `dependent: false` is configured, logging warnings instead
- Python `HeartbeatClient` includes default shutdown handler for dependent mode
- `Snakepit.GRPCWorker` passes heartbeat configuration to Python via environment variables
- Updated configuration tests to reflect new heartbeat defaults

### Fixed
- Heartbeat configuration now properly propagates from Elixir to Python across all code paths

---

## [0.6.2] - 2025-10-26

### Added
- End-to-end heartbeat regression suite covering monitor boot, timeout handling, and OS-level process cleanup (`test/snakepit/grpc/heartbeat_end_to_end_test.exs`)
- Long-running heartbeat stability test to guard against drift and missed ping accumulation (`test/snakepit/heartbeat_monitor_test.exs`)
- Python-side telemetry regression ensuring outbound metadata preserves correlation identifiers (`priv/python/tests/test_telemetry.py`)
- Deep-dive documentation for the heartbeat and observability stack plus consolidated testing command guide (`docs/20251019/*.md`)

### Changed
- `Snakepit.GRPCWorker` now terminates itself whenever the heartbeat monitor exits, preventing pools from keeping unhealthy workers alive
- `make test` preferentially uses the repository’s virtualenv interpreter, exports `PYTHONPATH`, and runs `mix test --color` for consistent local runs

### Fixed
- Guard against leaking heartbeat monitors by stopping the worker when the monitor crashes, ensuring registry entries and OS ports are released

---

## [0.6.1] - 2025-10-19

### Added
- Proactive worker heartbeat monitoring via `Snakepit.HeartbeatMonitor` with configurable cadence, miss thresholds, and per-pool overrides
- Comprehensive telemetry stack: `Snakepit.Telemetry.OpenTelemetry` boot hook, `Snakepit.TelemetryMetrics` Prometheus exporter, and correlation helpers for tracing spans
- Rich gRPC client utilities (`Snakepit.GRPC.ClientImpl`) covering ping, session lifecycle, heartbeats, and streaming tooling
- Python bridge instrumentation (`snakepit_bridge.heartbeat`, `snakepit_bridge.telemetry`) plus new unit tests for telemetry and threaded servers
- Default telemetry/heartbeat configuration shipped in `config/config.exs`, including OTLP environment toggles and Prometheus port selection
- Configurable logging system via the new `Snakepit.Logger` module with centralized control over verbosity (`:debug`, `:info`, `:warning`, `:error`, `:none`)

### Changed
- `Snakepit.GRPCWorker` now emits detailed telemetry, manages heartbeats, and wires correlation IDs through tracing spans
- `Snakepit.Application` activates OTLP exporters based on environment variables, registers telemetry reporters alongside pool supervisors, and routes logs through `Snakepit.Logger`
- Python gRPC servers (`grpc_server.py`, `grpc_server_threaded.py`) updated with structured logging, execution metrics, and heartbeat responses
- Examples refreshed with observability storylines, dual-mode telemetry demos, and cleaner default output through `Snakepit.Logger`
- GitHub workflows tightened to reflect new test layout and planning artifacts
- 25+ Elixir modules migrated to `Snakepit.Logger` for consistent log suppression in demos and production

### Configuration
- New `:log_level` option under the `:snakepit` application config to control internal logging
  ```elixir
  # config/config.exs
  config :snakepit,
    log_level: :warning  # Options: :debug, :info, :warning, :error, :none
  ```

### Fixed
- Hardened CI skips for `ApplicationCleanupTest` to avoid nondeterministic BEAM run IDs
- Addressed flaky test ordering through targeted cleanup helpers and telemetry-aware assertions

### Documentation
- Major rewrite of `ARCHITECTURE.md`, new `AGENTS.md`, and comprehensive design dossiers for v0.7/v0.8 feature tracks
- Added heartbeat, telemetry, and OTLP upgrade plans under `docs/2025101x/`
- README refreshed with v0.6.1 highlights, logging guidance, installation tips, and observability walkthroughs

### Notes
- Existing configurations continue to work with the default `:info` log level
- Log suppression is optional—set `log_level: :debug` to restore verbose output
- Provides cleaner logs for production deployments and demos while retaining full visibility for debugging

---

## [0.6.0] - 2025-10-11

### Added - Phase 1: Dual-Mode Architecture Foundation

- **Worker Profile System**
  - New `Snakepit.WorkerProfile` behaviour for pluggable parallelism strategies
  - `Snakepit.WorkerProfile.Process` - Multi-process profile (default, backward compatible)
  - `Snakepit.WorkerProfile.Thread` - Multi-threaded profile stub (Phase 2-3 implementation)
  - Profile abstraction enables switching between process and thread execution modes

- **Python Environment Detection**
  - New `Snakepit.PythonVersion` module for Python version detection
  - Automatic detection of Python 3.13+ free-threading support (PEP 703)
  - Profile recommendation based on Python capabilities
  - Version validation and compatibility warnings

- **Library Compatibility Matrix**
  - New `Snakepit.Compatibility` module with thread-safety database
  - Compatibility tracking for 20+ popular Python libraries (NumPy, PyTorch, Pandas, etc.)
  - Per-library thread safety status, recommendations, and workarounds
  - Automatic compatibility checking for thread profile configurations

- **Configuration System Enhancements**
  - New `Snakepit.Config` module for multi-pool configuration management
  - Support for named pools with different worker profiles
  - Backward-compatible legacy configuration conversion
  - Comprehensive configuration validation and normalization
  - Profile-specific defaults (process vs thread)

- **Documentation**
  - Comprehensive v0.6.0 technical plan (8,000+ words)
  - GIL removal research and dual-mode architecture design
  - Phase-by-phase implementation roadmap (10 weeks)
  - Performance benchmarks and migration strategies

### Changed

- **Architecture Evolution**
  - Foundation laid for Python 3.13+ free-threading support
  - Worker management abstracted to support multiple parallelism models
  - Configuration system generalized for multi-pool scenarios

### Added - Phase 2: Multi-Threaded Python Worker

- **Threaded gRPC Server**
  - New `grpc_server_threaded.py` - Multi-threaded server with ThreadPoolExecutor
  - Concurrent request handling via HTTP/2 multiplexing
  - Thread safety monitoring with `ThreadSafetyMonitor` class
  - Request tracking per thread with performance metrics
  - Automatic adapter thread safety validation on startup
  - Configurable thread pool size (--max-workers parameter)

- **Thread-Safe Adapter Infrastructure**
  - New `base_adapter_threaded.py` - Base class for thread-safe adapters
  - `ThreadSafeAdapter` with built-in locking primitives
  - `ThreadLocalStorage` manager for per-thread state
  - `RequestTracker` for monitoring concurrent requests
  - `@thread_safe_method` decorator for automatic tracking
  - Context managers for safe lock acquisition
  - Built-in statistics and performance monitoring

- **Example Implementations**
  - `threaded_showcase.py` - Comprehensive thread-safe adapter example
  - Pattern 1: Shared read-only resources (models, configurations)
  - Pattern 2: Thread-local storage (caches, buffers)
  - Pattern 3: Locked shared mutable state (counters, logs)
  - CPU-intensive workloads with NumPy integration
  - Stress testing and performance monitoring tools
  - Example tools: compute_intensive, matrix_multiply, batch_process, stress_test

- **Thread Safety Validation**
  - New `thread_safety_checker.py` - Runtime validation toolkit
  - Concurrent access detection with detailed warnings
  - Known unsafe library detection (Pandas, Matplotlib, SQLite3)
  - Thread contention monitoring and analysis
  - Performance profiling per thread
  - Automatic recommendations for detected issues
  - Global checker with strict mode option

- **Documentation**
  - New `README_THREADING.md` - Comprehensive threading guide
  - Thread safety patterns and best practices
  - Writing thread-safe adapters tutorial
  - Testing strategies for concurrent code
  - Performance optimization techniques
  - Library compatibility matrix (20+ libraries)
  - Common pitfalls and solutions
  - Advanced topics: worker recycling, monitoring, debugging

### Added - Phase 3: Elixir Thread Profile Integration

- **Complete ThreadProfile Implementation**
  - Full implementation of `Snakepit.WorkerProfile.Thread`
  - Worker capacity tracking via ETS table (`:snakepit_worker_capacity`)
  - Atomic load increment/decrement for thread-safe capacity management
  - Support for concurrent requests to same worker (HTTP/2 multiplexing)
  - Automatic script selection (threaded vs standard gRPC server)

- **Worker Capacity Management**
  - ETS-based capacity tracking: `{worker_pid, capacity, current_load}`
  - Atomic operations for thread-safe load updates
  - Capacity checking before request execution
  - Automatic load decrement after request completion (even on error)
  - Real-time capacity monitoring via `get_capacity/1` and `get_load/1`

- **Adapter Configuration Enhancement**
  - Updated `GRPCPython.script_path/0` to select correct server variant
  - Automatic detection of threaded mode from adapter args
  - Seamless switching between process and thread servers
  - Enhanced argument merging for user customization

- **Load Balancing**
  - Capacity-aware worker selection
  - Prevents over-subscription of workers
  - Returns `:worker_at_capacity` when no slots available
  - Automatic queueing handled by pool layer

- **Example Demonstration**
  - New `examples/threaded_profile_demo.exs` - Interactive demo script
  - Shows configuration patterns for threaded mode
  - Explains concurrent request handling
  - Demonstrates capacity management
  - Performance monitoring examples

### Added - Phase 4: Worker Lifecycle Management

- **LifecycleManager GenServer**
  - New `Snakepit.Worker.LifecycleManager` - Automatic worker recycling
  - TTL-based recycling (configurable: seconds/minutes/hours/days)
  - Request-count based recycling (recycle after N requests)
  - Memory threshold recycling (optional, requires worker support)
  - Periodic health checks (every 5 minutes)
  - Graceful worker replacement with zero downtime

- **Worker Tracking Infrastructure**
  - Automatic worker registration on startup
  - Per-worker metadata tracking (start time, request count, config)
  - Process monitoring for crash detection
  - Lifecycle statistics and reporting

- **Recycling Logic**
  - Configurable TTL: `{3600, :seconds}`, `{1, :hours}`, etc.
  - Max requests: `worker_max_requests: 1000`
  - Memory threshold: `memory_threshold_mb: 2048` (optional)
  - Manual recycling: `LifecycleManager.recycle_worker(pool, worker_id)`
  - Automatic replacement after recycling

- **Request Counting**
  - Automatic increment after successful request
  - Per-worker request tracking
  - Triggers recycling at configured threshold
  - Integrated with Pool's execute path

- **Telemetry Events**
  - `[:snakepit, :worker, :recycled]` - Worker recycled with reason
  - `[:snakepit, :worker, :health_check_failed]` - Health check failure
  - Rich metadata (worker_id, pool, reason, uptime, request_count)
  - Integration with Prometheus, LiveDashboard, custom monitors

- **Documentation**
  - New `docs/telemetry_events.md` - Complete telemetry reference
  - Event schemas and metadata descriptions
  - Usage examples for monitoring systems
  - Prometheus and LiveDashboard integration patterns
  - Best practices and debugging tips

- **Supervisor Integration**
  - LifecycleManager added to application supervision tree
  - Positioned after WorkerSupervisor, before Pool
  - Automatic startup with pooling enabled
  - Clean shutdown handling

### Changed - Phase 4

- **GRPCWorker Enhanced**
  - Workers now register with LifecycleManager on startup
  - Lifecycle config passed during initialization
  - Untracking on worker shutdown

- **Pool Enhanced**
  - Request counting integrated into execute path
  - Automatic notification to LifecycleManager on success
  - Supports lifecycle management without modifications to existing flow

### Added - Phase 5: Enhanced Diagnostics and Monitoring

- **ProfileInspector Module**
  - New `Snakepit.Diagnostics.ProfileInspector` - Programmatic pool inspection
  - Functions for pool statistics, capacity analysis, and memory usage
  - Profile-aware metrics for both process and thread pools
  - `get_pool_stats/1` - Comprehensive pool statistics
  - `get_capacity_stats/1` - Capacity utilization and thread info
  - `get_memory_stats/1` - Memory usage breakdown per worker
  - `get_comprehensive_report/0` - All pools analysis
  - `check_saturation/2` - Capacity warning system
  - `get_recommendations/1` - Intelligent optimization suggestions

- **Mix Task: Profile Inspector**
  - New `mix snakepit.profile_inspector` - Interactive pool inspection tool
  - Text and JSON output formats
  - Detailed per-worker statistics with `--detailed` flag
  - Pool-specific inspection with `--pool` option
  - Optimization recommendations with `--recommendations` flag
  - Color-coded utilization indicators (🔴🟡🟢⚪)
  - Profile-specific insights (process vs thread)

- **Enhanced Scaling Diagnostics**
  - Extended `mix diagnose.scaling` with profile-aware analysis
  - New TEST 0: Pool Profile Analysis
  - Thread pool vs process pool comparison
  - Capacity utilization monitoring
  - Profile-specific recommendations
  - System-wide optimization opportunities
  - Real-time pool statistics integration

- **Telemetry Events**
  - `[:snakepit, :pool, :saturated]` - Pool queue at max capacity
    - Measurements: `queue_size`, `max_queue_size`
    - Metadata: `pool`, `available_workers`, `busy_workers`
  - `[:snakepit, :pool, :capacity_reached]` - Worker reached capacity (thread profile)
    - Measurements: `capacity`, `load`
    - Metadata: `worker_pid`, `profile`, `rejected` (optional)
  - `[:snakepit, :request, :executed]` - Request completed with duration
    - Measurements: `duration_us` (microseconds)
    - Metadata: `pool`, `worker_id`, `command`, `success`

- **Diagnostic Features**
  - Worker memory usage tracking per process
  - Thread pool utilization analysis
  - Capacity saturation warnings
  - Profile-appropriate recommendations
  - Performance duration tracking
  - Queue depth monitoring

### Status

- **Phase 1** ✅ Complete - Foundation modules and behaviors defined
- **Phase 2** ✅ Complete - Multi-threaded Python worker implementation
- **Phase 3** ✅ Complete - Elixir thread profile integration
- **Phase 4** ✅ Complete - Worker lifecycle management and recycling
- **Phase 5** ✅ Complete - Enhanced diagnostics and monitoring
- **Phase 6** 🔄 Pending - Documentation and examples

### Notes

- **No Breaking Changes**: All v0.5.1 configurations remain fully compatible
- **Thread Profile**: Stub implementation (returns `:not_implemented`) until Phase 2-3
- **Default Behavior**: Process profile remains default for maximum stability
- **Python 3.13+**: Free-threading support enables true multi-threaded workers
- **Migration**: Existing code requires zero changes to continue working

---

## [0.5.1] - 2025-10-11

### Added
- **Diagnostic Tools**
  - New `mix diagnose.scaling` task for comprehensive bottleneck analysis
  - Captures resource metrics (ports, processes, TCP connections, memory usage)
  - Enhanced error logging with port buffer drainage

- **Configuration Enhancements**
  - Explicit gRPC port range constraint documentation and validation
  - Batched worker startup configuration (`startup_batch_size: 8`, `startup_batch_delay_ms: 750`)
  - Resource limit safeguards with `max_workers: 1000` hard limit

### Changed
- **Worker Pool Scaling Improvements**
  - Pool now reliably scales to 250+ workers (previously limited to ~105)
  - Resolved thread explosion during concurrent startup (fixed "fork bomb" issue)
  - Dynamic port allocation using OS-assigned ports (port=0) eliminates port collision races
  - Batched worker startup prevents system resource exhaustion during concurrent initialization

- **Performance Optimizations**
  - Aggressive thread limiting via environment variables for optimal pool-level parallelism:
    - `OPENBLAS_NUM_THREADS=1` (numpy/scipy)
    - `OMP_NUM_THREADS=1` (OpenMP)
    - `MKL_NUM_THREADS=1` (Intel MKL)
    - `NUMEXPR_NUM_THREADS=1` (NumExpr)
    - `GRPC_POLL_STRATEGY=poll` (single-threaded)
  - Increased GRPC server connection backlog to 512
  - Extended worker ready timeout to 30s for large pools

- **Configuration Updates**
  - Increased `port_range` to 1000 (accommodates `max_workers`)
  - Enhanced configuration comments explaining each tuning parameter
  - Resource usage tracking during pool initialization

### Fixed
- **Concurrent Startup Issues**
  - Fixed "Cannot fork" / EAGAIN errors from thread explosion during worker spawn
  - Eliminated port collision races with dynamic port allocation
  - Resolved fork bomb caused by Python scientific libraries spawning excessive threads (6,000+ threads from OpenBLAS, gRPC, MKL)

- **Resource Management**
  - Better port binding error handling in Python gRPC server
  - Improved error diagnostics during pool initialization
  - Enhanced connection management in GRPC server

### Performance
- Successfully tested with 250 workers (2.5x previous limit)
- Startup time increases with pool size (~60s for 250 workers vs ~10s for 100 workers)
- Eliminated port collision races and fork resource exhaustion
- Dynamic port allocation provides reliable scaling

### Notes
- Thread limiting optimizes for high concurrency with many small tasks
- CPU-intensive workloads that perform heavy numerical computation within a single task may need different threading configuration
- For computationally intensive per-task workloads, consider:
  - Workload-specific environment variables passed per task
  - Separate worker pools with different threading profiles
  - Dynamic thread limit adjustment based on task type
  - Allowing higher OpenBLAS threads but reducing max_workers accordingly
- See commit dc67572 for detailed technical analysis and future considerations

---

## [0.5.0] - 2025-10-10

### Added
- **Process Management & Lifecycle**
  - New `Snakepit.RunId` module for unique process run identification with nanosecond precision
  - New `Snakepit.ProcessKiller` module for robust OS-level process cleanup with SIGTERM/SIGKILL escalation
  - Enhanced `ProcessRegistry` with run_id tracking and improved cleanup logic
  - Added `scripts/setup_python.sh` for automated Python environment setup

- **Test Infrastructure Improvements**
  - Added comprehensive Supertester refactoring plan (SUPERTESTER_REFACTOR_PLAN.md)
  - Phase 1 foundation updates complete with TestableGenServer support
  - New `assert_eventually` helper for polling conditions without Process.sleep
  - Enhanced test documentation and baseline establishment
  - New worker lifecycle tests for process management validation
  - New application cleanup tests with run_id integration

- **Python Cleanup & Testing**
  - Created Python test infrastructure with `test_python.sh` script
  - Added comprehensive SessionContext test suite (15 tests)
  - Created Elixir integration tests for Python SessionContext (9 tests)
  - Python cleanup summary documentation (PYTHON_CLEANUP_SUMMARY.md)
  - Enhanced Python gRPC server with improved process management and signal handling

- **Documentation**
  - Phase 1 completion report with detailed test results
  - Python cleanup and testing infrastructure summary
  - Enhanced test planning and refactoring documentation
  - Added comprehensive process management design documents (robust_process_cleanup_with_run_id.md)
  - Added implementation summaries and debugging session reports
  - New production deployment checklist (PRODUCTION_DEPLOYMENT_CHECKLIST.md)
  - New example status documentation (EXAMPLE_STATUS_FINAL.md)
  - Enhanced README with new icons and improved organization
  - Added README_GRPC.md and README_BIDIRECTIONAL_TOOL_BRIDGE.md
  - Created docs/archive/ structure for historical analysis and design documents

- **Assets & Branding**
  - Added 29 new SVG icons for documentation (architecture, binary, book, bug, chart, etc.)
  - New snakepit-icon.svg for branding
  - Enhanced visual documentation throughout

### Changed
- **Process Management Improvements**
  - `ApplicationCleanup` rewritten with run_id-based cleanup strategy
  - `GRPCWorker` enhanced with run_id tracking and improved termination handling
  - `ProcessRegistry` optimized cleanup from O(n) to O(1) operations using run_id
  - Enhanced `GRPCPython` adapter with run_id support

- **Code Cleanup**
  - Removed dead Python code
  - Deleted obsolete backup files and unused modules
  - Streamlined Python SessionContext
  - Cleaned up test infrastructure and removed duplicate code
  - Archived ~60 historical documentation files to docs/archive/

- **Examples Refactoring**
  - Simplified grpc_streaming_demo.exs
  - Refactored grpc_advanced.exs for better clarity
  - Enhanced grpc_sessions.exs with improved structure
  - Streamlined grpc_streaming.exs
  - Improved grpc_concurrent.exs with better patterns

- **Test Coverage**
  - Increased total test coverage from 27 to 51 tests (+89%)
  - 37 Elixir tests passing (27 + 9 new integration tests + 1 new helper test)
  - 15 Python SessionContext tests passing
  - Enhanced test helpers with improved synchronization and cleanup

- **Build Configuration**
  - Enhanced mix.exs with expanded documentation and package metadata
  - Updated dependencies and build configurations

### Removed
- **DSPy Integration** (as announced in v0.4.3)
  - Removed deprecated `dspy_integration.py` module
  - Removed deprecated `types.py` with VariableType enum
  - Removed `session_context.py.backup`
  - Removed obsolete `test_server.py`
  - Removed unused CLI directory referencing non-existent modules
  - All `__pycache__/` directories cleaned up

- **Variables Feature (Temporary Removal)**
  - Removed incomplete variables implementation pending future redesign:
    - `lib/snakepit/bridge/variables.ex`
    - `lib/snakepit/bridge/variables/variable.ex`
    - `lib/snakepit/bridge/variables/types.ex`
    - All variable type modules (boolean, choice, embedding, float, integer, module, string, tensor)
    - `examples/grpc_variables.exs`
    - `lib/snakepit_showcase/demos/variables_demo.ex`
    - Related test files and Python code

- **Deprecated Components**
  - Removed `lib/snakepit/bridge/serialization.ex`
  - Removed `lib/snakepit/grpc/stream_handler.ex`
  - Removed integration test infrastructure (`test/integration/` directory)
  - Removed property-based tests pending refactor
  - Removed session and serialization tests pending redesign

### Fixed
- **Process Cleanup & Lifecycle**
  - Fixed race conditions in worker cleanup and termination
  - Improved OS-level process cleanup with proper signal handling
  - Enhanced DETS cleanup with run_id-based identification
  - Fixed test flakiness with improved synchronization

- **gRPC & Session Management**
  - Improved session initialization and cleanup in Python gRPC server
  - Enhanced error handling in bidirectional tool bridge
  - Better isolation between test runs

- **Test Infrastructure**
  - Isolation level configuration documented (staying with :basic until test refactoring)
  - Test infrastructure conflicts between manual cleanup and Supertester automatic cleanup resolved
  - Enhanced debugging capabilities for test failures

### Notes
- **Breaking Changes**:
  - DSPy integration fully removed (deprecated in v0.4.3)
  - Variables feature temporarily removed pending redesign
  - Users must migrate to DSPex for DSPy functionality (see v0.4.3 migration guide)
- Test suite reliability improved with better synchronization patterns
- Foundation laid for full Supertester conformance in future releases
- Process management significantly improved with run_id tracking system
- Documentation reorganized with archive structure for historical content

---

## [0.4.3] - 2025-10-07

### Deprecated
- **DSPy Integration** (`snakepit_bridge.dspy_integration`)
  - Deprecated in favor of DSPex-native integration
  - Will be removed in v0.5.0
  - Deprecation warnings added to all DSPy-specific classes:
    - `VariableAwarePredict`
    - `VariableAwareChainOfThought`
    - `VariableAwareReAct`
    - `VariableAwareProgramOfThought`
    - `ModuleVariableResolver`
    - `create_variable_aware_program()`
  - See migration guide: https://github.com/nshkrdotcom/dspex/blob/main/docs/architecture_review_20251007/04_DECOUPLING_PLAN.md

### Changed
- **VariableAwareMixin** docstring updated to emphasize generic applicability
  - Clarified it's generic, not DSPy-specific
  - Can be used with any Python library (scikit-learn, PyTorch, Pandas, etc.)

### Documentation
- Added prominent deprecation notice to README
- Added migration guide for DSPex users
- Clarified architectural boundaries (Snakepit = infrastructure, DSPex = domain)
- Added comprehensive architecture review documents

### Notes
- **No breaking changes** - existing code continues to work with deprecation warnings
- Core Snakepit functionality unaffected
- Non-DSPy users unaffected
- Deprecation period: 3-6 months before removal in v0.5.0

---

## [0.4.2] - 2025-10-07

### Fixed
- **DETS accumulation bug** - Fixed ProcessRegistry indefinite growth (1994+ stale entries cleaned up)
- **Session creation race condition** - Implemented atomic session creation with `:ets.insert_new` to eliminate concurrent initialization errors
- **Resource cleanup race condition** - Fixed `wait_for_worker_cleanup` to check actual resources (port availability + registry cleanup) instead of dead Elixir PID
- **Test cleanup race condition** - Added proper error handling in test teardown for already-stopped workers
- **ExDoc warnings** - Fixed documentation references by moving INSTALLATION.md to guides/ and adding to ExDoc extras

### Changed
- **ApplicationCleanup simplified** - Simplified implementation, changed to emergency-only handler with telemetry
- **Worker.Starter documentation** - Added comprehensive moduledoc with ADR-001 link explaining external process management rationale
- **DETS cleanup optimization** - Changed from O(n) per-PID syscalls to O(1) beam_run_id-based cleanup
- **Process.alive? filter removed** - Eliminated redundant check (Supervisor.which_children already returns alive children only)

### Added
- **ADR-001** - Architecture Decision Record documenting Worker.Starter supervision pattern rationale
- **External Process Supervision Design** - Comprehensive 1074-line design document covering multi-mode architecture
- **Issue #2 critical review** - Detailed analysis addressing all community feedback concerns
- **Performance benchmarks** - Added baseline benchmarks showing 1400-1500 ops/sec sustained throughput
- **Telemetry in ApplicationCleanup** - Added events for tracking orphan detection and emergency cleanup

### Removed
- **Dead code cleanup** - Removed unused/aspirational code:
  - Snakepit.Python module (referenced non-existent adapter)
  - GRPCBridge adapter (never used)
  - Dead Python adapters (dspy_streaming.py, enhanced.py, grpc_streaming.py)
  - Redundant helper functions in ApplicationCleanup
  - Catch-all rescue clauses (follows "let it crash" philosophy)

### Performance
- 100 workers initialize in ~3 seconds (unchanged)
- 1400-1500 operations/second sustained (maintained)
- DETS cleanup now O(1) vs O(n) (significant improvement for large process counts)

### Documentation
- Complete installation guide with platform-specific instructions (Ubuntu, macOS, WSL, Docker)
- Marked working vs WIP examples clearly (3 working, 6 aspirational)
- Added comprehensive analysis documents (150KB total)

### Testing
- All 139/139 tests passing ✅
- No orphaned processes ✅
- Clean shutdown behavior validated ✅

## [0.4.1] - 2025-07-24

### Added
- **New `process_text` tool** - Text processing capabilities with upper, lower, reverse, and length operations
- **New `get_stats` tool** - Real-time adapter and system monitoring with memory usage, CPU usage, and system information
- **Enhanced ShowcaseAdapter** - Added missing tools (adapter_info, echo, process_text, get_stats) for complete tool bridge demonstration

### Fixed
- **gRPC tool registration issues** - Resolved async/sync mismatch causing UnaryUnaryCall objects to be returned instead of actual responses
- **Missing tool errors** - Fixed "Unknown tool: adapter_info" and "Unknown tool: echo" errors by implementing missing @tool decorated methods
- **Automatic session initialization** - Fixed "Failed to register tools: not_found" error by automatically creating sessions before tool registration
- **Remote tool dispatch** - Implemented complete bidirectional tool execution between Elixir BridgeServer and Python workers
- **Async/sync compatibility** - Added proper handling for both sync and async gRPC stubs with fallback logic for UnaryUnaryCall objects

### Changed
- **BridgeServer enhancement** - Added remote tool execution capabilities with worker port lookup and gRPC forwarding
- **Python gRPC server** - Enhanced with automatic session initialization before tool registration
- **ShowcaseAdapter refactoring** - Expanded tool set to demonstrate full bidirectional tool bridge capabilities

## [0.4.0] - 2025-07-23

### Added
- Complete gRPC bridge implementation with full bidirectional tool execution
- Tool bridge streaming support for efficient real-time communication
- Variables feature with type system (string, integer, float, boolean, choice, tensor, embedding)
- Comprehensive process management and cleanup system
- Process registry with enhanced tracking and orphan detection
- SessionStore with TTL support and automatic expiration
- BridgeServer implementation for gRPC protocol
- StreamHandler for managing gRPC streaming responses
- Telemetry module for comprehensive metrics and monitoring
- MockGRPCWorker and test infrastructure improvements
- Showcase application with multiple demo scenarios
- Binary serialization support for large data (>10KB) with 5-10x performance improvement
- Automatic binary encoding with threshold detection
- Protobuf schema updates with binary fields support
- Tool registration and discovery system
- Elixir tool exposure to Python workers
- Batch variable operations for performance
- Variable watching/reactive updates support
- Heartbeat mechanism for session health monitoring

### Changed
- Major refactoring from legacy bridge system to gRPC-only architecture
- Removed all legacy bridge implementations (V1, V2, MessagePack)
- Unified all adapters to use gRPC protocol exclusively
- Worker module completely rewritten for gRPC support
- Pool module enhanced with configurable adapter support
- ProcessRegistry rewritten with improved tracking and cleanup
- Test framework upgraded with SuperTester integration
- Examples reorganized and updated for gRPC usage
- Python client library restructured as snakepit_bridge package
- Serialization module now returns 3-tuple `{:ok, any_map, binary_data}`
- Large tensors and embeddings automatically use binary encoding
- Integration tests updated to use new infrastructure

### Fixed
- Process cleanup and orphan detection issues
- Worker termination and registry cleanup
- Module redefinition warnings in test environment
- SessionStore TTL validation and expiration timing
- Mock adapter message handling
- Integration test pool timeouts and shutdown
- GitHub Actions deprecation warnings
- Elixir version compatibility in integration tests

### Removed
- All legacy bridge implementations (generic_python.ex, generic_python_v2.ex, etc.)
- MessagePack protocol support (moved to gRPC exclusively)
- Old Python bridge scripts (generic_bridge.py, enhanced_bridge.py)
- Legacy session_context.py implementation
- V1/V2 adapter pattern in favor of unified gRPC approach

## [0.3.3] - 2025-07-20

### Added
- Support for custom adapter arguments in gRPC adapter via pool configuration
- Enhanced Python API commands (call, store, retrieve, list_stored, delete_stored) in gRPC adapter
- Dynamic command validation based on adapter type in gRPC adapter

### Changed
- GRPCPython adapter now accepts custom adapter arguments through pool_config.adapter_args
- Improved supported_commands/0 to dynamically include commands based on the adapter in use

### Fixed
- gRPC adapter now properly supports third-party Python adapters like DSPy integration

## [0.3.2] - 2025-07-20

### Fixed
- Added missing files to the repository

## [0.3.1] - 2025-07-20

### Changed
- Merged MessagePack optimizations into main codebase
- Unified documentation for gRPC and MessagePack features
- Set GenericPythonV2 as default adapter with auto-negotiation

## [0.3.0] - 2025-07-20

### Added
- Complete gRPC bridge implementation with streaming support
- MessagePack serialization protocol support
- Comprehensive gRPC integration documentation and setup guides
- Enhanced bridge documentation and examples

### Changed
- Deprecated V1 Python bridge in favor of V2 architecture
- Updated demo implementations to use V2 Python bridge
- Improved gRPC streaming bridge implementation
- Enhanced debugging capabilities and cleanup

### Fixed
- Resolved init/1 blocking issues in V2 Python bridge
- General debugging improvements and code cleanup

## [0.2.1] - 2025-07-20

### Fixed
- Eliminated "unexpected message" logs in Pool module by properly handling Task completion messages from `Task.Supervisor.async_nolink`

## [0.2.0] - 2025-07-19

### Added
- Complete Enhanced Python Bridge V2 Extension implementation
- Built-in type support for Python Bridge V2
- Test rework specifications and improved testing infrastructure
- Commercial refactoring recommendations documentation

### Changed
- Enhanced Python Bridge V2 with improved architecture and session management
- Improved debugging capabilities for V2 examples
- Better error handling and robustness in Python Bridge

### Fixed
- Bug fixes in Enhanced Python Bridge examples
- Data science example debugging improvements
- General cleanup and code improvements

## [0.1.2] - 2025-07-18

### Added
- Python Bridge V2 with improved architecture and session management
- Generalized Python bridge implementation
- Enhanced session management capabilities

### Changed
- Major architectural improvements to Python bridge
- Better integration with external Python processes

## [0.1.1] - 2025-07-18

### Added
- DIAGS.md with comprehensive Mermaid architecture diagrams
- Elixir-themed styling and proper subgraph format for diagrams
- Logo support to ExDoc and hex package
- Mermaid diagram support in documentation

### Changed
- Updated configuration to include assets and documentation
- Improved documentation structure and visual presentation

### Fixed
- README logo path for hex docs
- Asset organization (moved img/ to assets/)

## [0.1.0] - 2025-07-18

### Added
- Initial release of Snakepit
- High-performance pooling system for external processes
- Session-based execution with worker affinity
- Built-in adapters for Python and JavaScript/Node.js
- Comprehensive session management with ETS storage
- Telemetry and monitoring support
- Graceful shutdown and process cleanup
- Extensive documentation and examples

### Features
- Lightning-fast concurrent worker initialization (1000x faster than sequential)
- Session affinity for stateful operations
- Built on OTP primitives (DynamicSupervisor, Registry, GenServer)
- Adapter pattern for any external language/runtime
- Production-ready with health checks and error handling
- Configurable pool sizes and timeouts
- Built-in bridge scripts for Python and JavaScript

[0.6.8]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.6.8
[0.6.7]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.6.7
[0.5.1]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.5.1
[0.5.0]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.5.0
[0.4.3]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.4.3
[0.4.2]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.4.2
[0.4.1]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.4.1
[0.4.0]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.4.0
[0.3.3]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.3.3
[0.3.2]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.3.2
[0.3.1]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.3.1
[0.3.0]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.3.0
[0.2.1]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.2.1
[0.2.0]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.2.0
[0.1.2]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.1.2
[0.1.1]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.1.1
[0.1.0]: https://github.com/nshkrdotcom/snakepit/releases/tag/v0.1.0