# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.8.1] - 2025-06-17

### Added

- **Comprehensive API Documentation** - Complete public API reference
  - `docs/API_REFERENCE.md` - Full public API documentation with examples
  - `guides/internal_modules.md` - Internal modules guide with migration examples
  - Enhanced ExDoc configuration with organized guide sections
  - Clear separation between public API and internal implementation

### Fixed

- All compilation warnings resolved
  - Replace deprecated `Logger.warn` with `Logger.warning`
  - Fix unreachable error clauses in HTTP client and metrics modules
  - Add conditional compilation for optional dependencies (Prometheus, StatsD)
  - Remove unused streaming functions and helper methods
  - Fix module attribute ordering issues
  - Add embeddings function stubs to all provider implementations
  - Fix nil module reference warnings using `apply/3`

### Enhanced

- **Developer Experience** - Better documentation structure and API clarity
- **Code Quality** - Clean compilation with zero warnings
- **Documentation Organization** - Logical grouping of guides and references

## [0.8.0] - 2025-06-16

### Added

- **Advanced Streaming Infrastructure** - Production-ready streaming enhancements
  - `StreamBuffer` - Memory-efficient circular buffer with overflow protection
  - `FlowController` - Advanced flow control with backpressure handling
  - `ChunkBatcher` - Intelligent chunk batching for optimized I/O
  - Configurable consumer types: `:direct`, `:buffered`, `:managed`
  - Comprehensive streaming metrics and monitoring
  - Adaptive batching based on chunk characteristics
  - Graceful degradation for slow consumers
- **Comprehensive Telemetry System** - Complete observability and instrumentation
  - Telemetry events for all major operations (chat, streaming, cache, session, context)
  - Optional `telemetry_metrics` and OpenTelemetry integration
  - Context and session management instrumentation
  - Cache operation tracking with hit/miss/put events
  - Cost calculation and threshold monitoring
  - Default logging handlers with configurable levels

### Enhanced

- **Streaming Performance** - Reduced system calls through intelligent batching
- **Memory Safety** - Fixed-size buffers prevent unbounded memory growth
- **User Experience** - Smooth output even with fast providers (Groq, Claude)

### Fixed

- **Test Infrastructure** - Comprehensive test tagging and organization improvements
  - Fixed 11 failing unit tests by properly categorizing integration vs. unit tests
  - Improved test tagging strategy with `:unit`, `:integration`, `:model_loading`, `:requires_service` tags
  - Fixed `MockConfigProvider` implementation in Gemini tokens tests
  - Separated unit tests from integration tests requiring external dependencies

## [0.7.1] - 2025-06-14

### Added

- **Comprehensive Documentation System** - Complete ExDoc configuration with organized structure
  - 24 Mix test aliases for targeted testing (provider, capability, and type-based)
  - Organized documentation into logical groups: Guides, References
  - Complete test documentation covering semantic tagging and caching system

### Changed

- Updated ExDoc configuration to include all public documentation files
- Streamlined documentation structure by removing internal development docs
- Enhanced README with current feature set and improved examples

### Fixed

- Resolved all ExDoc file reference warnings
- Fixed documentation generation for publication-ready docs

## [0.7.0] - 2025-06-14

### Added

- **Advanced Test Response Caching System** - Complete caching infrastructure for integration tests
  - Intelligent cache storage with JSON-based persistence
  - TTL-based cache expiration and cleanup
  - Request/response matching with fuzzy algorithms
  - Cache statistics and performance monitoring
  - Automatic cache key generation and indexing
  - Smart fallback strategies for cache misses
  - Configurable cache organization (by provider, test module, or tag)
  - Environment-based cache configuration
  - Mix task for cache management: `mix ex_llm.cache`

### Enhanced

- **Test Caching Performance** - 25x speed improvement for integration tests
- **Cache Detection** - Automatic detection of destructive operations
- **Response Interception** - Transparent request/response caching for HTTP calls
- **Metadata Tracking** - Comprehensive test context and response metadata

## [0.6.0] - 2025-06-14

### Added

- **Comprehensive Test Tagging System** - Replaced all 138 generic `@tag :skip` tags with meaningful semantic tags
  - `:live_api` - Tests that call live provider APIs
  - `:requires_api_key` - Tests needing API keys with provider-specific checking
  - `:requires_oauth` - Tests needing OAuth2 authentication
  - `:requires_service` - Tests needing local services (Ollama, LM Studio)
  - `:requires_resource` - Tests needing pre-existing resources (tuned models, corpora)
  - `:integration` - Integration tests with external services
  - `:external` - Tests making external network calls
  - Provider-specific tags: `:anthropic`, `:openai`, `:gemini`, etc.
- **Enhanced Test Caching System** - Intelligent caching based on test tags
  - Uses `:live_api` tag to determine which tests to cache
  - Automatic detection of destructive operations (create, delete, modify)
  - Smart cache exclusion for corpus deletion and state-changing tests
  - 25x speed improvement for cached integration tests (2.2s → 0.09s)
- **Mix Test Aliases** - 24 new test aliases for targeted testing
  - Provider-specific: `mix test.anthropic`, `mix test.openai`, etc.
  - Tag-based: `mix test.integration`, `mix test.oauth2`, `mix test.live_api`
  - Capability-based: `mix test.streaming`, `mix test.vision`
- **ExLLM.Case Test Module** - Custom test case with automatic requirement checking
  - Dynamic skipping with meaningful messages when requirements aren't met
  - API key validation per provider
  - OAuth2 token validation
  - Service availability checking

### Changed

- **BREAKING:** Migrated from generic `:skip` tags to semantic tagging system
- Enhanced OAuth2 test helper to use consistent `:requires_oauth` tag
- Improved test cache detection to prevent caching destructive operations
- Updated all provider integration tests with proper module-level tags

### Fixed

- Fixed undefined variable `service` in ExLLM.Case rescue clause
- Fixed OpenRouter test compilation error with undefined function
- Fixed OAuth2 tag inconsistency (now uses `:requires_oauth` everywhere)
- Fixed test cache configuration for destructive operation detection

## [0.5.0] - 2025-06-13

### Added

- **Complete Google Gemini API Implementation** - All 15 Gemini APIs now fully implemented
  - **Live API**: Real-time bidirectional communication with WebSocket support
    - Text, audio, and video streaming capabilities
    - Tool/function calling in live sessions
    - Session resumption and context compression
    - Activity detection and management
    - Audio transcription for input/output
  - **Models API**: List and get model information
  - **Content Generation API**: Chat and streaming with multimodal support
  - **Token Counting API**: Count tokens for any content
  - **Files API**: Upload and manage media files
  - **Context Caching API**: Cache content for reuse across requests
  - **Embeddings API**: Generate text embeddings
  - **Fine-tuning API**: Create and manage custom tuned models
  - **Permissions API**: Manage access to tuned models and corpora
  - **Question Answering API**: Semantic search and QA
  - **Corpus Management API**: Create and manage knowledge corpora
  - **Document Management API**: Manage documents within corpora
  - **Chunk Management API**: Fine-grained document chunk management
  - **Retrieval Permissions API**: Control access to retrieval resources
- **Gun WebSocket Library**: Added Gun dependency for Live API WebSocket support
- **OAuth2 Authentication**: Full OAuth2 support for Gemini APIs requiring user auth
- **Comprehensive Test Suite**: 477 tests covering all Gemini functionality

### Changed

- Updated Gemini adapter to use new modular API implementation
- Enhanced authentication to support both API keys and OAuth2 tokens
- Improved error handling with Gemini-specific error messages
- Updated documentation with complete Gemini API coverage

### Fixed

- Fixed unused variable warnings in Gemini auth module
- Fixed Live API compilation errors with proper string escaping
- Fixed content parsing to handle JSON response formats correctly

## [0.4.2] - 2025-06-08

### Changed

- **BREAKING:** Renamed `:local` provider atom to `:bumblebee` for clarity
  - All references to `:local` in code and documentation have been updated
  - Update any code using `ExLLM.chat(:local, ...)` to `ExLLM.chat(:bumblebee, ...)`
- Changed default Bumblebee model from `microsoft/phi-2` to `Qwen/Qwen3-0.6B`
- Excluded `emlx` dependency from Hex package until it's published
- Updated README with instructions for adding `emlx` manually for Apple Silicon support
- Updated documentation to clarify that `instructor`, `bumblebee`, and `nx` are required dependencies
- Clarified that `exla` and `emlx` are optional hardware acceleration backends

### Fixed

- Mock adapter now properly checks for `mock_error` option in chat function

## [0.4.1] - 2025-06-08

### Added

- **Response Caching System** - Cache real provider responses for offline testing and development
  - **Automatic Response Collection**: All provider responses automatically cached when enabled
  - **Mock Integration**: Configure Mock adapter to replay cached responses from any provider
  - **Cache Management**: Full CRUD operations for cached responses with provider organization
  - **Fuzzy Matching**: Robust request matching handles real-world usage variations
  - **Environment Configuration**: Simple enable/disable via `EX_LLM_CACHE_RESPONSES` environment variable
  - **Cost Reduction**: Reduce API costs during development by replaying cached responses
  - **Realistic Testing**: Use authentic provider responses in tests without API calls
  - **Streaming Support**: Cache and replay streaming responses with exact chunk reproduction
  - **Cross-Provider Testing**: Test application compatibility across different provider response formats

### Changed

- Enhanced shared response builder to support more response formats (completion, image, audio, moderation)
- Extended HTTP client with provider-specific headers for 15+ providers
- Improved error handling with normalization and retry logic for multiple providers

### Fixed

- Fixed pre-push hook to exclude integration tests preventing timeouts
- Fixed unsafe `String.to_atom` usage throughout codebase (Sobelow warnings)
- Fixed `length() > 0` warnings by using pattern matching
- Fixed typing warnings for potentially nil values
- Fixed ModelConfig runtime path resolution for test environment
- Fixed ResponseCache JSON key atomization for proper cache loading
- Fixed capability normalization to handle already-normalized capability names
- Added missing model capabilities (vision for Claude-3-Opus, reasoning for XAI models)

## [0.4.0] - 2025-06-06

### Added

- **Complete OpenAI API Implementation** - Full support for modern OpenAI API features
  - **Audio Features**: Support for audio input in messages and audio output configuration
  - **Web Search Integration**: Support for web search options in chat completions
  - **O-Series Model Features**: Reasoning effort parameter and developer role support
  - **Predicted Outputs**: Support for faster regeneration with prediction hints
  - **Additional APIs**: Six new OpenAI API endpoints
    - `moderate_content/2` - Content moderation using OpenAI's moderation API
    - `generate_image/2` - DALL-E image generation with configurable parameters
    - `transcribe_audio/2` - Whisper audio transcription (basic implementation)
    - `upload_file/3` - File upload for assistants and other endpoints (basic implementation)
    - `create_assistant/2` - Create assistants with custom instructions and tools
    - `create_batch/2` - Batch processing for multiple requests
  - **Enhanced Message Support**: Multiple content parts per message (text + audio/image)
  - **Modern Request Parameters**: Support for all modern OpenAI API parameters
    - `max_completion_tokens`, `top_p`, `frequency_penalty`, `presence_penalty`
    - `seed`, `stop`, `service_tier`, `logprobs`, `top_logprobs`
  - **JSON Response Formats**: JSON mode and JSON Schema structured outputs
  - **Modern Tools API**: Full support for tools API replacing deprecated functions
  - **Enhanced Streaming**: Tool calls and usage information in streaming responses
  - **Enhanced Usage Tracking**: Detailed token usage with cached/reasoning/audio tokens
- **Ollama Configuration Management** - Generate and update local model configurations
  - New `generate_config/1` function to create YAML config for all installed models
  - New `update_model_config/2` function to update specific model configurations
  - Automatic capability detection using `/api/show` endpoint
  - Real context window sizes from model metadata
  - Preserves existing configuration when merging
  - Example: `ExLLM.Adapters.Ollama.generate_config(save: true)`

### Changed

- **MessageFormatter**: Added support for "developer" role for O1+ models
- **OpenAI Adapter**: Comprehensive test coverage with 46 tests following TDD methodology
- **Response Types**: Enhanced LLMResponse struct with new fields (refusal, logprobs, tool_calls)

### Technical

- Implemented using Test-Driven Development (TDD) methodology
- Maintains full backward compatibility with existing API
- All features validated with comprehensive test suite
- Proper error handling and API key validation for all new endpoints

## [0.3.2] - 2025-06-06

### Added

- **Capability Normalization** - Automatic normalization of provider-specific capability names
  - New `ExLLM.Capabilities` module providing unified capability interface
  - Normalizes different provider terminologies (e.g., `tool_use` → `function_calling`)
  - Works transparently with all capability query functions
  - Comprehensive mappings for common capability variations
  - Example: `find_providers_with_features([:tool_use])` works across all providers
- Enhanced provider capability tracking with real-time API discovery
  - New `fetch_provider_capabilities.py` script for API-based capability detection
  - Updated `fetch_provider_models.py` with better context window detection
  - Fixed incorrect context windows (e.g., GPT-4o now correctly shows 128,000)
  - Automatic capability detection from model IDs
- New capability normalization demo in example app (option 6 in Provider Capabilities Explorer)
- **Comprehensive Documentation**
  - New Quick Start Guide (`docs/QUICKSTART.md`) - Get up and running in 5 minutes
  - New User Guide (`docs/USER_GUIDE.md`) - Complete documentation of all features
  - Reorganized documentation into `docs/` directory
  - Added prominent documentation links to README

### Changed

- Updated `provider_supports?/2`, `model_supports?/3`, `find_providers_with_features/1`, and `find_models_with_features/1` to use normalized capabilities

### Fixed

- Mock provider now properly supports Instructor integration for structured outputs
- Cost formatting now consistently uses dollars with appropriate decimal places (e.g., "$0.000324" instead of "$0.032¢")
- Anthropic provider now includes required `max_tokens` parameter when using Instructor
- Mock provider now generates semantically meaningful embeddings for realistic similarity search
- Fixed KeyError when using providers without pricing data (e.g., Ollama)
- Cost tracking now properly adds cost information to chat responses
- Ollama now properly supports function calling for compatible models
- Made request timeouts configurable via `:timeout` option (defaults: Ollama 2 minutes, others use client defaults)
- Fixed MatchError in example app when displaying providers without capabilities info
- Provider and model capability queries now accept any provider's terminology
- Moved `LOGGER.md`, `PROVIDER_CAPABILITIES.md`, and `DROPPED.md` to `docs/` directory
- Enhanced provider capabilities with data from API discovery scripts

## [0.3.1] - 2025-06-05

### Added

- **Major Code Refactoring** - Reduced code duplication by ~40% through shared modules:
  - `StreamingCoordinator` - Unified streaming implementation for all adapters
    - Standardized SSE parsing and buffering
    - Provider-agnostic chunk handling
    - Integrated error recovery support
    - Simplified adapter streaming implementations
  - `RequestBuilder` - Common request construction patterns
    - Unified parameter handling across providers
    - Provider-specific transformations via callbacks
    - Support for chat, embeddings, and completion endpoints
  - `ModelFetcher` - Standardized model discovery behavior
    - Common API fetching patterns
    - Unified filter/parse/transform pipeline
    - Integration with ModelLoader for caching
  - `VisionFormatter` - Centralized vision/multimodal content handling
    - Provider-specific image formatting (Anthropic, OpenAI, Gemini)
    - Media type detection from file extensions and magic bytes
    - Base64 encoding/decoding utilities
    - Image size validation
- Unified `ExLLM.Logger` module replacing multiple logging approaches
  - Single consistent API for all logging needs
  - Simple Logger-like interface: `Logger.info("message")`
  - Automatic context tracking with `with_context/2`
  - Structured logging for LLM-specific events (requests, retries, streaming)
  - Configurable log levels and component filtering
  - Security features: API key and content redaction
  - Performance tracking with automatic duration measurement

### Changed

- **BREAKING:** Replaced all `Logger` and `DebugLogger` usage with unified `ExLLM.Logger`
  - All modules now use `alias ExLLM.Logger` instead of `require Logger`
  - Consistent logging interface across the entire codebase
  - Simplified developer experience with one logging API
- Enhanced `HTTPClient` with unified streaming support via `post_stream/3`
- Improved error handling consistency across all shared modules
- Better separation of concerns in adapter implementations

### Technical Improvements

- Reduced code duplication significantly across adapters
- More maintainable and testable codebase structure
- Easier to add new providers using shared behaviors
- Consistent patterns for common operations

## [0.3.0] - 2025-06-05

### Added

- X.AI adapter implementation with complete feature support
  - Full OpenAI-compatible API integration
  - Support for all Grok models (Beta, 2, 3, Vision variants)
  - Streaming, function calling, vision, and structured outputs
  - Web search and reasoning capabilities
  - Complete Instructor integration for structured outputs
- Synced model metadata from LiteLLM (1053 models across 56 providers)
  - New OpenAI models: GPT-4.1 series (gpt-4.1, gpt-4.1-mini, gpt-4.1-nano)
  - New OpenAI O1 reasoning models (o1-pro, o1, o1-mini, o1-preview)
  - New XAI Grok-3 models (grok-3, grok-3-beta, grok-3-fast, grok-3-mini variants)
  - New model capabilities: structured_output, prompt_caching, reasoning, web_search
  - Updated pricing and context windows for all models
- Fetched latest models from provider APIs (606 models from 6 providers)
  - New Anthropic models: Claude 4 Opus/Sonnet/Haiku with 32K output tokens
  - New Groq models: DeepSeek R1 distilled models, QwQ-32B, Mistral Saba
  - New Gemini models: Gemini 2.5 Pro/Flash, Gemini 2.0 Flash with multimodal support
  - New OpenAI models: O3/O4 series, GPT-4.5 preview, search-enabled models
  - Updated context windows and capabilities from live APIs
- Groq support for structured outputs via Instructor integration

### Changed

- Updated default models:
  - OpenAI: Set to gpt-4.1-nano
  - Anthropic: Set to claude-3-5-sonnet-latest
- Enhanced Instructor module to support Groq provider
- Updated example app to include Groq in structured output demos
- Updated README.md with current model information:
  - Anthropic: Added Claude 4 series and Claude 3.7
  - OpenAI: Added GPT-4.1 series and O1 reasoning models
  - Gemini: Added Gemini 2.5 and 2.0 series
  - Groq: Added Llama 4 Scout, DeepSeek R1 Distill, and QwQ-32B
- Task reorganization:
  - Created docs/DROPPED.md for features that don't align with core library mission
  - Reorganized TASKS.md with clearer priorities and focused roadmap
  - Added refactoring tasks to reduce code duplication by ~40%

### Fixed

- Instructor integration now correctly separates params and config for chat_completion
- Advanced features demo uses correct Mock adapter method (set_stream_chunks)
- Module reference errors in Context management demo

## [0.2.1] - 2025-06-05

### Added

- Provider Capability Discovery System
  - New `ExLLM.ProviderCapabilities` module for tracking API-level provider capabilities
  - Provider feature discovery independent of specific models
  - Authentication method tracking (API key, OAuth, AWS signature, etc.)
  - Provider endpoint discovery (chat, embeddings, images, audio, etc.)
  - Provider recommendations based on required/preferred features
  - Provider comparison tools for feature analysis
  - Integrated provider capability functions into main ExLLM module
  - Added provider capability explorer to example app demo
- Environment variable wrapper script (`scripts/run_with_env.sh`) for Claude CLI usage
- Groq models API support (`https://api.groq.com/openai/v1/models`)
- Dynamic model loading from provider APIs
  - All adapters now fetch models dynamically from provider APIs when available
  - Automatic fallback to YAML configuration when API is unavailable
  - Created `ExLLM.ModelLoader` module for centralized model loading with caching
  - Anthropic adapter now uses `/v1/models` API endpoint
  - OpenAI adapter fetches from `/v1/models` and filters chat models
  - Gemini adapter uses Google's models API
  - Ollama adapter fetches from local server's `/api/tags`
  - OpenRouter adapter uses public `/api/v1/models` API
- OpenRouter adapter with access to 300+ models from multiple providers
  - Support for Claude, GPT-4, Llama, PaLM, and many other model families
  - Unified API interface for different model architectures
  - Automatic model discovery and cost-effective access to premium models
- External YAML configuration system for model metadata
  - Model pricing, context windows, and capabilities stored in `config/models/*.yml`
  - Runtime configuration loading with ETS caching for performance
  - Separation of model data from code for easier maintenance
  - Support for easy updates without code changes
- OpenAI-Compatible base adapter for shared implementation
  - Reduces code duplication across providers with OpenAI-compatible APIs
  - Groq adapter as first implementation using the base adapter
- Model configuration sync script from LiteLLM
  - Python script to sync model data from LiteLLM's database
  - Added 1048 models with pricing, context windows, and capabilities
  - Automatic conversion from LiteLLM's JSON to ExLLM's YAML format
- Extracted all provider configurations from LiteLLM
  - Created YAML files for 56 unique providers (49 new providers)
  - Includes Azure, Mistral, Perplexity, Together AI, Databricks, and more
  - Ready-to-use configurations for future adapter implementations

### Changed

- **BREAKING:** Model configuration moved from hardcoded maps to external YAML files
  - All providers now use `ExLLM.ModelConfig` for pricing and context window data
  - Default models, pricing, and context windows loaded from YAML configuration
  - Added `yaml_elixir` dependency for YAML parsing
- Updated Bedrock adapter with comprehensive model support:
  - Added all latest Anthropic models (Claude 4, 3.7, 3.5 series)
  - Added Amazon Nova models (Micro, Lite, Pro, Premier)
  - Added AI21 Labs Jamba series (1.5-large, 1.5-mini, instruct)
  - Added Cohere Command R series (R, R+)
  - Added DeepSeek R1 model
  - Added Meta Llama 4 and 3.x series models
  - Added Mistral Pixtral Large 2025-02
  - Added Writer Palmyra X4 and X5 models
  - Changed default model from "claude-3-sonnet" to "nova-lite" for cost efficiency
  - Updated pricing data for all Bedrock providers with per-1M token rates
  - Updated context window sizes for all new Bedrock models
  - Enhanced streaming support for all new providers (Writer, DeepSeek)
- All adapters now use ModelConfig for consistent default model retrieval
- **BREAKING:** Refactored `ExLLM.Adapters.OpenAICompatible` base adapter
  - Extracted common helper functions (`format_model_name/1`, `default_model_transformer/2`) as public module functions
  - Simplified adapter implementations by removing duplicate code
  - Added ModelLoader integration to base adapter for consistent dynamic model loading
  - Added `filter_model/1` and `parse_model/1` callbacks for customizing model parsing
- Made Instructor a required dependency instead of optional
- OpenAI default model changed to gpt-4.1-nano
- Instructor now uses dynamic default models from YAML configs
- Example app no longer hardcodes model names

### Fixed

- Anthropic models API fetch now correctly parses response structure (uses `data` field instead of `models`)
- Python model fetch script updated to handle Anthropic's API response format
- OpenRouter pricing parser now handles string values correctly
- Groq adapter compilation warnings for undefined callbacks
- DateTime serialization in MessageFormatter for session persistence
- OpenAI adapter streaming termination handling
- JSON double-encoding issue in HTTPClient
- Token field name standardization across adapters (input_tokens/output_tokens)
- Instructor integration API parameter passing
- Context management module reference errors in example app
- Function calling demo error handling with string keys
- Streaming chat demo now shows token usage and cost estimates

### Improved

- Code organization with shared modules to eliminate duplication:
  - Created `ExLLM.Adapters.Shared.Validation` for API key validation
  - All adapters now use `ModelUtils.format_model_name` for consistent formatting
  - All adapters now use `ConfigHelper.ensure_default_model` for default models
  - Test files updated to use `TestHelpers` consistently
- Example app enhancements:
  - Session management shows full conversation history
  - Function calling demo clearly shows available tools
  - Advanced features demo now has real implementations
  - Cost formatting uses decimal notation instead of scientific

### Removed

- Removed hardcoded model names from adapters
- Removed `model_capabilities.ex.bak` backup file
- Removed `DUPLICATE_CODE_ANALYSIS.md` after completing all refactoring

## [0.2.0] - 2025-05-25

### Added

- OpenAI adapter with GPT-4 and GPT-3.5 support
- Ollama adapter for local model inference
- AWS Bedrock adapter with full multi-provider support (Anthropic, Amazon Titan, Meta Llama, Cohere, AI21, Mistral)
  - Complete AWS credential chain support (environment vars, profiles, instance metadata, ECS task roles)
  - Provider-specific request/response formatting
  - Native streaming support
  - Dynamic model listing via AWS Bedrock API
- Google Gemini adapter with Pro, Ultra, and Nano models
- Context management functionality to automatically handle LLM context windows
  - `ExLLM.Context` module with the following features:
    - Automatic message truncation to fit within model context windows
    - Multiple truncation strategies (sliding_window, smart)
    - Context window validation
    - Token estimation and statistics
    - Model-specific context window sizes
- Session management functionality for conversation state tracking
  - `ExLLM.Session` module with the following features:
    - Conversation state management
    - Message history tracking
    - Token usage tracking
    - Session persistence (save/load)
    - Export to markdown/JSON formats
- Local model support via Bumblebee integration
  - `ExLLM.Adapters.Local` with the following features:
    - Support for Phi-2, Llama 2, Mistral, GPT-Neo, and Flan-T5
    - Hardware acceleration (Metal, CUDA, ROCm, CPU)
    - Model lifecycle management with ModelLoader GenServer
    - Zero-cost inference (no API fees)
    - Privacy-preserving local execution
- New public API functions in main ExLLM module:
  - Context management: `prepare_messages/2`, `validate_context/2`, `context_window_size/2`, `context_stats/1`
  - Session management: `new_session/2`, `chat_with_session/2`, `save_session/2`, `load_session/1`, etc.
  - Automatic context management in `chat/3` and `stream_chat/3`
- Optional dependencies (Bumblebee, Nx, EXLA) for local model support
- Application supervisor for managing ModelLoader lifecycle
- Comprehensive test coverage for all new features

### Changed

- Updated `chat/3` and `stream_chat/3` to automatically apply context truncation
- Enhanced documentation with context management and session examples
- ExLLM is now a comprehensive all-in-one solution including cost tracking, context management, and session handling

## [0.1.0] - 2025-05-24

### Added

- Initial release with unified LLM interface
- Support for Anthropic Claude models
- Streaming support via Server-Sent Events
- Integrated cost tracking and calculation
- Token estimation functionality
- Configurable provider system
- Comprehensive error handling