# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.1.0] - 2026-06-12
### BEAM-Native Streaming and Concurrent Zarr Processing
ExZarr v1.1.0 adds streaming APIs, pipeline integrations, and telemetry for
large-scale array processing on the BEAM.
### Added
#### Streaming APIs
- `ExZarr.Array.stream_chunks/2` - lazy chunk streaming with concurrency, metadata, and filtering
- `ExZarr.Array.stream_slices/3` - dimension-wise slice streaming
- `ExZarr.Array.write_stream/3` - chunk ingestion from enumerables with validation and checkpoints
- Shared streaming internals module (internal; not part of the public API)
- `chunk_stream/2` retained as backward-compatible alias for `stream_chunks/2`
#### Pipeline Integrations (Optional Dependencies)
- `ExZarr.Flow` with `chunk_flow/2` and `slice_flow/3`
- `ExZarr.GenStage` with `ChunkProducer` and `SliceProducer`
- `ExZarr.Broadway` with `ChunkProducer` and pipeline helpers
#### Observability
- `ExZarr.Telemetry` module with chunk read/write and stream start/stop events
#### Documentation
- `docs/architecture_review.md`, `docs/gap_analysis.md`, `docs/v1_1_design.md`
- `docs/cloud_storage_patterns.md`
- `docs/cookbook/` starter recipes for large-array workflows
- `livebooks/broadway_pipeline.livemd`, `livebooks/nx_streaming.livemd`
- `release_notes_v1_1_0.md`, `migration_guide_v1_1_0.md`
#### Benchmarks
- `benchmarks/streaming_bench.exs` for streaming throughput measurement
### Changed
- `chunk_stream/2` now delegates to `stream_chunks/2`
- `:parallel` option aliased to `:concurrency` in streaming APIs (logs deprecation warning)
- Maximum `:concurrency` increased from 10 (`chunk_stream/2` legacy cap) to 128
- Optional dependencies added: `flow`, `gen_stage`, `broadway`
- `zigler` constrained to `~> 0.16` (requires Zig 0.16.0; run `mix zig.get` before compiling)
## [1.0.0] - 2026-01-27
### First Stable Release!
ExZarr 1.0.0 marks the first production-ready release with comprehensive testing, security hardening, and extensive documentation.
### Added
#### Testing & Quality Assurance
- **Comprehensive Test Suite**: 1,713 total tests (146 doctests + 65 properties + 1,502 unit tests)
- Zero test failures across all test suites
- 4 tests intentionally skipped for environment-specific features
- 151 tests excluded (cloud storage backends requiring credentials)
- **Property-Based Testing**: Expanded from 21 to 65 properties
- New `test/ex_zarr_codecs_property_test.exs` with 19 codec-focused properties
- Enhanced `test/ex_zarr_property_test.exs` with 25 additional properties
- Comprehensive coverage of compression, indexing, storage, and metadata operations
- **Backend Test Coverage**: New comprehensive test files
- `test/ex_zarr/storage_comprehensive_test.exs` - 39 tests for storage operations
- `test/ex_zarr/storage/backend/filesystem_test.exs` - 41 tests (82% coverage)
- `test/ex_zarr/storage/backend/zip_test.exs` - 40 tests (95.2% coverage)
- `test/mix/tasks/fix_nif_rpaths_test.exs` - 14 tests for Mix task
- **Core Module Coverage**: Significantly improved test coverage
- `format_converter.ex`: 20% → 80% (+60 percentage points, 36 tests)
- `indexing.ex`: 12.1% → 85.1% (+73pp, 69 tests)
- `metadata.ex`: 59.1% → 79.5% (+20pp, 56 tests)
- `storage.ex`: ~29% → 68.1% (+39pp, 39 tests)
- `filesystem.ex`: 0% → 82% (+82pp, 41 tests)
- `zip.ex`: 66.6% → 95.2% (+29pp, 40 tests)
- **Overall Coverage**: 80.3% (up from 76.3%), with 100% coverage on 6 critical modules
#### Security & Documentation
- **Security Policy**: Comprehensive `SECURITY.md` with 550+ lines
- Vulnerability reporting process and timelines
- Input validation best practices with code examples
- Cloud authentication security patterns
- Path traversal prevention guidelines
- Resource limit recommendations
- Security checklist for production deployments
- **Sobelow Integration**: Static security analysis configured
- `.sobelow-conf` configuration file with documented exceptions
- All high/medium confidence warnings resolved
- 45 low-confidence warnings documented as expected behavior
- Detailed explanation of file traversal, String.to_atom, and configuration warnings
- **Enhanced Error Handling Guide**: `guides/error_handling.md`
- Comprehensive error handling patterns
- Recovery strategies for common failures
- Circuit breaker and retry patterns
- Logging and debugging recommendations
- **Telemetry Guide**: `guides/telemetry.md`
- Complete instrumentation documentation
- Integration examples for monitoring systems
- Performance metrics and event tracking
#### Code Quality
- **Zero Compilation Warnings**: Clean compilation across all environments
- **Credo Grade A+**: Strict mode with 0 issues (1,396 mods/funs analyzed)
- **Dialyzer Passing**: All type specs validated, 14 known issues properly suppressed
- **Documentation Coverage**:
  - Zero `mix docs` warnings
  - All public functions have `@doc` annotations
  - All modules have `@moduledoc` annotations
  - Comprehensive guides in `guides/` directory
### Changed
#### API Stability
- **Semantic Versioning Commitment**: v1.0.0 marks API stability
- No breaking changes planned for 1.x series
- Deprecation warnings will be added for any future API changes
- At least one minor version deprecation period before removal
- **Dependency Versions**: Updated to stable releases
- `:telemetry` added for observability support
- All dependencies pinned to stable versions
### Fixed
#### Test Stability
- Fixed metadata tests to include required `fill_value` field
- Fixed storage tests to handle mock backend registration idempotency
- Fixed property tests to only test public API functions
- Corrected filter metadata encoding to include both `dtype` and `astype` fields
#### Documentation
- Fixed all relative path references in guides
- Corrected README.md installation instructions
- Updated all version references to 1.0.0
### Testing Metrics
**Test Suite Performance**
- Total execution time: ~5.8 seconds
- Async tests: 3.4 seconds
- Sync tests: 2.4 seconds
- All tests: 100% passing rate
**Code Coverage by Module**
- 100% coverage: `ex_zarr.ex`, `application.ex`, `chunk_cache.ex`, `version.ex`, `storage/backend.ex`, `codecs/codec.ex`
- >90% coverage: `chunk_key.ex` (96%), `storage/backend/zip.ex` (95.2%), `codecs/sharding_indexed.ex` (93.9%), `array_server.ex` (93%), `memory.ex` (90.9%)
- >80% coverage: 15 additional modules
- Overall project: 80.3%
### Security
**Vulnerability Status**: No security vulnerabilities reported or discovered
**Security Hardening**
- Comprehensive path validation examples
- Safe usage patterns for all file operations
- Secure cloud storage authentication patterns
- Input sanitization guidelines
- Resource limit recommendations
- Documented DoS prevention strategies
**Static Analysis Results** (Sobelow)
- 0 high confidence warnings
- 0 medium confidence warnings
- 45 low confidence warnings (all documented as expected for data storage library)
### Breaking Changes
None. This is the first stable release, establishing the baseline API.
### Deprecations
None.
### Migration Guide
For users upgrading from v0.7.0:
1. Update dependency in `mix.exs`: `{:ex_zarr, "~> 1.0"}`
2. Run `mix deps.get`
3. No code changes required - full backward compatibility maintained
### Contributors
Special thanks to all contributors who made v1.0.0 possible through testing, feedback, and code contributions.
### Looking Forward
Planned for v1.1.0:
- Additional cloud storage backend optimizations
- Zarr v3 extension features (variable chunking, additional codecs)
- Performance improvements for large arrays
- Additional convenience functions for common patterns
---
## [0.7.0] - 2026-01-26
### Added
#### Chunk Streaming and Parallel Processing
- `Array.chunk_stream/2` - Stream chunks lazily with constant memory usage
- Sequential mode using `Stream.resource` for truly lazy evaluation
- Parallel mode with configurable concurrency (max 10 workers)
- Progress callback support for monitoring long operations
- Filter option to process subset of chunks
- Ordered and unordered streaming modes
- `Array.parallel_chunk_map/3` - Process chunks in parallel with custom mapper function
- Configurable concurrency and timeout
- Automatic error handling for failed tasks
- Integration with existing chunk cache and locking mechanisms
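
The lazy, filterable chunk iteration described above can be illustrated outside Elixir. The Python sketch below is not ExZarr's implementation or API: `iter_chunk_indices` and its `keep` predicate are hypothetical names, and the row-major ordering is an assumption. It only shows how chunk indices can be produced one at a time instead of being collected into a list first.

```python
import itertools
import math


def iter_chunk_indices(shape, chunks, keep=None):
    """Lazily yield chunk indices in row-major order.

    `keep` is a hypothetical predicate standing in for a filter option.
    """
    if len(shape) != len(chunks):
        raise ValueError("shape and chunks must have the same rank")
    # Chunks per dimension, rounding up so partial edge chunks are counted.
    grid = [math.ceil(dim / chunk) for dim, chunk in zip(shape, chunks)]
    for index in itertools.product(*(range(n) for n in grid)):
        if keep is None or keep(index):
            yield index


# A 5x4 array with 2x2 chunks has a 3x2 chunk grid (6 chunks).
indices = iter_chunk_indices((5, 4), (2, 2))
first = next(indices)   # only one index has been produced at this point
rest = list(indices)    # the remaining five indices
last_row = list(iter_chunk_indices((5, 4), (2, 2), keep=lambda i: i[0] == 2))
```

Because the function is a generator, a consumer that stops early never pays for the indices it did not request.
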
#### Custom Chunk Key Encoding
- `ChunkKey.Encoder` behavior - Define custom chunk naming schemes
- `encode/2` callback - Convert chunk index to string key
- `decode/2` callback - Parse string key back to chunk index
- `pattern/1` callback - Provide regex for key validation
- `ChunkKey.V2Encoder` and `ChunkKey.V3Encoder` - Default encoder implementations
- `ChunkKey.Registry` - Runtime encoder registration with Agent
- `ChunkKey.register_encoder/2` - Register custom encoders by name
- `ChunkKey.encode_with/3` and `decode_with/3` - Use registered encoders
- Centralized chunk key logic in S3 and GCS backends
#### Group Convenience Features
- Access behavior implementation - Use bracket notation for path-based access
- `group["experiments/exp1/results"]` syntax support
- Automatic intermediate group creation on write
- Works with `get_in`, `put_in`, `update_in` functions
- `Group.get_item/2` - Lazy load arrays and groups from storage
- Path-based access with forward slash separator
- Caches loaded items in memory
- Tracks checked paths to avoid redundant storage queries
- `Group.put_item/3` - Add items with auto-creation of parent groups
- `Group.remove_item/2` - Remove items from in-memory structure
- `Group.require_group/2` - Create group hierarchy like mkdir -p
- Returns existing group if present
- Creates all intermediate groups
- Returns error if path conflicts with array
- `Group.tree/2` - ASCII tree visualization of group hierarchy
- Box-drawing characters for structure (├── └──)
- Array [A] and group [G] markers
- Optional depth limiting
- Optional shape display
- `Group.batch_create/2` - Create multiple groups/arrays in parallel
- Concurrent metadata writes for cloud storage efficiency
- Mixed group and array creation support
- Up to 10 concurrent operations
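
The `mkdir -p` semantics listed for `Group.require_group/2` can be sketched with plain nested dictionaries. This Python snippet is an illustration under stated assumptions, not ExZarr code: groups are modelled as dicts, `ArrayNode` is a hypothetical placeholder for an array, and a conflict raises `ValueError` where the Elixir function is described as returning an error.

```python
class ArrayNode:
    """Hypothetical placeholder for an array stored inside a group."""


def require_group(root, path):
    """Return the group at `path`, creating missing intermediate groups."""
    node = root
    # Empty segments (leading, trailing, or doubled slashes) are ignored.
    for name in filter(None, path.split("/")):
        child = node.setdefault(name, {})
        if isinstance(child, ArrayNode):
            raise ValueError(f"path conflicts with array at {name!r}")
        node = child
    return node


root = {"raw": ArrayNode()}
results = require_group(root, "experiments/exp1/results")
same = require_group(root, "experiments/exp1/results")  # existing group is reused
```
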
### Fixed
#### Type Safety
- Fixed unmatched return value in `Array.chunk_stream` lock acquisition
- Fixed `ChunkCache.put` argument order in streaming code
- All dialyzer warnings resolved
#### Test Stability
- Memory efficiency test now uses sequential streaming for consistent results
- Changed from parallel to sequential mode
- Increased threshold to account for OTP version variance
- Uses `Enum.reduce` for truly lazy processing
- ArrayServer FIFO queue test uses staggered delays to ensure reliable ordering
- Added task-specific delays to guarantee queue order
- Uses Agent to track actual acquisition order
- Prevents race conditions in CI environments
#### Path Handling
- Fixed `Group.create_group/3` to use correct storage path
- Fixed `Group.create_array/3` to properly join storage path with array path
- Corrected filesystem metadata file locations
### Changed
#### Storage Backend Refactoring
- S3 and GCS backends now use centralized `ChunkKey.encode` function
- Eliminated duplicate chunk key encoding logic
- Simplified pattern matching with `ChunkKey.chunk_key_pattern`
#### Group Structure
- Added `_loaded` field to Group struct for lazy loading cache
- Type specification updated to include MapSet for loaded paths
### Testing
**Test Coverage**
- Total tests: 1246 (up from 794 in v0.5.0)
- New chunk streaming tests: 12 tests
- New chunk key encoding tests: 32 tests
- New group convenience tests: 37 tests
- Success rate: 100% passing (0 failures, 6 skipped)
- Quality checks: All passing (dialyzer, format checks)
**Test Files**
- `test/ex_zarr/chunk_streaming_test.exs` - Chunk iteration and parallel processing
- `test/ex_zarr/chunk_key_encoding_test.exs` - Custom encoder behavior and registry
- `test/ex_zarr/group_convenience_test.exs` - Group access and convenience features
## [0.5.0] - 2026-01-25
### Added
#### Metadata Serialization and Deserialization
- **Complete JSON encoding/decoding for Zarr v3 metadata**
- `MetadataV3.to_json/1` - Serialize metadata structures to JSON strings
- `MetadataV3.from_json/1` - Parse JSON to MetadataV3 structures
- Full support for chunk grids (regular and irregular)
- Full support for codec pipeline encoding (nested configurations)
- Full support for dimension names and storage transformers
- 54 new tests covering JSON serialization and zarr-python 3.x file compatibility
#### Format Conversion
- **Bidirectional conversion between Zarr v2 and v3 formats**
- `MetadataV3.from_v2/1` - Convert v2 metadata to v3 format
- `MetadataV3.to_v2/1` - Convert v3 metadata to v2 format (with validation)
- `ExZarr.FormatConverter` module - Convert entire arrays between formats
- `FormatConverter.convert/1` - Copy arrays with proper chunk key encoding
- `FormatConverter.check_v2_compatibility/1` - Pre-validate v3→v2 conversion
- Automatic handling of codec pipeline differences
- Clear error messages for incompatible features (sharding, irregular grids)
- 18 conversion tests including round-trip verification
#### S3 Storage Backend Enhancements
- **Localstack and minio support for local testing**
- Custom endpoint URL configuration
- Automatic parsing of `AWS_ENDPOINT_URL` environment variable
- ExAws configuration for S3-compatible services (scheme, host, port)
- 45 mock tests using Mox for fast CI/CD testing without AWS
- 31 integration tests with full localstack support
- Comprehensive testing guide (`test/ex_zarr/storage/S3_TESTING.md`)
- Example usage script (`examples/s3_storage.exs`)
#### Dependencies
- **sweet_xml ~> 0.7** - Required for ExAws.S3 XML parsing
### Fixed
#### Type Specifications
- **MetadataV3.from_v2/1 return type** - Removed unreachable `{:error, term()}` case
- Function always succeeds with v2 metadata input
- Updated spec to match success typing: `{:ok, t()}`
#### S3 Backend Configuration
- **ExAws configuration for S3-compatible services**
- Fixed endpoint URL parsing from single string to ExAws format
- Corrected credential handling for localstack/minio
- Fixed `build_ex_aws_config/2` to parse URI components properly
- Resolved test setup issues with ExUnit callback return values
### Changed
#### Test Organization
- **S3 integration tests** now properly skip when `AWS_ENDPOINT_URL` not configured
- **setup_all callback** returns proper values for ExUnit compatibility
- **Mock tests** run independently without requiring AWS services
#### Documentation
- **Updated S3 backend moduledoc** with localstack/minio configuration
- **Created S3_TESTING.md** with complete setup and troubleshooting guide
- **Updated examples** with S3 endpoint URL usage patterns
### Testing
**Test Coverage**
- **Total tests**: 794 (up from 482 in v0.4.0)
- **S3 mock tests**: 45 tests
- **S3 integration tests**: 31 tests
- **Format conversion tests**: 18 tests
- **JSON serialization tests**: 54 tests
- **Success rate**: 100% passing (0 failures, 3 skipped)
- **Quality checks**: All passing (format, credo strict, dialyzer)
### Technical Details
#### Format Conversion Limitations
- **v3 → v2 conversion** has known limitations detected by validation:
  - Sharding codec not supported in v2 (rejected with clear error)
  - Dimension names lost in conversion (documented)
  - Irregular chunk grids not supported in v2 (rejected with clear error)
  - Array→array codecs (transpose, quantize, bitround) dropped
- **v2 → v3 conversion** fully supported without limitations
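
The limitations above amount to a pre-flight check over v3 array metadata. The Python sketch below is not the `FormatConverter` implementation. It assumes the metadata is a parsed `zarr.json` dictionary with the standard v3 keys (`chunk_grid`, `codecs`, `dimension_names`), and `v2_conversion_report` is a hypothetical name. It separates hard failures from lossy-but-allowed conversions.

```python
# Array-to-array codecs named in the changelog as being dropped.
ARRAY_TO_ARRAY = {"transpose", "quantize", "bitround"}


def v2_conversion_report(meta):
    """Return (errors, warnings) for converting v3 metadata to v2."""
    errors, warnings = [], []
    if meta.get("chunk_grid", {}).get("name") != "regular":
        errors.append("irregular chunk grids are not supported in v2")
    for codec in meta.get("codecs", []):
        name = codec.get("name")
        if name == "sharding_indexed":
            errors.append("sharding codec is not supported in v2")
        elif name in ARRAY_TO_ARRAY:
            warnings.append(f"codec {name!r} would be dropped")
    if meta.get("dimension_names"):
        warnings.append("dimension names would be lost")
    return errors, warnings


sharded = {
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [64, 64]}},
    "codecs": [{"name": "sharding_indexed", "configuration": {}}],
    "dimension_names": ["y", "x"],
}
errors, warnings = v2_conversion_report(sharded)
```
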
#### S3 Configuration
- **Endpoint URL parsing** converts HTTP URL to ExAws format:
- `http://localhost:4566` → `scheme: "http://", host: "localhost", port: 4566`
- Credentials from environment: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
- Region configuration: default `us-east-1` or specified
- **Backward compatible** - works with real AWS S3 when endpoint not specified
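
The endpoint conversion described above splits one URL into three parts. As an illustration only (this is Python, not the `build_ex_aws_config/2` code), the sketch below shows the same transformation. The helper name `endpoint_to_config` and the default-port fallback are assumptions.

```python
from urllib.parse import urlsplit


def endpoint_to_config(url):
    """Split an endpoint URL into scheme, host, and port entries."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported endpoint URL: {url!r}")
    # Assumption: fall back to the scheme's well-known port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return {"scheme": parts.scheme + "://", "host": parts.hostname, "port": port}


config = endpoint_to_config("http://localhost:4566")
```
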
### Breaking Changes
None - Full backward compatibility maintained with v0.1.0, v0.3.0, and v0.4.0
---
## [0.1.0] - 2026-01-23
### Added
#### Core Functionality
- **Complete array slicing implementation** with `get_slice` and `set_slice` functions
- **Chunked storage system** for efficient I/O and memory usage
- **N-dimensional array support** (1D to N-D) with optimized implementations for 1D and 2D
- **10 data types**: int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32, float64
- **Compression support**: zlib (fully working), with graceful fallbacks for zstd and lz4
- **Two storage backends**:
- Memory storage (using Agent for persistent state)
- Filesystem storage (Zarr v2 directory structure)
#### Validation and Safety
- **Comprehensive index validation** for all slicing operations
- **Bounds checking** to prevent out-of-bounds access
- **Data size validation** ensuring data matches slice dimensions
- **Type validation** for indices and data
- **Clear error messages** with actionable feedback
#### Interoperability
- **Full Zarr v2 specification compliance**
- **Bidirectional compatibility with zarr-python**
- **14 integration tests** verifying compatibility between Python and Elixir
- **All data types work across implementations**
- **Metadata format compatibility**
#### Documentation
- Comprehensive module documentation with examples
- `INTEROPERABILITY.md` guide for multi-language workflows
- Interactive demo script (`examples/python_interop_demo.exs`)
- Integration test documentation (`test/support/README.md`)
- Python helper scripts for testing
#### Testing
- **196 tests** covering all functionality
- **21 property-based tests** using StreamData
- **35 validation tests** for bounds checking and error handling
- **19 slicing tests** for read/write operations
- **14 Python integration tests** for cross-language compatibility
- **100% passing** test suite with zero failures
#### Code Quality
- Passes all Credo checks (strict mode)
- Well-documented functions and modules
- Type specifications for public APIs
- Consistent error handling patterns
### Technical Details
#### Array Operations
- Row-major (C-order) data layout
- Lazy chunk loading (only loads needed chunks)
- Efficient slice extraction across chunk boundaries
- Partial chunk updates (read-modify-write)
- Fill value support for uninitialized regions
#### Performance Optimizations
- Dimension-specific implementations (1D, 2D, ND)
- Minimal data copying during operations
- Efficient binary pattern matching
- Agent-based memory storage for fast writes
#### Metadata
- Zarr v2 JSON metadata format
- Shape, chunks, dtype, compressor configuration
- Fill value preservation
- Automatic metadata generation
### Breaking Changes
None (initial release)
### Deprecated
None
### Fixed
- Memory storage now correctly persists writes using Agent
- Chunk boundary calculations work correctly for all dimensions
- Data size validation accounts for element size
- Index validation prevents invalid operations before I/O
### Security
- Input validation prevents buffer overflows
- Bounds checking prevents out-of-bounds access
- Type checking ensures data integrity
## [0.3.0] - 2026-01-24
### Added
#### Compression Codecs - Complete Implementation
**All Major Codecs via Zig NIFs**:
- **zstd** - Zstandard compression (native implementation via Zig NIF + libzstd)
- **lz4** - LZ4 fast compression (native implementation via Zig NIF + liblz4)
- **snappy** - Snappy compression (native implementation via Zig NIF + libsnappy)
- **blosc** - Blosc meta-compressor (native implementation via Zig NIF + libblosc)
- **bzip2** - Bzip2 compression (native implementation via Zig NIF + libbz2)
- **crc32c** - CRC32C checksum codec (pure Zig implementation, RFC 3720 compliant)
- **zlib** - Standard zlib compression (via Erlang `:zlib`, already present in v0.1.0)
- **none** - No compression option
**Codec Features**:
- High-performance native implementations using Zig NIFs
- Full compatibility with Python zarr's compression formats
- CRC32C uses Castagnoli polynomial (0x1EDC6F41) matching RFC 3720 specification
- Corruption detection and validation for CRC32C
- All codecs tested for Python interoperability
#### Custom Codec Plugin System
**Extensible Architecture**:
- **`ExZarr.Codecs.Codec` behavior** - Contract defining codec interface
- `codec_id/0` - Unique atom identifier
- `codec_info/0` - Metadata (name, version, type, description)
- `available?/0` - Runtime availability check
- `encode/2` - Compression/transformation function
- `decode/2` - Decompression/inverse transformation
- `validate_config/1` - Configuration validation
- **`ExZarr.Codecs.Registry` GenServer** - Dynamic codec management
- Runtime registration: `ExZarr.Codecs.register_codec/2`
- Runtime unregistration: `ExZarr.Codecs.unregister_codec/1`
- Codec queries: `list_codecs/0`, `available_codecs/0`, `codec_info/1`
- Force registration option for codec replacement
- Protection against unregistering built-in codecs
- **Application supervision tree** - `ExZarr.Application` with supervised registry
- Fault-tolerant codec registry under supervision
- Automatic recovery on crashes
- OTP-compliant architecture
**Plugin Capabilities**:
- Create compression codecs (like zlib, zstd)
- Create transformation codecs (like transpose, shuffle)
- Create checksum codecs (like crc32c)
- Register at runtime without recompiling ExZarr
- Seamless integration with built-in codecs
- Can be chained with other codecs
#### Examples and Documentation
**Example Codecs** (`examples/custom_codec_example.exs`):
- `UppercaseCodec` - Simple transformation codec demonstrating API
- `RleCodec` - Run-length encoding compression codec
- Complete usage demonstration with:
  - Codec registration/unregistration
  - Encoding and decoding operations
  - Querying codec information
  - Chaining custom and built-in codecs
**Documentation Updates**:
- Updated README with "Custom Codecs" section
- Code examples for creating custom codecs
- Usage patterns and best practices
- Updated compression codec list with all implementations
- Updated roadmap to reflect completed features
#### Testing
**Comprehensive Test Coverage**:
- **10 CRC32C tests** - Encoding, decoding, corruption detection, edge cases
- **29 custom codec tests** - Full plugin system coverage including:
  - Behavior validation (`Codec.implements?/1`)
  - Registry operations (register, unregister, list, info)
  - Custom codec compression/decompression
  - Availability checks
  - Integration with built-in codecs
  - Error handling for failing codecs
  - Protection against invalid operations
**Total Test Suite**:
- **238 tests** (up from 196 in v0.1.0)
- **21 property-based tests** with 2,100+ generated test cases
- **100% passing** with 72.5% code coverage
- All codecs verified for Python zarr compatibility
### Changed
**Codec Module Refactoring**:
- Refactored `ExZarr.Codecs` to route through registry
- Split built-in codec logic into `compress_builtin/3` and `decompress_builtin/2`
- Updated `available_codecs/0` to dynamically query registry
- Updated `codec_available?/1` to check registry and call custom codec's `available?/0`
- Changed `@type codec` from fixed atom list to `atom()` for extensibility
**Mix Configuration**:
- Updated `mix.exs` to start `ExZarr.Application` with supervision tree
- Added application module configuration for codec registry initialization
**Documentation**:
- Updated `GAP_ANALYSIS.md` to reflect codec completion (v1.0 → v1.1)
- Updated compression performance comparison
- Updated test statistics
- Updated feature implementation status
### Technical Details
#### CRC32C Implementation
- Pure Zig implementation with 256-entry lookup table
- Castagnoli polynomial: 0x1EDC6F41 (RFC 3720)
- 4-byte overhead in little-endian format
- Compatible with Python zarr's `google-crc32c` library
- Validates data integrity on decode
- Returns error on checksum mismatch
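
For readers unfamiliar with the table-driven approach, here is a minimal Python sketch of CRC32C. It is not the Zig NIF. It uses the reflected form of the Castagnoli polynomial (`0x82F63B78`, the bit-reversal of `0x1EDC6F41`), and it assumes the 4-byte little-endian checksum is appended after the payload. The names `crc32c_encode` and `crc32c_decode` are hypothetical.

```python
import struct

POLY_REFLECTED = 0x82F63B78  # bit-reversed form of 0x1EDC6F41


def _build_table():
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data):
    """Compute CRC32C with one table lookup per input byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_encode(data):
    """Append the checksum as 4 little-endian bytes (assumed layout)."""
    return data + struct.pack("<I", crc32c(data))


def crc32c_decode(framed):
    """Verify and strip the trailing checksum; raise on mismatch."""
    if len(framed) < 4:
        raise ValueError("input shorter than checksum")
    payload, trailer = framed[:-4], framed[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if crc32c(payload) != expected:
        raise ValueError("checksum mismatch")
    return payload


framed = crc32c_encode(b"chunk bytes")
```

The standard check value for CRC32C is `0xE3069283` for the ASCII input `123456789`, which is a convenient sanity test for any implementation.
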
#### Zig NIF Integration
- Uses Zigler 0.13 for seamless Zig-to-Elixir integration
- Automatic memory management via beam.allocator
- Proper error handling with Elixir result tuples `{:ok, data} | {:error, reason}`
- Platform-specific library linking (macOS, Linux)
- Post-compile RPATH fixing for library loading
#### Custom Codec Architecture
- GenServer-based registry pattern following OTP best practices
- ETS table for O(1) codec lookup performance
- Behavior contract ensures consistent codec API
- Support for both `:compression` and `:transformation` codec types
- Validation prevents registering invalid codecs
- Protection prevents unregistering built-in codecs
### Breaking Changes
None - Full backward compatibility maintained
### Deprecated
None
### Fixed
- Removed codec fallbacks - all codecs now have native implementations
- Fixed `available_codecs/0` to return dynamically registered codecs
- Fixed `codec_available?/1` to properly check custom codecs
### Performance
- Native Zig implementations provide significant performance improvements over fallbacks
- CRC32C table-driven algorithm for fast checksum computation
- GenServer registry with ETS backend for fast codec lookups
- Zero-copy binary operations where possible
### Security
- CRC32C detects data corruption and tampering
- Codec validation prevents malformed codec registration
- Input validation on all encode/decode operations
- Protection against unregistering critical built-in codecs
## [0.4.0] - 2026-01-25
### Added
#### Zarr v3 Specification Support
- **Complete Zarr v3 implementation** with full specification compliance
- **Python zarr 3.x interoperability** - bidirectional compatibility verified
- **16 Python v3 integration tests** covering all v3 features
- **Unified codec pipeline** - array→array, array→bytes, bytes→bytes stages
- **v3 metadata format** - `zarr.json` with `node_type`, `data_type`, unified `codecs`
- **v3 chunk key encoding** - slash-separated paths (`c/0/1/2`) with configurable encoding
- **Automatic version detection** - seamlessly works with both v2 and v3 arrays
- **Version-aware array operations** - smart routing based on zarr_format field
- **Group metadata support** - explicit group nodes with v3 format
- **Data type conversion utilities** - automatic mapping between v2 and v3 type systems
#### Python v3 Interoperability Testing
- **Comprehensive test suite** (`test/ex_zarr_v3_python_interop_test.exs`):
  - 7 tests: Python 3.x → ExZarr v3 compatibility
  - 6 tests: ExZarr v3 → Python 3.x compatibility
  - 3 tests: v3 metadata compatibility
  - 2 tests: v3 codec compatibility
- **Enhanced Python helper script** (`test/support/zarr_python_helper.py`):
  - `check_zarr_version()` - Detects zarr-python version
  - `create_v3_array()` - Creates v3 arrays
  - `read_v3_array()` - Reads v3 arrays
  - `verify_v3_array()` - Validates v3 arrays
- **Automatic test exclusion** - Tests tagged with `:python_v3` excluded when zarr-python 3.x not available
#### Documentation
- **`docs/V3_PYTHON_INTEROP.md`** - Comprehensive Python v3 testing guide
- **`docs/V3_PYTHON_INTEROP_FIX.md`** - Metadata save pattern documentation
- **`docs/V3_GZIP_AND_CODEC_CONFIG_FIX.md`** - Gzip format and codec configuration fixes
- **Migration guide** - v2 to v3 conversion patterns (in plan documentation)
### Changed
#### Gzip Codec Implementation
- **Fixed gzip format** - Now produces true gzip format (RFC 1952) with magic bytes `0x1F 0x8B`
- **Previous issue**: Was producing zlib/deflate format (RFC 1950) with magic bytes `0x78 0x9C`
- **New implementation** (`lib/ex_zarr/codecs/pipeline_v3.ex`):
  - `gzip_compress/2` - Uses `:zlib.deflateInit/6` with `windowBits = 16 + 15`
  - `gzip_decompress/1` - Uses `:zlib.inflateInit/2` with `windowBits = 16 + 15`
- **Python compatibility** - Gzip-compressed arrays now readable by zarr-python 3.x
#### Codec Configuration Serialization
- **Always include `configuration` key** in v3 codec specs (required by Zarr v3 spec)
- **Previous issue**: Was omitting `configuration` when empty, causing zarr-python validation errors
- **Fixed in** `lib/ex_zarr/storage.ex` - `encode_codecs_v3/1` function
- **Compliance**: Matches Zarr v3 specification requirement for codec metadata
#### Type Specifications
- **Fixed 12 dialyzer warnings** - Narrowed type specs to match success typing
- **Files updated**:
  - `lib/ex_zarr/codecs/pipeline_v3.ex` - More specific error types
  - `lib/ex_zarr/metadata_v3.ex` - Precise validation error tuples
  - `lib/ex_zarr/data_type.ex` - Specific return types (`1 | 2 | 4 | 8` instead of `pos_integer()`)
  - `lib/ex_zarr/version.ex` - Non-empty list indicators
- **Result**: Zero dialyzer errors
### Fixed
#### Python Interoperability Issues
- **Issue #101**: Python v3 interoperability testing implementation
- **Gzip format mismatch**: Fixed codec to produce RFC 1952 gzip format instead of RFC 1950 zlib
- **Missing configuration key**: Fixed codec metadata to always include `configuration` field
- **Metadata persistence**: Documented requirement for `ExZarr.save/2` after `ExZarr.create/1` for v3 arrays
#### Type System
- **Dialyzer compliance**: All type specifications now match inferred success types
- **More precise error types**: Better error reporting with specific error tuple shapes
- **Better tooling support**: IDE autocomplete and static analysis work correctly
### Technical Details
#### Zarr v3 Architecture
- **Version abstraction layer** (`lib/ex_zarr/version.ex`):
- `detect_version/1` - Automatic v2/v3 detection from metadata
- `default_version/0` - Configurable default (v3 by default)
- `supported_versions/0` - Lists supported versions
- **v3 metadata module** (`lib/ex_zarr/metadata_v3.ex`):
- Complete v3 metadata struct with validation
- `node_type` field for array vs group distinction
- `data_type` string format (e.g., "int32", "float64")
- `codecs` array with three-stage pipeline
- `chunk_grid` and `chunk_key_encoding` extensions
- **Chunk key encoding** (`lib/ex_zarr/chunk_key.ex`):
- v2: dot-separated (e.g., "0.1.2")
- v3: slash-separated with prefix (e.g., "c/0/1/2")
- `encode/2` and `decode/2` for version-aware conversion
- **Codec pipeline** (`lib/ex_zarr/codecs/pipeline_v3.ex`):
- Three-stage pipeline validation and execution
- Array→array codecs (filters/transforms)
- Array→bytes codec (required serializer)
- Bytes→bytes codecs (compression)
- Strict ordering enforcement per v3 spec
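
The two chunk key layouts listed above differ only in separator and prefix. The Python sketch below illustrates that mapping and is not the `ChunkKey` module: the function names are hypothetical, the format is passed as an integer `2` or `3`, and zero-dimensional arrays are deliberately left out.

```python
def encode_chunk_key(index, zarr_format):
    """Turn a chunk index such as (0, 1, 2) into a storage key."""
    if not index:
        raise ValueError("zero-dimensional arrays are not covered by this sketch")
    parts = [str(i) for i in index]
    if zarr_format == 2:
        return ".".join(parts)          # v2: dot-separated
    if zarr_format == 3:
        return "/".join(["c"] + parts)  # v3: slash-separated with "c" prefix
    raise ValueError(f"unsupported zarr_format: {zarr_format}")


def decode_chunk_key(key, zarr_format):
    """Parse a storage key back into a chunk index tuple."""
    if zarr_format == 2:
        parts = key.split(".")
    elif zarr_format == 3:
        prefix, _, tail = key.partition("/")
        if prefix != "c" or not tail:
            raise ValueError(f"not a v3 chunk key: {key!r}")
        parts = tail.split("/")
    else:
        raise ValueError(f"unsupported zarr_format: {zarr_format}")
    return tuple(int(p) for p in parts)


v2_key = encode_chunk_key((0, 1, 2), 2)
v3_key = encode_chunk_key((0, 1, 2), 3)
```
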
#### Gzip Format Details
- **windowBits parameter** in Erlang's `:zlib`:
  - `15` = zlib format (RFC 1950)
  - `16 + 15` = gzip format (RFC 1952)
  - `+16` modifier adds gzip wrapper (header + CRC32 trailer)
- **Magic bytes**:
  - Zlib: `0x78 0x9C`
  - Gzip: `0x1F 0x8B`
- **Compatibility**: Gzip format required by zarr-python 3.x
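
The effect of the `windowBits` value can be reproduced with any zlib binding. The Python sketch below is an illustration rather than the ExZarr code: it compresses the same payload with `15` and `16 + 15` so that the two headers can be compared. The `deflate` helper is a hypothetical name, and the zlib header bytes shown assume the default compression level.

```python
import zlib


def deflate(data, wbits):
    """Compress `data` with an explicit windowBits value."""
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


payload = b"zarr chunk bytes " * 64

zlib_stream = deflate(payload, 15)       # RFC 1950 wrapper
gzip_stream = deflate(payload, 16 + 15)  # RFC 1952 wrapper

# The first two bytes identify the container format.
zlib_magic = zlib_stream[:2]
gzip_magic = gzip_stream[:2]

# Decompression must use the matching windowBits value.
restored = zlib.decompress(gzip_stream, 16 + 15)
```
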
#### Testing
- **Total test count**: **482 tests** (up from 238 in v0.3.0)
- **New v3 tests**: 16 Python interoperability tests
- **Test exclusions**: Python tests automatically excluded without zarr-python 3.x
- **Success rate**: 100% passing (0 failures)
- **Dialyzer**: Zero type errors
### Breaking Changes
None - Full backward compatibility maintained with v2 arrays
### Planned Features
- S3 storage backend
- Parallel chunk operations
- Advanced indexing (fancy indexing, boolean indexing)
- Filter pipeline support (delta, quantize, shuffle)
- Additional codecs (lzma)
- v3 sharding extension
- v3 storage transformers
---
[0.5.0]: https://github.com/thanos/ex_zarr/releases/tag/v0.5.0
[0.4.0]: https://github.com/thanos/ex_zarr/releases/tag/v0.4.0
[0.3.0]: https://github.com/thanos/ex_zarr/releases/tag/v0.3.0
[0.1.0]: https://github.com/thanos/ex_zarr/releases/tag/v0.1.0