# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [0.5.1] - 2025-12-28
### Changed
- Updated `crucible_framework` dependency from `~> 0.5.0` to `~> 0.5.2`
## [0.5.0] - 2025-12-27
### Changed
- **Normalized describe/1 to canonical schema format** - The `describe/1` callback now returns a schema conforming to the Crucible Stage contract specification v1.0.
  - Changed `:stage` key to `:name` key (atom value)
  - Added `__schema_version__: "1.0.0"` marker for schema evolution
  - Added `required` field (list of required option keys)
  - Added `optional` field (list of optional option keys)
  - Added `types` field (type specifications for all options)
  - Added `defaults` field (default values for optional options)
  - Moved `metrics` list to `__extensions__.fairness.supported_metrics`
  - Added `data_sources` and `output_location` to extensions
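
The field list above can be read as a single map. The sketch below shows one under stated assumptions. `SketchFairnessStage` is a hypothetical module, and its option names, types and defaults are illustrative rather than the actual `ExFairness.CrucibleStage` schema. Placing `data_sources` and `output_location` under `__extensions__.fairness` is also an assumption. `SketchConformance.check/1` is a minimal stand-in for the kind of assertion a conformance test might make.

```elixir
defmodule SketchFairnessStage do
  # Hypothetical stage: the schema keys follow the 0.5.0 field list;
  # the option names, types and defaults are illustrative assumptions.
  def describe(_opts \\ %{}) do
    %{
      __schema_version__: "1.0.0",
      name: :fairness,
      required: [],
      optional: [:metrics, :threshold, :fail_on_violation],
      types: %{metrics: {:list, :atom}, threshold: :float, fail_on_violation: :boolean},
      defaults: %{threshold: 0.1, fail_on_violation: false},
      __extensions__: %{
        fairness: %{
          supported_metrics: [
            :demographic_parity,
            :equalized_odds,
            :equal_opportunity,
            :predictive_parity,
            :calibration
          ],
          # Assumed placement of the two extension fields.
          data_sources: [:outputs],
          output_location: :fairness
        }
      }
    }
  end
end

defmodule SketchConformance do
  @required_keys [:__schema_version__, :name, :required, :optional, :types, :defaults]

  # Minimal contract check: mandatory keys present, :name is an atom,
  # every option has a type, and defaults exist only for optional keys.
  def check(schema) do
    missing = Enum.reject(@required_keys, &Map.has_key?(schema, &1))

    cond do
      missing != [] -> {:error, {:missing_keys, missing}}
      not is_atom(schema.name) -> {:error, :name_not_atom}
      Enum.any?(schema.required ++ schema.optional, &(not Map.has_key?(schema.types, &1))) -> {:error, :untyped_option}
      Enum.any?(Map.keys(schema.defaults), &(&1 not in schema.optional)) -> {:error, :default_for_non_optional}
      true -> :ok
    end
  end
end
```

Returning tagged tuples rather than raising makes the check easy to assert on in a test.
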
### Added
- **Conformance tests** - New `test/ex_fairness/conformance_test.exs` validates Stage contract compliance
- **Extended describe/1 tests** - Comprehensive tests for canonical schema format
### Dependencies
- Updated `crucible_framework` dependency to `~> 0.5.0` (required for new describe/1 contract)
## [0.4.0] - 2025-12-25
### Added
- **ExFairness.CrucibleStage** implementing `Crucible.Stage` for crucible_framework pipelines.
- Environment-specific config files (`config/*.exs`) to disable the CrucibleFramework repo by default.
- Documentation snapshot and gap analysis in `docs/20251225/`.
- Crucible stage integration test suite.
### Changed
- Refactored `ExFairness.evaluate/5` into smaller helpers for metric computation and violations.
- Improved chi-square computation structure in `ExFairness.Utils.StatisticalTests`.
- Updated project logo in `assets/ExFairness.svg`.
- Dependencies: add `crucible_framework`, update `crucible_ir` to `~> 0.2.0`, add `ecto_sql` and `postgrex`.
## [0.3.1] - 2025-11-26
### Added - CrucibleIR Integration
**Pipeline Stage:**
- **ExFairness.Stage** - Pipeline stage for Crucible framework integration
  - Seamless integration with CrucibleIR experiment orchestration
  - Accepts `CrucibleIR.Reliability.Fairness` configuration
  - Extracts predictions, labels, and sensitive attributes from model outputs
  - Supports all ExFairness metrics (demographic parity, equalized odds, equal opportunity, predictive parity, calibration)
  - Configurable threshold and fail-on-violation behavior
  - Comprehensive error handling and validation
  - Returns structured fairness results with violations tracking
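
The extraction step listed above turns a list of output maps into parallel lists. The sketch below shows one way to do it. `SketchStageExtraction.extract/2` is hypothetical, and it assumes the field names from the example context in this entry (`:prediction`, `:label`, plus whatever field `group_by` names). It is not the stage's actual code.

```elixir
defmodule SketchStageExtraction do
  # Hypothetical helper: returns {:ok, {predictions, labels, groups}}, or an
  # error naming the fields missing from the first incomplete output map.
  def extract(outputs, group_by) when is_list(outputs) and is_atom(group_by) do
    fields = [:prediction, :label, group_by]

    case Enum.find(outputs, fn out -> not Enum.all?(fields, &Map.has_key?(out, &1)) end) do
      nil ->
        predictions = Enum.map(outputs, & &1.prediction)
        labels = Enum.map(outputs, & &1.label)
        groups = Enum.map(outputs, &Map.fetch!(&1, group_by))
        {:ok, {predictions, labels, groups}}

      incomplete ->
        {:error, {:missing_fields, fields -- Map.keys(incomplete)}}
    end
  end

  # Anything that is not a list of outputs is rejected up front.
  def extract(_outputs, _group_by), do: {:error, :invalid_outputs}
end
```
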

**Main API Enhancement:**
- **ExFairness.evaluate/5** - New function for CrucibleIR config-based evaluation
  - Direct evaluation using `CrucibleIR.Reliability.Fairness` struct
  - Optional probabilities parameter for calibration metrics
  - Returns structured results with metrics, violations, and overall pass/fail status
  - Conditionally compiled (only when `crucible_ir` is available)
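
The result shape described above (`metrics`, `violations`, `overall_passes`) and the `fail_on_violation` switch can be sketched as follows. This assumes each metric has already been reduced to one disparity value. `SketchEvaluation` and its functions are hypothetical, not the library's internals.

```elixir
defmodule SketchEvaluation do
  # disparities: %{metric_name => disparity}. A metric violates when its
  # disparity exceeds the threshold.
  def summarize(disparities, threshold) do
    metrics =
      Map.new(disparities, fn {name, d} -> {name, %{disparity: d, passes: d <= threshold}} end)

    # Only failing metrics match the pattern, so passing ones are skipped.
    violations =
      for {name, %{passes: false, disparity: d}} <- Enum.sort(metrics) do
        %{metric: name, disparity: d, threshold: threshold}
      end

    %{metrics: metrics, violations: violations, overall_passes: violations == []}
  end

  # fail_on_violation: only turn violations into an error when asked to.
  def gate(%{overall_passes: true} = result, _fail_on_violation), do: {:ok, result}
  def gate(result, false), do: {:ok, result}
  def gate(result, true), do: {:error, {:fairness_violation, result.violations}}
end
```
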
### Configuration Support
The integration supports the following `CrucibleIR.Reliability.Fairness` structure:
```elixir
%CrucibleIR.Reliability.Fairness{
  enabled: true,              # Enable/disable fairness evaluation
  metrics: [:demographic_parity, :equalized_odds, :equal_opportunity, :predictive_parity, :calibration],
  group_by: :gender,          # Sensitive attribute field name
  threshold: 0.1,             # Maximum acceptable disparity
  fail_on_violation: false,   # Whether to fail on violations
  options: %{}                # Additional metric-specific options
}
```
### Testing
**New Test Suite:**
- ExFairness.StageTest - 15 comprehensive tests
  - Stage description validation
  - Disabled fairness pass-through
  - Single and multiple metric evaluation
  - Calibration with/without probabilities
  - Custom threshold configuration
  - Violation detection and reporting
  - Fail-on-violation behavior
  - Invalid context handling
  - Unknown metric handling
  - Custom options pass-through

**Total Tests:** 174 (v0.3.0) → 189 (v0.3.1) = +15 tests (+8.6%)
### Dependencies
**New Dependencies:**
- `{:crucible_ir, "~> 0.1.1"}` - CrucibleIR configuration structs
### Documentation
**Updated Documentation:**
- mix.exs - Added ExFairness.Stage to Pipeline module group
- README.md - Added Stage usage examples and CrucibleIR integration guide
- ExFairness.Stage - Comprehensive module documentation with examples
- ExFairness.evaluate/5 - Full API documentation with examples
### Quality Metrics
- **Zero compilation warnings** (enforced via warnings_as_errors)
- **Zero Dialyzer errors** (type-safe)
- **All tests passing** (189 total tests)
- **Backward compatible:** All v0.3.0 code works without modification
### Integration Benefits
1. **Seamless Crucible Integration**: ExFairness can now be used as a pipeline stage in Crucible experiments
2. **Standardized Configuration**: Uses CrucibleIR configuration structs for consistency
3. **Experiment Orchestration**: Fairness evaluation can be automated as part of experiment pipelines
4. **Flexible Violation Handling**: Choose whether fairness violations should fail experiments
5. **Comprehensive Results**: Structured output suitable for experiment reporting
### Example Usage
```elixir
# Configure fairness evaluation
config = %CrucibleIR.Reliability.Fairness{
  enabled: true,
  metrics: [:demographic_parity, :equalized_odds],
  group_by: :gender,
  threshold: 0.1,
  fail_on_violation: false
}

# In a Crucible pipeline
context = %{
  experiment: %{reliability: %{fairness: config}},
  outputs: model_outputs # List of maps with :prediction, :label, :gender
}

{:ok, result_context} = ExFairness.Stage.run(context)
# result_context.fairness contains fairness evaluation results

# Or use the direct evaluation API
result = ExFairness.evaluate(predictions, labels, sensitive_attr, config)
# Returns %{metrics: ..., overall_passes: ..., violations: ...}
```
### Breaking Changes
**None** - This is a backward compatible release. All existing code continues to work unchanged.
### Migration from v0.3.0
No code changes required. The new CrucibleIR integration is opt-in and does not affect existing usage patterns.
## [0.3.0] - 2025-11-25
### Added - Statistical Inference and Calibration
**Statistical Inference Framework:**
- **ExFairness.Utils.Bootstrap** - Bootstrap confidence interval computation
  - Stratified bootstrap to preserve group proportions
  - Parallel and sequential computation modes
  - Percentile and basic bootstrap methods
  - Configurable number of samples (default: 1000)
  - Reproducible with seed parameter
  - GPU-accelerated metric computation via Nx.Defn
- **ExFairness.Utils.StatisticalTests** - Hypothesis testing for fairness metrics
  - Two-proportion Z-test for demographic parity
  - Chi-square test for equalized odds
  - Permutation test for any fairness metric (non-parametric)
  - Cohen's h effect size computation
  - Configurable significance levels (default: α=0.05)
  - Statistical interpretation generation
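
The stratified percentile bootstrap described above can be sketched on plain lists. The sketch resamples with replacement within each group, so group sizes stay fixed. It recomputes the demographic parity difference on each resample and reads the interval off the sorted statistics using a nearest-rank percentile. `SketchBootstrap` is hypothetical and is not the `ExFairness.Utils.Bootstrap` API. It also omits the Nx acceleration and the parallel mode.

```elixir
defmodule SketchBootstrap do
  # |P(Ŷ=1|A=g1) - P(Ŷ=1|A=g2)| for exactly two distinct groups.
  def dp_difference(predictions, groups) do
    [g1, g2] = groups |> Enum.uniq() |> Enum.sort()
    abs(rate(predictions, groups, g1) - rate(predictions, groups, g2))
  end

  defp rate(predictions, groups, g) do
    selected = for {p, ^g} <- Enum.zip(predictions, groups), do: p
    Enum.sum(selected) / length(selected)
  end

  def confidence_interval(predictions, groups, opts \\ []) do
    n_samples = Keyword.get(opts, :n_samples, 1000)
    level = Keyword.get(opts, :confidence_level, 0.95)
    # Seeding the process-local :rand state makes the interval reproducible.
    :rand.seed(:exsss, {Keyword.get(opts, :seed, 42), 0, 0})

    # One tuple per group (stratum) for O(1) random access.
    strata =
      Enum.zip(predictions, groups)
      |> Enum.group_by(fn {_p, g} -> g end)
      |> Map.values()
      |> Enum.map(&List.to_tuple/1)

    stats =
      for _ <- 1..n_samples do
        # Resample with replacement inside each stratum, keeping its size.
        resampled =
          Enum.flat_map(strata, fn stratum ->
            size = tuple_size(stratum)
            for _ <- 1..size, do: elem(stratum, :rand.uniform(size) - 1)
          end)

        {ps, gs} = Enum.unzip(resampled)
        dp_difference(ps, gs)
      end

    sorted = stats |> Enum.sort() |> List.to_tuple()
    alpha = 1 - level
    {percentile(sorted, alpha / 2), percentile(sorted, 1 - alpha / 2)}
  end

  # Nearest-rank percentile on a sorted tuple.
  defp percentile(sorted, q), do: elem(sorted, round(q * (tuple_size(sorted) - 1)))
end
```

A basic-bootstrap interval would reflect these percentiles around the point estimate instead. The sketch shows only the percentile method.
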

**Calibration Fairness Metric:**
- **ExFairness.Metrics.Calibration** - Calibration fairness for probability predictions
  - Expected Calibration Error (ECE) computation
  - Maximum Calibration Error (MCE) computation
  - Uniform and quantile binning strategies
  - Configurable number of bins (default: 10)
  - Group-wise calibration comparison
  - Validation for probability ranges [0, 1]
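
ECE and MCE with uniform binning can be sketched as follows. Predictions are bucketed into equal-width bins, and each bin's mean predicted probability (confidence) is compared with its observed positive rate (accuracy). ECE weights those gaps by bin size, and MCE takes the largest one. `SketchCalibration` is a hypothetical plain-list sketch that covers only the uniform strategy, not quantile binning. It is not the `ExFairness.Metrics.Calibration` implementation.

```elixir
defmodule SketchCalibration do
  # Returns {ece, mce} for probabilities in [0, 1] and 0/1 labels.
  def ece_mce(probs, labels, n_bins \\ 10) do
    if not Enum.all?(probs, &(&1 >= 0 and &1 <= 1)) do
      raise ArgumentError, "probabilities must lie in [0, 1]"
    end

    n = length(probs)

    # Bin index floor(p * n_bins), with p = 1.0 folded into the last bin.
    bins = Enum.group_by(Enum.zip(probs, labels), fn {p, _y} -> min(floor(p * n_bins), n_bins - 1) end)

    gaps =
      Enum.map(bins, fn {_bin, pairs} ->
        size = length(pairs)
        confidence = Enum.sum(for {p, _y} <- pairs, do: p) / size
        accuracy = Enum.sum(for {_p, y} <- pairs, do: y) / size
        {size, abs(accuracy - confidence)}
      end)

    # ECE: size-weighted mean gap. MCE: the worst single bin.
    ece = Enum.reduce(gaps, 0.0, fn {size, gap}, acc -> acc + size / n * gap end)
    mce = gaps |> Enum.map(&elem(&1, 1)) |> Enum.max()
    {ece, mce}
  end

  # Group-wise comparison: ECE computed separately for each group.
  def ece_by_group(probs, labels, groups, n_bins \\ 10) do
    Enum.zip([probs, labels, groups])
    |> Enum.group_by(fn {_p, _y, g} -> g end, fn {p, y, _g} -> {p, y} end)
    |> Map.new(fn {g, pairs} ->
      {ps, ys} = Enum.unzip(pairs)
      {ece, _mce} = ece_mce(ps, ys, n_bins)
      {g, ece}
    end)
  end
end
```
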
### Enhanced - Existing Metrics
All existing fairness metrics can now optionally include:
- Bootstrap confidence intervals
- Statistical hypothesis tests
- Effect size measures
- Enhanced interpretations with statistical significance

Example usage:
```elixir
result = ExFairness.demographic_parity(predictions, sensitive_attr,
  include_ci: true,            # NEW: Bootstrap CI
  statistical_test: :z_test,   # NEW: Hypothesis testing
  bootstrap_samples: 1000,     # NEW: Configurable bootstrap
  confidence_level: 0.95       # NEW: CI level
)
# Returns enhanced result with :confidence_interval and :p_value
```
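
The two-proportion Z-test and Cohen's h used in this release can be sketched from their textbook definitions. The sketch computes z = (p1 − p2) / sqrt(p̂(1 − p̂)(1/n1 + 1/n2)) with the pooled proportion p̂. Its two-sided p-value is erfc(|z|/√2), and h = 2·asin(√p1) − 2·asin(√p2). `SketchStatTests` is hypothetical, and its input shape of `{positives, total}` per group is an assumption.

```elixir
defmodule SketchStatTests do
  # Two-sided two-proportion z-test with a pooled proportion.
  def two_proportion_z_test({x1, n1}, {x2, n2}, alpha \\ 0.05) do
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = :math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

    {z, p_value} =
      if se == 0.0 do
        # All outcomes identical (pooled rate 0 or 1): no evidence of a difference.
        {0.0, 1.0}
      else
        z = (p1 - p2) / se
        # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
        {z, :math.erfc(abs(z) / :math.sqrt(2))}
      end

    %{z: z, p_value: p_value, significant: p_value < alpha, effect_size: cohens_h(p1, p2)}
  end

  # Cohen's h: difference of arcsine-transformed proportions.
  def cohens_h(p1, p2), do: 2 * :math.asin(:math.sqrt(p1)) - 2 * :math.asin(:math.sqrt(p2))
end
```

For example, positive rates of 8/10 and 4/10 give z ≈ 1.83 and p ≈ 0.068. That is not significant at α = 0.05, even though h ≈ 0.84 is conventionally a large effect. Reporting both numbers guards against reading small samples too strongly in either direction.
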
### Testing
**New Test Suites:**
- ExFairness.Utils.BootstrapTest - 11 comprehensive tests
  - Bootstrap interval validation
  - Stratified sampling verification
  - Method comparison (percentile vs basic)
  - Reproducibility testing
  - Parallel vs sequential equivalence
- ExFairness.Utils.StatisticalTestsTest - 14 comprehensive tests
  - Two-proportion Z-test validation
  - Chi-square test verification
  - Permutation test correctness
  - Effect size computation
  - P-value range validation
- ExFairness.Metrics.CalibrationTest - 15 comprehensive tests
  - ECE/MCE computation validation
  - Binning strategy verification
  - Probability range validation
  - Edge case handling

**Total Tests:** 134 (v0.2.0) → 174 (v0.3.0) = +40 tests (+30%)
### Documentation
**Design Documentation:**
- `docs/20251125/enhancements_design.md` (comprehensive 8-week implementation plan)
  - Statistical inference algorithms and formulas
  - Calibration metric mathematical foundation
  - Implementation roadmap and success criteria
  - API examples and migration guide
  - Research citations (10+ additional papers)

**Updated Documentation:**
- mix.exs - Version updated to 0.3.0, new modules added to docs
- README.md - Version badge and installation instructions updated
- CHANGELOG.md - Complete v0.3.0 release notes
### Quality Metrics
- **Zero compilation warnings** (enforced via warnings_as_errors)
- **Zero Dialyzer errors** (type-safe)
- **Test coverage target:** >90% (expected)
- **Backward compatible:** All v0.2.0 code works without modification
### Performance
- Bootstrap: ~1-2 seconds for 1000 samples on standard metrics
- Permutation test: ~2-3 seconds for 10,000 permutations
- Parallel bootstrap: 4-8x speedup on multi-core systems
- Calibration: <100ms for typical datasets
### Research Foundations
**New Academic Citations:**
- Efron, B., & Tibshirani, R. J. (1994). "An introduction to the bootstrap." CRC Press.
- Davison, A. C., & Hinkley, D. V. (1997). "Bootstrap methods and their application."
- Good, P. (2013). "Permutation tests: A practical guide to resampling methods."
- Agresti, A. (2018). "Statistical methods for the social sciences."
- Cohen, J. (1988). "Statistical power analysis for the behavioral sciences."
- Pleiss, G., et al. (2017). "On fairness and calibration." NeurIPS.
- Guo, C., et al. (2017). "On calibration of modern neural networks." ICML.
### Breaking Changes
**None** - This is a backward compatible release. All existing code continues to work unchanged.
### Migration from v0.2.0
No code changes required. All new features are opt-in via additional parameters.
See docs/20251125/enhancements_design.md for detailed migration examples.
## [0.2.0] - 2025-10-20
### Added - Comprehensive Technical Documentation
- **future_directions.md (1,941 lines)** - Complete roadmap to v1.0.0
  - Detailed specifications for statistical inference
  - Calibration metric with complete algorithm
  - Intersectional analysis implementation plan
  - Threshold optimization algorithm
  - 6-month development timeline
  - 12+ additional research citations
- **implementation_report.md (1,288 lines)** - Technical implementation details
  - Module-by-module analysis of all 14 modules
  - Algorithm documentation with pseudocode
  - Design decisions and rationale
  - Performance characteristics
  - Code statistics and metrics
- **testing_and_qa_strategy.md (1,220 lines)** - QA methodology
  - TDD philosophy and evidence
  - Complete test coverage matrix (134 tests)
  - Edge case testing strategy
  - Future testing enhancements (property testing, integration testing)
  - Quality gates and CI/CD specifications
### Enhanced - README.md
- Expanded from ~660 to 1,437 lines (+118%)
- Added **Mathematical Foundations** section (200+ lines)
  - Complete mathematical definitions for all 4 metrics
  - Formal probability notation
  - Disparity measures
  - Comprehensive citations with DOI numbers
- Added **Theoretical Background** section (300+ lines)
  - Types of fairness (group, individual, causal)
  - Measurement problem discussion
  - Impossibility theorem with proof intuition
  - Fairness-accuracy tradeoff analysis
- Added **Advanced Usage** section (200+ lines)
  - Axon integration example (neural networks)
  - Scholar integration example (classical ML)
  - Batch fairness analysis
  - Production monitoring with GenServer
- Expanded **Research Foundations** (150+ lines)
  - 15+ peer-reviewed papers with full bibliographic details
  - DOI numbers for all citations
  - Framework comparisons (AIF360, Fairlearn, etc.)
- Added **API Reference** section
- Updated real-world use cases with legal compliance checks
### Documentation
- Total documentation: ~9,120 lines
- Academic citations: 27+ peer-reviewed papers
- Working code examples: 20+
- Integration patterns documented
## [0.1.0] - 2025-10-20
### Added - Core Implementation
**Infrastructure:**
- `ExFairness.Error` - Custom exception handling with type safety
- `ExFairness.Validation` - Comprehensive input validation
  - Binary tensor validation
  - Shape matching validation
  - Multiple groups requirement (min 2 groups)
  - Sufficient samples validation (default: 10 per group)
  - Helpful error messages with actionable suggestions
- `ExFairness.Utils` - GPU-accelerated tensor operations
  - `positive_rate/2` - Positive prediction rate with masking
  - `create_group_mask/2` - Binary mask generation
  - `group_count/2` - Sample counting per group
  - `group_positive_rates/2` - Batch rate computation
- `ExFairness.Utils.Metrics` - Classification metrics
  - `confusion_matrix/3` - TP, FP, TN, FN with masking
  - `true_positive_rate/3` - TPR/Recall
  - `false_positive_rate/3` - FPR
  - `positive_predictive_value/3` - PPV/Precision
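
The validation rules listed above (binary values, matching lengths, at least two groups, at least 10 samples per group) can be sketched on plain lists. `SketchValidation.validate/3` is hypothetical. It returns error tuples with actionable messages, whereas the library presumably works on Nx tensors and raises `ExFairness.Error`.

```elixir
defmodule SketchValidation do
  @default_min_per_group 10

  # Checks run in order; the first failing rule produces the error message.
  def validate(predictions, groups, opts \\ []) do
    min = Keyword.get(opts, :min_per_group, @default_min_per_group)
    counts = Enum.frequencies(groups)

    cond do
      not Enum.all?(predictions, &(&1 in [0, 1])) ->
        {:error, "predictions must be binary (0 or 1)"}

      length(predictions) != length(groups) ->
        {:error, "predictions and groups must have the same length"}

      map_size(counts) < 2 ->
        {:error, "need at least 2 groups, got #{map_size(counts)}"}

      Enum.any?(counts, fn {_g, c} -> c < min end) ->
        small = for {g, c} <- counts, c < min, do: g
        {:error, "groups #{inspect(small)} have fewer than #{min} samples; collect more data or merge groups"}

      true ->
        :ok
    end
  end
end
```
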

**Fairness Metrics:**
- `ExFairness.Metrics.DemographicParity` - P(Ŷ=1|A=0) = P(Ŷ=1|A=1)
  - Configurable threshold (default: 0.1)
  - Plain language interpretations
  - Citations: Dwork et al. (2012), Feldman et al. (2015)
- `ExFairness.Metrics.EqualizedOdds` - Equal TPR and FPR across groups
  - Both error rates checked
  - Combined pass/fail determination
  - Citations: Hardt et al. (2016)
- `ExFairness.Metrics.EqualOpportunity` - Equal TPR across groups
  - Relaxed version of equalized odds
  - Focus on false negative parity
  - Citations: Hardt et al. (2016)
- `ExFairness.Metrics.PredictiveParity` - Equal PPV across groups
  - Precision parity
  - Consistent prediction meaning
  - Citations: Chouldechova (2017)
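
All four metrics above reduce to per-group rates from a confusion matrix. They use the selection rate P(Ŷ=1), TPR, FPR and PPV, and each compares the absolute gap between two groups to a threshold (default 0.1). The plain-list sketch below is hypothetical (`SketchGroupMetrics`) and is not the library's Nx code. Equalized odds passes only when both the TPR gap and the FPR gap are within the threshold, matching the "both error rates checked" note. Defining empty-denominator rates as 0.0 is an assumption of this sketch.

```elixir
defmodule SketchGroupMetrics do
  # Confusion counts for one group from 0/1 predictions and labels.
  def confusion(preds, labels, groups, g) do
    rows = for {p, y, ^g} <- Enum.zip([preds, labels, groups]), do: {p, y}

    %{
      true_pos: Enum.count(rows, &(&1 == {1, 1})),
      false_pos: Enum.count(rows, &(&1 == {1, 0})),
      true_neg: Enum.count(rows, &(&1 == {0, 0})),
      false_neg: Enum.count(rows, &(&1 == {0, 1}))
    }
  end

  def rates(preds, labels, groups, g) do
    %{true_pos: tp, false_pos: fp, true_neg: tn, false_neg: fneg} =
      confusion(preds, labels, groups, g)

    %{
      selection_rate: safe_div(tp + fp, tp + fp + tn + fneg),
      tpr: safe_div(tp, tp + fneg),
      fpr: safe_div(fp, fp + tn),
      ppv: safe_div(tp, tp + fp)
    }
  end

  # Compares exactly two groups against one disparity threshold.
  def report(preds, labels, groups, threshold \\ 0.1) do
    [a, b] = groups |> Enum.uniq() |> Enum.sort()
    ra = rates(preds, labels, groups, a)
    rb = rates(preds, labels, groups, b)
    gap = fn key -> abs(ra[key] - rb[key]) end
    check = fn key -> %{disparity: gap.(key), passes: gap.(key) <= threshold} end

    %{
      demographic_parity: check.(:selection_rate),
      equal_opportunity: check.(:tpr),
      predictive_parity: check.(:ppv),
      # Equalized odds requires both the TPR and the FPR gap within the threshold.
      equalized_odds: %{
        tpr_disparity: gap.(:tpr),
        fpr_disparity: gap.(:fpr),
        passes: gap.(:tpr) <= threshold and gap.(:fpr) <= threshold
      }
    }
  end

  # Rates with an empty denominator are reported as 0.0 (an assumption here).
  defp safe_div(_num, 0), do: 0.0
  defp safe_div(num, den), do: num / den
end
```
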

**Detection Algorithms:**
- `ExFairness.Detection.DisparateImpact` - EEOC 80% rule
  - Legal standard for adverse impact
  - 4/5ths rule implementation
  - Legal interpretation with EEOC context
  - Citations: EEOC (1978), Biddle (2006)
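
The four-fifths rule compares the lowest group selection rate with the highest. A ratio below 0.8 is treated as evidence of adverse impact. The sketch below is hypothetical (`SketchDisparateImpact`) and works on plain lists of 0/1 predictions. Treating the case where no group is selected at all as a pass (ratio 1.0) is an assumption.

```elixir
defmodule SketchDisparateImpact do
  # Selection rate per group, then min/max ratio against the 0.8 cutoff.
  def four_fifths(preds, groups) do
    rates =
      Enum.zip(preds, groups)
      |> Enum.group_by(fn {_p, g} -> g end, fn {p, _g} -> p end)
      |> Map.new(fn {g, ps} -> {g, Enum.sum(ps) / length(ps)} end)

    {lowest, highest} = rates |> Map.values() |> Enum.min_max()
    # Assumption: if no group is selected at all, report no adverse impact.
    ratio = if highest == 0, do: 1.0, else: lowest / highest
    %{rates: rates, ratio: ratio, passes: ratio >= 0.8}
  end
end
```
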

**Mitigation Techniques:**
- `ExFairness.Mitigation.Reweighting` - Sample weighting for fairness
  - Supports demographic parity and equalized odds targets
  - Formula: w(a,y) = P(Y=y) / P(A=a,Y=y)
  - Normalized weights (mean = 1.0)
  - GPU-accelerated via Nx.Defn
  - Citations: Kamiran & Calders (2012)
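
The listed weighting formula can be sketched directly on plain lists. For comparison, Kamiran & Calders' original reweighing uses the expected-over-observed ratio P(A=a)·P(Y=y) / P(A=a,Y=y). The formula listed here omits the P(A=a) factor, which additionally gives every group equal total weight. Both variants make the weighted label rate identical across groups. `SketchReweighting` and its `kamiran_calders:` option are hypothetical and exist only to show the difference.

```elixir
defmodule SketchReweighting do
  # Default: w(a, y) = P(Y=y) / P(A=a, Y=y), as listed in this entry.
  # kamiran_calders: true also multiplies by P(A=a) (expected / observed).
  # Weights are rescaled so their mean is 1.0.
  def weights(groups, labels, opts \\ []) do
    n = length(labels)
    p_y = probs(labels, n)
    p_a = probs(groups, n)
    p_ay = probs(Enum.zip(groups, labels), n)
    with_group_prior = Keyword.get(opts, :kamiran_calders, false)

    raw =
      for {a, y} <- Enum.zip(groups, labels) do
        prior = if with_group_prior, do: p_a[a], else: 1.0
        prior * p_y[y] / p_ay[{a, y}]
      end

    mean = Enum.sum(raw) / n
    Enum.map(raw, &(&1 / mean))
  end

  defp probs(values, n), do: Map.new(Enum.frequencies(values), fn {k, c} -> {k, c / n} end)
end
```

After weighting, a group with three positives and one negative and a group with one positive and three negatives both reach a weighted positive rate of 0.5, the overall base rate.
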

**Reporting System:**
- `ExFairness.Report` - Multi-metric fairness assessment
  - Aggregate pass/fail counts
  - Overall assessment generation
  - Markdown export (human-readable)
  - JSON export (machine-readable)
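
Aggregation and Markdown export can be sketched as follows, assuming each metric result carries `:passes` and `:disparity`. `SketchReport`, its field names and its table layout are hypothetical, not the actual `ExFairness.Report` output. JSON export is left out because it needs a JSON library such as Jason.

```elixir
defmodule SketchReport do
  # results: %{metric_name => %{passes: boolean, disparity: number}}
  def build(results) do
    total = map_size(results)
    passed = Enum.count(results, fn {_name, r} -> r.passes end)
    failed = total - passed

    assessment =
      if failed == 0,
        do: "All #{passed} metrics passed",
        else: "#{failed} of #{total} metrics failed"

    %{results: results, passed_count: passed, failed_count: failed, overall_assessment: assessment}
  end

  # One Markdown table row per metric, sorted by metric name.
  def to_markdown(report) do
    rows =
      report.results
      |> Enum.sort()
      |> Enum.map(fn {name, r} ->
        verdict = if r.passes, do: "PASS", else: "FAIL"
        "| #{name} | #{Float.round(r.disparity * 1.0, 3)} | #{verdict} |"
      end)

    header = ["# Fairness Report", "", report.overall_assessment, "", "| Metric | Disparity | Result |", "|---|---|---|"]
    Enum.join(header ++ rows, "\n")
  end
end
```
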

**Main API:**
- `ExFairness.demographic_parity/3` - Convenience function
- `ExFairness.equalized_odds/4` - Convenience function
- `ExFairness.equal_opportunity/4` - Convenience function
- `ExFairness.predictive_parity/4` - Convenience function
- `ExFairness.fairness_report/4` - Comprehensive reporting
### Testing
- 134 total tests (102 unit tests + 32 doctests)
- 100% pass rate
- Comprehensive edge case coverage
- Strict TDD approach (Red-Green-Refactor)
- All tests async (parallel execution)
### Quality Gates
- Zero compiler warnings (enforced)
- Zero Dialyzer errors (type-safe)
- Credo strict mode configured
- Code formatting enforced (100 char lines)
- ExCoveralls configured for coverage reports
### Documentation
- Comprehensive README.md with examples
- Complete module documentation (@moduledoc)
- Complete function documentation (@doc)
- Working examples (verified by doctests)
- Research citations in all metrics
- Mathematical definitions included
### Dependencies
- Production: `nx ~> 0.7` (only production dependency)
- Development: `ex_doc`, `dialyxir`, `excoveralls`, `credo`, `stream_data`, `jason`
---
[Unreleased]: https://github.com/North-Shore-AI/ExFairness/compare/v0.5.1...HEAD
[0.5.1]: https://github.com/North-Shore-AI/ExFairness/compare/v0.5.0...v0.5.1
[0.5.0]: https://github.com/North-Shore-AI/ExFairness/compare/v0.4.0...v0.5.0
[0.4.0]: https://github.com/North-Shore-AI/ExFairness/compare/v0.3.1...v0.4.0
[0.3.1]: https://github.com/North-Shore-AI/ExFairness/compare/v0.3.0...v0.3.1
[0.3.0]: https://github.com/North-Shore-AI/ExFairness/compare/v0.2.0...v0.3.0
[0.2.0]: https://github.com/North-Shore-AI/ExFairness/compare/v0.1.0...v0.2.0
[0.1.0]: https://github.com/North-Shore-AI/ExFairness/releases/tag/v0.1.0