Release v0.1.4: Multiclass Support & Enhanced Tree Binning

xRiskLab released this 03 Oct 21:15

Major Features

🎯 Multiclass WOE Encoding

  • One-vs-rest approach for targets with 3+ classes
  • Automatic detection of multiclass vs binary targets
  • Multiple output columns per feature for multiclass scenarios
  • Class-specific methods: predict_proba_class(), predict_ci_class()
  • Support for string/integer labels in multiclass scenarios
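To illustrate the one-vs-rest idea, here is a minimal pure-Python sketch that computes one WOE value per category and per class. It is not FastWoe's implementation: the function name `one_vs_rest_woe`, the additive `smoothing` term, and the sign convention (log of the in-class share over the rest share) are assumptions made for this example.

```python
import math
from collections import Counter


def one_vs_rest_woe(categories, labels, smoothing=0.5):
    """Return {class_label: {category: woe}} using a one-vs-rest split.

    Hypothetical helper for illustration; not part of the FastWoe API.
    """
    # key=str lets string and integer labels be ordered together.
    classes = sorted(set(labels), key=str)
    cats = sorted(set(categories), key=str)
    table = {}
    for cls in classes:
        # One-vs-rest: rows of `cls` are events, all other rows are non-events.
        events = Counter(c for c, y in zip(categories, labels) if y == cls)
        non_events = Counter(c for c, y in zip(categories, labels) if y != cls)
        total_events = sum(events.values())
        total_non_events = sum(non_events.values())
        table[cls] = {}
        for cat in cats:
            # Additive smoothing (an assumption) keeps the log finite for empty cells.
            p_event = (events[cat] + smoothing) / (total_events + smoothing * len(cats))
            p_non_event = (non_events[cat] + smoothing) / (
                total_non_events + smoothing * len(cats)
            )
            table[cls][cat] = math.log(p_event / p_non_event)
    return table


categories = ["a", "a", "b", "b", "c", "c", "a", "b"]
labels = ["low", "low", "mid", "mid", "high", "high", "low", "high"]

# A target with three or more distinct values is treated as multiclass here.
is_multiclass = len(set(labels)) > 2
woe = one_vs_rest_woe(categories, labels)
```

With three classes, each category gets three WOE values, which is why a multiclass target yields several output columns per feature.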

🌳 Enhanced Tree Binning

  • Decision tree as default binner for numerical features
  • Fixed NaN values in the last bin for numerical features
  • Optimized default parameters: max_depth=3, random_state=42
  • Unified binner_kwargs API for consistent parameter passing
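As a rough picture of how a depth-limited decision tree produces bins, the sketch below uses a small pure-Python stand-in for a tree classifier: it greedily picks the split that minimizes weighted Gini impurity and recurses up to `max_depth=3`, so a feature ends up with at most 8 bins. The helper names (`gini`, `best_split`, `tree_thresholds`) are hypothetical, and the library's actual binner may differ; this deterministic sketch also has no use for `random_state`.

```python
import bisect


def gini(pos, n):
    """Gini impurity of a binary node with `pos` positives out of `n` rows."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)


def best_split(pairs):
    """Return the threshold that most reduces weighted Gini, or None.

    `pairs` is a list of (x, y) tuples sorted by x, with y in {0, 1}.
    """
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    best_score, best_threshold = gini(total_pos, n), None
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        score = (i * gini(left_pos, i) + (n - i) * gini(total_pos - left_pos, n - i)) / n
        if score < best_score:
            best_score = score
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold


def tree_thresholds(pairs, max_depth=3):
    """Collect split thresholds in ascending order, recursing up to max_depth."""
    if max_depth == 0:
        return []
    t = best_split(pairs)
    if t is None:
        return []
    left = [p for p in pairs if p[0] <= t]
    right = [p for p in pairs if p[0] > t]
    return tree_thresholds(left, max_depth - 1) + [t] + tree_thresholds(right, max_depth - 1)


xs = list(range(20))
ys = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
thresholds = tree_thresholds(sorted(zip(xs, ys)), max_depth=3)

# bisect_left sends x <= threshold to the left bin, matching the split rule above.
bins = [bisect.bisect_left(thresholds, x) for x in xs]
```

Because the outer bins are open-ended (everything below the first threshold, everything above the last), every finite value receives a bin index.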

Improvements

🔧 Bug Fixes

  • Fixed NaN values appearing in the last bin for numerical features
  • Improved error handling and validation
  • Enhanced type checking with comprehensive configuration

📚 Documentation & Examples

  • Updated README with multiclass examples and API documentation
  • Added comprehensive multiclass example (fastwoe_multiclass.py)
  • Enhanced type checking configuration for examples
  • Updated CHANGELOG with detailed release notes

🧪 Testing & Quality

  • Fixed all type checking issues across core library, tests, and examples
  • Added comprehensive test coverage for multiclass functionality
  • Enhanced CI/CD pipeline with proper type checking

Breaking Changes

  • Decision tree is now the default binner for numerical features (was KMeans)
  • Multiclass targets automatically detected and handled with one-vs-rest encoding

Migration Guide

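Code that sets numerical_binner='kmeans' explicitly keeps working unchanged. Code that relied on the previous KMeans default now gets decision tree binning; pass numerical_binner='kmeans' to keep the old behavior: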

from fastwoe import FastWoe

# Old (explicit KMeans)
encoder = FastWoe(numerical_binner='kmeans')

# New (default decision tree, or explicit KMeans)
encoder = FastWoe()  # Uses decision tree by default
encoder = FastWoe(numerical_binner='kmeans')  # Still works

Performance Improvements

  • Faster numerical binning with optimized decision tree parameters
  • Improved memory efficiency for multiclass scenarios
  • Enhanced parallel processing for large datasets

Full Changelog: v0.1.3...v0.1.4