Commit b5bdca3

Release v0.1.4: Multiclass Support & Enhanced Tree Binning

- Add multiclass WOE encoding with one-vs-rest approach
- Implement decision tree as default binner for numerical features
- Add class-specific prediction methods (predict_proba_class, predict_ci_class)
- Fix NaN values in last bin for numerical features
- Add comprehensive type checking with ty configuration
- Update documentation and examples for multiclass support
- Fix all type issues in core library, tests, and examples
- Add new multiclass example demonstrating functionality

1 parent a1c937c · commit b5bdca3

14 files changed: +1895 −377 lines changed

CHANGELOG.md
Lines changed: 45 additions & 1 deletion

@@ -1,6 +1,50 @@
 # Changelog

-## Version 0.1.3.post1 (Current)
+## Version 0.1.4 (Current)
+
+**Multiclass Support & Enhanced Tree Binning**: Major feature additions and API improvements
+
+- **New Features**:
+  - **Multiclass WOE Support**: Added one-vs-rest Weight of Evidence encoding for multiclass targets
+    - Automatic detection of multiclass targets (3+ unique values, not continuous proportions)
+    - One-vs-rest binary encoding for each class against all others
+    - Multiple output columns per feature: `feature_class_0`, `feature_class_1`, etc.
+    - Support for both integer and string class labels
+    - Class-specific priors stored in `y_prior_` dictionary
+  - **Enhanced Tree Binning**: Improved decision tree-based numerical feature binning
+    - Fixed NaN values in last bin issue with proper right-inclusive binning `(a, b]`
+    - Added `get_tree_estimator(feature)` method to access underlying scikit-learn trees
+    - Optimized default parameters for credit scoring: `max_depth=3`, `random_state=42`
+    - Simplified default tree parameters (removed `min_samples_leaf`, `min_samples_split`)
+  - **Unified Binner Parameters**: Streamlined API with single `binner_kwargs` parameter
+    - Replaced separate `tree_kwargs` and `faiss_kwargs` with unified approach
+    - Backward compatibility maintained for existing parameter names
+    - Cleaner API: `FastWoe(binning_method="tree", binner_kwargs={"max_depth": 2})`
+
+- **API Changes**:
+  - **Default Binning Method**: Changed from `"kbins"` to `"tree"` for numerical features
+  - **New Method**: `get_tree_estimator(feature)` to access fitted decision tree estimators
+  - **Enhanced Target Detection**: Automatic multiclass detection with `is_multiclass_target` attribute
+  - **Class Information**: Added `classes_` and `n_classes_` attributes for multiclass targets
+
+- **Fixed**:
+  - **Tree Binning NaN Bug**: Resolved issue where last bin always contained NaN values
+  - **Binning Logic**: Implemented proper right-inclusive binning `(a, b]` instead of `np.digitize`
+  - **Split Point Handling**: Improved `_create_bin_edges_from_splits` to handle duplicate splits
+  - **Test Coverage**: Added comprehensive tests for multiclass and tree binning edge cases
+
+- **Documentation & Examples**:
+  - **New Example**: `examples/fastwoe_multiclass.py` demonstrating multiclass WOE usage
+  - **Comprehensive Tests**: Added `TestMulticlassWoe` class with 9 test methods
+  - **Updated Documentation**: Clarified multiclass WOE concept and usage patterns
+
+- **Performance & Reliability**:
+  - **Credit Scoring Optimization**: Default tree parameters optimized for 4-8 bins per feature
+  - **Reproducible Results**: `random_state=42` as default for consistent binning
+  - **Memory Efficiency**: Improved handling of multiclass target encoding
+  - **Error Handling**: Enhanced validation for multiclass target types
+
+## Version 0.1.3.post1

 **Enhanced Statistical Analysis**: Added IV standard errors and Series support
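The NaN-in-last-bin fix called out in this changelog comes down to bin-edge semantics. A minimal standalone sketch (the split points are hypothetical, not taken from the library) of why left-inclusive `np.digitize` indexing loses the maximum value while right-inclusive `(a, b]` binning via `pd.cut` keeps every value in a labeled bin:

```python
import numpy as np
import pandas as pd

# Hypothetical split points, e.g. thresholds from a depth-limited tree.
splits = [2.5, 5.0, 7.5]
values = np.array([1.0, 2.5, 5.0, 7.5, 9.0])

# Left-inclusive [a, b) semantics: np.digitize places a value equal to the
# last edge at index len(edges), one past the final bin, so mapping indices
# to bin labels leaves the maximum unlabeled (NaN).
edges_closed = [values.min(), 2.5, 5.0, 7.5, values.max()]
idx = np.digitize(values, edges_closed)
print(idx)  # the maximum lands at index 5, past the 4 real bins

# Right-inclusive (a, b] bins with open-ended outer edges: every value,
# including boundary values and the maximum, falls inside a bin.
edges = [-np.inf, *splits, np.inf]
binned = pd.cut(values, bins=edges, right=True)
assert not binned.isna().any()
```

This mirrors the fix's description only; the actual `_create_bin_edges_from_splits` implementation lives in the library.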

README.md
Lines changed: 79 additions & 1 deletion

@@ -8,13 +8,14 @@
 [![PyPI downloads](https://img.shields.io/pypi/dm/fastwoe.svg)](https://pypi.org/project/fastwoe/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

-FastWoe is a Python library for efficient **Weight of Evidence (WOE)** encoding of categorical features and statistical inference. It's designed for machine learning practitioners seeking robust, interpretable feature engineering and likelihood-ratio-based inference for binary classification problems.
+FastWoe is a Python library for efficient **Weight of Evidence (WOE)** encoding of categorical features and statistical inference. It's designed for machine learning practitioners seeking robust, interpretable feature engineering and likelihood-ratio-based inference for binary and multiclass classification problems.

 ![FastWoe](https://github.com/xRiskLab/fastwoe/raw/main/ims/title.png)

 ## 🌟 Key Features

 - **Fast WOE Encoding**: Leverages scikit-learn's `TargetEncoder` for efficient computation
+- **Multiclass Support**: One-vs-rest WOE encoding for targets with 3+ classes
 - **Statistical Confidence Intervals**: Provides standard errors and confidence intervals for WOE values
 - **IV Standard Errors**: Statistical significance testing for Information Value with confidence intervals
 - **Cardinality Control**: Built-in preprocessing to handle high-cardinality categorical features

@@ -136,6 +137,83 @@ print("\nWOE Mapping for 'category':")
 print(mapping[['category', 'count', 'event_rate', 'woe', 'woe_se']])
 ```

+## 🎯 Multiclass Support
+
+FastWoe now supports **multiclass classification** using a one-vs-rest approach! For targets with 3+ classes, FastWoe automatically creates separate WOE encodings for each class against all others.
+
+### Multiclass Example
+
+```python
+import pandas as pd
+import numpy as np
+from fastwoe import FastWoe
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.metrics import classification_report
+
+# Create multiclass data
+X = pd.DataFrame({
+    'job': ['teacher', 'engineer', 'artist', 'doctor'] * 25,
+    'age_group': ['<30', '30-50', '50+'] * 33 + ['<30'],
+    'income': np.random.normal(50000, 20000, 100),
+})
+y = pd.Series([0, 1, 2, 0, 1] * 20)  # 3 classes
+
+# Fit FastWoe with multiclass target
+woe_encoder = FastWoe()
+woe_encoder.fit(X, y)
+
+# Transform data - creates multiple columns per feature
+X_woe = woe_encoder.transform(X)
+print(f"Original features: {X.shape[1]}")
+print(f"WOE features: {X_woe.shape[1]}")  # 3x more columns
+print(f"Column names: {list(X_woe.columns)}")
+# Output: ['job_class_0', 'job_class_1', 'job_class_2', 'age_group_class_0', ...]
+
+# Get probabilities for all classes
+probs = woe_encoder.predict_proba(X)
+print(f"Probabilities shape: {probs.shape}")  # (n_samples, n_classes)
+
+# Get class-specific probabilities
+class_0_probs = woe_encoder.predict_proba_class(X, class_label=0)
+class_1_probs = woe_encoder.predict_proba_class(X, class_label=1)
+
+# Get confidence intervals for specific class
+class_0_ci = woe_encoder.predict_ci_class(X, class_label=0)
+print(f"Class 0 CI shape: {class_0_ci.shape}")  # (n_samples, 2) [lower, upper]
+
+# Train a classifier on WOE features
+rf = RandomForestClassifier(n_estimators=100, random_state=42)
+rf.fit(X_woe, y)
+predictions = rf.predict(X_woe)
+
+print("\nClassification Report:")
+print(classification_report(y, predictions))
+```
+
+### Multiclass Features
+
+- **One-vs-Rest Encoding**: Each class gets separate WOE scores against all others
+- **Class-Specific Methods**: `predict_proba_class()` and `predict_ci_class()` for individual classes
+- **Softmax Probabilities**: `predict_proba()` returns probabilities that sum to 1 across classes
+- **Comprehensive Statistics**: All existing methods work with multiclass (IV analysis, feature stats, etc.)
+- **String Labels**: Supports both integer and string class labels
+
+### Class-Specific Predictions
+
+```python
+# Method 1: Extract from full results
+all_probs = woe_encoder.predict_proba(X)
+class_0_probs = all_probs[:, 0]  # Extract class 0
+
+# Method 2: Use class-specific methods (recommended)
+class_0_probs = woe_encoder.predict_proba_class(X, class_label=0)
+class_0_ci = woe_encoder.predict_ci_class(X, class_label=0)
+
+# Practical usage examples
+high_risk_mask = woe_encoder.predict_proba_class(X, class_label=0) > 0.5
+high_confidence_mask = woe_encoder.predict_ci_class(X, class_label=2)[:, 0] > 0.3
+```
+
 ## 🔧 Advanced Usage

 > [!CAUTION]
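To make the one-vs-rest mechanics described in the README diff concrete, here is a hand-rolled sketch for a single categorical feature. This is illustrative only, not FastWoe's internal implementation (which bins numeric features and builds on scikit-learn's `TargetEncoder`); the Laplace-style smoothing constant `alpha` is our own addition to avoid `log(0)` on pure categories:

```python
import numpy as np
import pandas as pd

# Toy data: one categorical feature, three classes (0, 1, 2).
X = pd.Series(["a", "b", "a", "c", "b", "a", "c", "b"], name="job")
y = pd.Series([0, 1, 2, 0, 1, 0, 2, 1])

alpha = 0.5  # smoothing so no category produces a 0 or 1 event rate
woe_by_class, prior_log_odds = {}, {}
for k in sorted(y.unique()):
    y_bin = (y == k).astype(int)                  # class k vs the rest
    prior = y_bin.mean()
    prior_log_odds[k] = np.log(prior / (1 - prior))
    stats = y_bin.groupby(X).agg(["sum", "count"])
    rate = (stats["sum"] + alpha) / (stats["count"] + 2 * alpha)
    woe_by_class[k] = np.log(rate / (1 - rate)) - prior_log_odds[k]

# One WOE column per class, mirroring job_class_0, job_class_1, ...
woe_features = pd.DataFrame(
    {f"job_class_{k}": X.map(w) for k, w in woe_by_class.items()}
)

# Softmax over per-class log-odds (prior + evidence) yields probabilities
# that sum to 1 across classes, as predict_proba() does for multiclass.
scores = np.column_stack(
    [prior_log_odds[k] + woe_features[f"job_class_{k}"] for k in sorted(woe_by_class)]
)
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
```

The real encoder's values will differ (different smoothing, binning, and priors), but the column layout and the softmax normalization match the behavior the README describes.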

examples/fastwoe_example.py
Lines changed: 8 additions & 1 deletion

@@ -12,12 +12,19 @@
 import numpy as np
 import pandas as pd
 import seaborn as sns
-import statsmodels.api as sm
 from scipy import stats
 from sklearn.linear_model import LogisticRegression
 from sklearn.metrics import roc_auc_score
 from sklearn.model_selection import train_test_split

+try:
+    import statsmodels.api as sm
+
+    STATSMODELS_AVAILABLE = True
+except ImportError:
+    STATSMODELS_AVAILABLE = False
+    print("Warning: statsmodels not available. Some features will be disabled.")
+
 warnings.filterwarnings("ignore")

 # Set style
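The optional-import guard added above turns `statsmodels` into a soft dependency. A small sketch of the pattern in use; the `fit_logit` helper is hypothetical, not part of the example script:

```python
# Guarded optional import: the script still runs if statsmodels is absent.
try:
    import statsmodels.api as sm

    STATSMODELS_AVAILABLE = True
except ImportError:
    STATSMODELS_AVAILABLE = False


def fit_logit(X, y):
    """Fit a statsmodels Logit when available; otherwise return None."""
    if not STATSMODELS_AVAILABLE:
        return None
    return sm.Logit(y, sm.add_constant(X)).fit(disp=0)
```

Callers check the flag (or the `None` return) instead of crashing at import time, which is why the commit moves the import out of the top-level import block.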

examples/fastwoe_explanation.ipynb
Lines changed: 42 additions & 33 deletions

@@ -95,7 +95,7 @@
 },
 {
  "cell_type": "code",
- "execution_count": 3,
+ "execution_count": null,
  "metadata": {},
  "outputs": [
  {
@@ -272,21 +272,24 @@
  "\n",
  "# Print results\n",
  "print(f\"\\nExplanation for sample {idx}:\")\n",
- "print(f\"True label: {explanation['true_label']}\")\n",
- "print(f\"Predicted label: {explanation['predicted_label']}\")\n",
- "print(f\"WOE Evidence: {explanation['total_woe']:.3f}\")\n",
- "print(f\"Interpretation: {explanation['interpretation']}\")\n",
- "\n",
- "# Show feature contributions\n",
- "if \"feature_contributions\" in explanation:\n",
- "    print(\"\\nFeature contributions:\")\n",
- "    for feature, woe_val in explanation[\"feature_contributions\"].items():\n",
- "        print(f\"  {feature}: {woe_val:.3f}\")"
+ "if explanation is not None:\n",
+ "    print(f\"True label: {explanation['true_label']}\")\n",
+ "    print(f\"Predicted label: {explanation['predicted_label']}\")\n",
+ "    print(f\"WOE Evidence: {explanation['total_woe']:.3f}\")\n",
+ "    print(f\"Interpretation: {explanation['interpretation']}\")\n",
+ "\n",
+ "    # Show feature contributions\n",
+ "    if \"feature_contributions\" in explanation:\n",
+ "        print(\"\\nFeature contributions:\")\n",
+ "        for feature, woe_val in explanation[\"feature_contributions\"].items():\n",
+ "            print(f\"  {feature}: {woe_val:.3f}\")\n",
+ "else:\n",
+ "    print(\"No explanation available\")"
  ]
 },
 {
  "cell_type": "code",
- "execution_count": 4,
+ "execution_count": null,
  "metadata": {},
  "outputs": [
  {
@@ -463,21 +466,24 @@
  "\n",
  "# Print results\n",
  "print(f\"\\nExplanation for sample {idx}:\")\n",
- "print(f\"True label: {explanation['true_label']}\")\n",
- "print(f\"Predicted label: {explanation['predicted_label']}\")\n",
- "print(f\"WOE Evidence: {explanation['total_woe']:.3f}\")\n",
- "print(f\"Interpretation: {explanation['interpretation']}\")\n",
- "\n",
- "# Show feature contributions\n",
- "if \"feature_contributions\" in explanation:\n",
- "    print(\"\\nFeature contributions:\")\n",
- "    for feature, woe_val in explanation[\"feature_contributions\"].items():\n",
- "        print(f\"  {feature}: {woe_val:.3f}\")"
+ "if explanation is not None:\n",
+ "    print(f\"True label: {explanation['true_label']}\")\n",
+ "    print(f\"Predicted label: {explanation['predicted_label']}\")\n",
+ "    print(f\"WOE Evidence: {explanation['total_woe']:.3f}\")\n",
+ "    print(f\"Interpretation: {explanation['interpretation']}\")\n",
+ "\n",
+ "    # Show feature contributions\n",
+ "    if \"feature_contributions\" in explanation:\n",
+ "        print(\"\\nFeature contributions:\")\n",
+ "        for feature, woe_val in explanation[\"feature_contributions\"].items():\n",
+ "            print(f\"  {feature}: {woe_val:.3f}\")\n",
+ "else:\n",
+ "    print(\"No explanation available\")"
  ]
 },
 {
  "cell_type": "code",
- "execution_count": 5,
+ "execution_count": null,
  "metadata": {},
  "outputs": [
  {
@@ -654,16 +660,19 @@
  "\n",
  "# Print results\n",
  "print(f\"\\nExplanation for sample {idx}:\")\n",
- "print(f\"True label: {explanation['true_label']}\")\n",
- "print(f\"Predicted label: {explanation['predicted_label']}\")\n",
- "print(f\"WOE Evidence: {explanation['total_woe']:.3f}\")\n",
- "print(f\"Interpretation: {explanation['interpretation']}\")\n",
- "\n",
- "# Show feature contributions\n",
- "if \"feature_contributions\" in explanation:\n",
- "    print(\"\\nFeature contributions:\")\n",
- "    for feature, woe_val in explanation[\"feature_contributions\"].items():\n",
- "        print(f\"  {feature}: {woe_val:.3f}\")"
+ "if explanation is not None:\n",
+ "    print(f\"True label: {explanation['true_label']}\")\n",
+ "    print(f\"Predicted label: {explanation['predicted_label']}\")\n",
+ "    print(f\"WOE Evidence: {explanation['total_woe']:.3f}\")\n",
+ "    print(f\"Interpretation: {explanation['interpretation']}\")\n",
+ "\n",
+ "    # Show feature contributions\n",
+ "    if \"feature_contributions\" in explanation:\n",
+ "        print(\"\\nFeature contributions:\")\n",
+ "        for feature, woe_val in explanation[\"feature_contributions\"].items():\n",
+ "            print(f\"  {feature}: {woe_val:.3f}\")\n",
+ "else:\n",
+ "    print(\"No explanation available\")"
  ]
 },
 {