Commit 1accc33

Release 0.1.6a3: Add MSD feature selection, module renaming
1 parent b5a1d99 commit 1accc33

File tree

9 files changed

+923
-1071
lines changed

CHANGELOG.md

Lines changed: 18 additions & 143 deletions
@@ -1,157 +1,32 @@
 # Changelog

-## Version 0.1.6a0 (2024-12-11)
+## Version 0.1.6a3 (2025-12-14)

-**Alpha Release: CAP Curves, Styled Display & Enhanced Metrics** 📊
-
-This alpha release introduces powerful visualization and display capabilities for model performance analysis and WOE interpretation.
+**Alpha Release: CAP Curves, Styled Display, MSD Feature Selection & Enhanced Metrics** 📊

 ### ✨ New Features
-
-#### 1. CAP Curve Visualization (`plot_performance`)
-- **Unified CAP Curves**: Single `plot_performance()` function for both binary (PD) and continuous (LGD) targets
-- **Multiple Model Support**: Plot and compare multiple models on the same chart
-- Pass list of predictions: `y_pred=[model1, model2, model3]`
-- Custom labels and colors: `labels=['Model A', 'Model B']`, `colors=['#69db7c', '#55d3ed']`
-- **Flexible Layout**: Accept external `matplotlib.Axes` for custom subplot grids
-- **Crystal Ball Line**: Perfect ranking baseline (blue) with "Crystal Ball" legend
-- **Professional Styling**: Arial font, consistent linewidths, dotted grid, clean aesthetics
-- **Weighted Gini Support**: Pass `weights` parameter for EAD-weighted calculations (LGD models)
-- **Returns**: `(fig, ax, gini)` tuple where `gini` is single value or list for multiple models
-
-#### 2. Weighted Somers' D (`fast_somersd`)
-- **Numba-Optimized Weighted Implementation**: New `_somers_yx_weighted()` function
-- O(n²) weighted concordant/discordant pair calculation
-- Fully integrated into `somersd_yx()` function
-- No external dependencies (removed sklearn.roc_auc_score fallback)
-- **Unified API**: `somersd_yx(y, x, weights=None)` handles both weighted and unweighted cases
-- **Regulatory Compliance**: Supports EAD-weighted Gini for Basel/regulatory LGD models
-- **Performance**: Numba JIT compilation for efficient weighted calculations
-
-#### 3. Rich HTML Display (`fastwoe.display`)
-- **Styled DataFrames**: Beautiful HTML tables for Jupyter notebooks
-- Clean baseline foundation design (light mode)
-- Inter font family, subtle gradients, alternating rows
-- Gradient highlighting for numeric columns (high/medium/low values)
-- Significance badges for statistical tests
-- **Decorator-Based Styling**: Clean, reusable code patterns
-- `@iv_styled`: Automatic IV analysis styling
-- `@styled(title, subtitle, highlight_cols, precision)`: Custom DataFrame styling
-- Functions return styled output seamlessly
-- **Pre-configured Functions**:
-- `style_iv_analysis(df)`: IV analysis with feature importance highlighting
-- `style_woe_mapping(df, feature_name)`: WOE transformations with category details
-- `StyledDataFrame(df, ...)`: Direct wrapper for any DataFrame
-- **Professional Design**: Based on baseline foundation design system
-- Consistent light mode colors (#FCFCFC, #F0F0F0, #E8E8E8)
-- 16px border radius for badges
-- Smooth transitions with cubic-bezier(0.32, 0.72, 0, 1)
-- Clean typography hierarchy
-
-#### 4. WOE Visualization (`visualize_woe`)
-- **Dual Display Modes**:
-- `mode="probability"`: Show default probability deltas
-- `mode="log_odds"`: Show log-odds (WOE values)
-- **Horizontal Bar Charts**: Clear visualization of WOE impact per category
-- **Color Coding**: Positive (risk-increasing) vs negative (risk-decreasing) categories
-- **Baseline Reference**: Shows prior probability/log-odds as reference point
+- **CAP Curve Visualization** (`plot_performance`): Unified function for binary (PD) and continuous (LGD) targets with multi-model support
+- **Weighted Somers' D**: Numba-optimized weighted implementation for EAD-weighted Gini calculations
+- **Rich HTML Display** (`fastwoe.display`): Styled DataFrames for Jupyter with decorator-based styling (`@iv_styled`, `@styled`)
+- **WOE Visualization** (`visualize_woe`): Horizontal bar charts showing WOE impact per category
+- **Marginal Somers' D Feature Selection** (`marginal_somersd_selection`): Residual-based forward selection using rank correlation, works with both binary and continuous targets
+- **Somers' D Shapley Values** (`somersd_shapley`): Shapley value decomposition for feature contribution analysis

 ### 🔧 API Changes
+- New: `plot_performance()`, `visualize_woe()`, `StyledDataFrame()`, `style_iv_analysis()`, `style_woe_mapping()`
+- New: `marginal_somersd_selection()` in `fastwoe.screening` (renamed from `fastwoe.modeling`)
+- New: `somersd_shapley()` for Shapley value decomposition
+- Enhanced: `somersd_yx(y, x, weights=None)` now supports weighted calculations
+- Changed: `somersd_clustered_matrix()` now binary-only (raises `ValueError` for non-binary labels)

-#### New Functions
-- `plot_performance(y_true, y_pred, weights=None, ax=None, labels=None, colors=None, figsize=(6,5), dpi=100, show_plot=True)`
-- `visualize_woe(woe_encoder, feature_name, mode='probability', figsize=(10, None), color_positive='#F783AC', color_negative='#A4D8FF', show_plot=True)`
-- `styled(title, subtitle, highlight_cols, precision)` - Decorator
-- `iv_styled` - Decorator for IV analysis
-- `style_iv_analysis(df)` - Function-based styling
-- `style_woe_mapping(df, feature_name)` - Function-based styling
-- `StyledDataFrame(df, title, subtitle, highlight_cols, precision)` - Direct wrapper
-
-#### Enhanced Functions
-- `somersd_yx(y, x, weights=None)`: Now accepts optional `weights` parameter for weighted Somers' D calculation
-
-#### Exports
-Updated `fastwoe/__init__.py` to export:
-- `plot_performance`, `visualize_woe` from `metrics`
-- `StyledDataFrame`, `style_iv_analysis`, `style_woe_mapping`, `styled`, `iv_styled` from `display`
-
-### 📊 Examples & Documentation
-
-#### New Notebooks
-- **`examples/fastwoe_cap_curve.ipynb`**: Comprehensive CAP curve demonstrations
-- Single model CAP curves (binary PD)
-- Multiple model comparison with custom colors
-- EAD-weighted Gini for LGD models
-- Side-by-side unweighted vs weighted comparisons
-- Continuous target (LGD) examples
-
-- **`examples/fastwoe_styled_display.ipynb`**: Rich HTML display demonstrations
-- Decorator-based styling patterns
-- IV analysis with `@iv_styled`
-- Custom styled tables with `@styled`
-- Feature importance rankings
-- Model comparison tables
-- Risk segmentation analysis
-
-#### New Documentation
-- **`WEIGHTED_SOMERSD_SUMMARY.md`**: Mathematical foundation and implementation details for weighted Somers' D
-
-### 🐛 Bug Fixes
-- **Gini Calculation**: Corrected relationship between Somers' D and Gini (removed incorrect `2 *` multiplier for binary targets)
-- **Weighted Gini**: Removed sklearn.roc_auc_score dependency, implemented direct Numba-optimized weighted calculation
-- **Plot Layout**: Fixed `plot_performance` to generate single plot (removed unwanted second subplot)
-- **Return Values**: Corrected return signature to `(fig, ax, gini)` for single axis
-- **Perfect Line**: Fixed continuous target perfect line to sort by true target values (not straight line to (1,1))
-
-### 🎨 Styling & Design
-- **Consistent CAP/Power Curve Styling**:
-- Arial font family for all text
-- Font size 12 for axis labels, 14 for titles, 10 for legend
-- Specific ticks: `np.arange(0, 1.1, 0.1)` for both axes
-- "Fraction of population" (x-axis), "Fraction of target" (y-axis)
-- "Crystal Ball" legend for perfect line (dodgerblue)
-- Black dotted random line
-- Default colors: `["#69db7c", "#55d3ed", "#ffa94d", "#c430c1", "#ff6b6b", "#4dabf7"]`
-- Default figsize: `(6, 5)`
-
-- **HTML Table Styling**:
-- Light mode only (no dark mode mixing)
-- Inter font family
-- Subtle backgrounds and borders
-- Gradient highlighting with `!important` for proper rendering
-- 16px border radius for badges
-- Smooth hover transitions
-
-### 🔧 Configuration
-- **Moved Sourcery Config**: Migrated `.sourcery.yaml` to `pyproject.toml` under `[tool.sourcery]` section
+### 📊 Documentation
+- Added: `docs/marginal_somersd_guide.md` - Comprehensive guide with algorithm flowchart and variance decomposition diagrams
+- Added: `examples/msd_feature_selection.ipynb` - Example notebook demonstrating MSD feature selection

 ### 📦 Dependencies
-- **Added**: `loguru>=0.7.0` for enhanced logging in tests
-- **Added**: `matplotlib>=3.5.0` (already in examples dependencies)
-
-### 🧪 Testing
-- **Enhanced `test_fast_somersd.py`**:
-- Added `test_weighted_somersd()` to verify weighted implementation
-- Integrated `loguru` with `RichHandler` for better test output
-- All tests passing ✅
-
-### 🚀 Installation
-This alpha version can be installed directly from the Git branch:
-
-```bash
-# Install from alpha branch
-uv add "fastwoe @ git+https://github.com/xRiskLab/fastwoe.git@alpha-0.1.6a0"
-```
-
-### ⚠️ Breaking Changes
-None - all changes are additive and backward compatible.
-
-### 📝 Notes
-- This is an alpha release for testing new visualization and display features
-- Feedback welcome on styling, API design, and functionality
-- Stable release (0.1.6) will follow after testing period
+- Added: `loguru>=0.7.0`, `matplotlib>=3.5.0`

-## Version 0.1.5 (2024-12-09)
+## Version 0.1.5 (2025-12-09)

 **Performance Fix & Code Cleanup**: Eliminated DataFrame fragmentation warning and removed debug statements

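Note on the weighted Somers' D entries: the changelog describes an O(n²) weighted concordant/discordant pair calculation behind `somersd_yx(y, x, weights=None)`, but the implementation is not part of this diff. The sketch below is a plain-Python illustration of that kind of calculation, not fastwoe's code. The function name `weighted_somers_d` is hypothetical. Two further assumptions: each pair is weighted by the product of its two observation weights, and pairs tied on the target are excluded from the denominator. The library may use different conventions.

```python
def weighted_somers_d(y, x, weights=None):
    """Illustrative O(n^2) weighted Somers' D of a score x against a target y.

    Hypothetical helper, not the fastwoe implementation.
    Assumption: pair (i, j) carries weight w_i * w_j.
    Assumption: pairs tied on y are skipped; pairs tied on x stay in the
    denominator but count as neither concordant nor discordant.
    """
    n = len(y)
    if weights is None:
        weights = [1.0] * n  # unweighted case: every pair has weight 1

    concordant = 0.0
    discordant = 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue  # tied on the target: not a comparable pair
            w = weights[i] * weights[j]
            total += w
            direction = (y[i] - y[j]) * (x[i] - x[j])
            if direction > 0:
                concordant += w
            elif direction < 0:
                discordant += w

    if total == 0.0:
        return float("nan")  # no comparable pairs
    return (concordant - discordant) / total


# Binary target: three of the four comparable pairs are ranked correctly.
y_true = [0, 0, 1, 1]
score = [0.1, 0.4, 0.35, 0.8]
print(weighted_somers_d(y_true, score))
print(weighted_somers_d(y_true, score, weights=[1, 2, 1, 1]))
```

Under these conventions, the unweighted value for a binary target equals 2·AUC − 1, so no extra `2 *` factor is applied to the result. That matches the Gini fix listed under Bug Fixes in the removed 0.1.6a0 notes.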