## Reviewer's Guide

Implements WOE finetuning without rebucketing, adds asymptotic standard errors for Somers' D/Gini with per-feature CI outputs, introduces instance-level Gini contribution decomposition and CAP-curve enhancements, updates docs/examples and test/CI tooling, and formally deprecates the old `fast_somersd` import path in favor of `fastwoe.metrics`.

### Sequence diagram for `FastWoe.finetune` WOE recalibration

```mermaid
sequenceDiagram
    actor DataScientist
    participant FastWoe as FastWoe_instance
    participant Metrics as MetricsModule
    DataScientist->>FastWoe: finetune(X_new, y_new, update_prior)
    FastWoe->>FastWoe: check is_fitted_
    FastWoe->>FastWoe: check is_multiclass_target
    FastWoe->>FastWoe: X_df = _ensure_dataframe(X_new, use_fitted_names=True)
    FastWoe->>FastWoe: y_ser = _ensure_series(y_new)
    FastWoe->>FastWoe: validate_lengths_and_binary(y_ser)
    FastWoe->>FastWoe: detect_unknown_cols(X_df, mappings_)
    FastWoe-->>DataScientist: warnings for unknown cols
    alt update_prior is true
        FastWoe->>FastWoe: y_prior_ = mean(y_ser)
        FastWoe->>FastWoe: odds_prior_ = y_prior_ / (1 - y_prior_)
        FastWoe-->>DataScientist: warning about stale features
    end
    FastWoe->>FastWoe: odds_prior = y_prior_ / (1 - y_prior_)
    loop for each col in cols_to_update
        FastWoe->>FastWoe: _recalibrate_feature(col, X_df, y_ser, odds_prior)
        activate FastWoe
        FastWoe->>FastWoe: binned_col = _apply_binning_to_column(X_df, col)
        FastWoe->>FastWoe: old_mapping = mappings_[col]
        FastWoe->>FastWoe: new_stats = groupby_bin(binned_col, y_ser)
        FastWoe->>FastWoe: detect_missing_bins_and_extra_bins
        FastWoe-->>DataScientist: warnings for missing or extra bins
        FastWoe->>FastWoe: build new_mapping_df with updated counts and woe
        FastWoe->>FastWoe: mappings_[col] = new_mapping_df
        FastWoe->>FastWoe: enc = TargetEncoder(...)
        FastWoe->>FastWoe: enc.fit(binned_col, y_ser)
        FastWoe->>FastWoe: encoders_[col] = enc
        FastWoe->>FastWoe: stats = _calculate_feature_stats(col, binned_col, y_ser, new_mapping_df)
        FastWoe->>FastWoe: feature_stats_[col] = stats
        deactivate FastWoe
    end
    FastWoe-->>DataScientist: return self for chaining
```
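The per-bin recalibration in the loop above recomputes WOE as the log of the bin's event odds over the frozen prior odds. A minimal numeric sketch (the helper below is illustrative, not the library's internal function):

```python
import numpy as np

def recalibrated_woe(event_rate: float, odds_prior: float, eps: float = 1e-15) -> float:
    """WOE of a bin: log of the bin's event odds over the frozen prior odds."""
    event_rate = float(np.clip(event_rate, eps, 1 - eps))  # guard against log(0)
    odds_bin = event_rate / (1 - event_rate)
    return float(np.log(odds_bin / odds_prior))

# With a 20% prior event rate (prior odds 0.25), a bin with a 50% event rate
# has odds 1.0, so its WOE is log(1.0 / 0.25) = log(4)
```

Because only the event rates are recomputed on the finetune sample while `odds_prior` stays frozen (unless `update_prior` is set), bin edges and all other features are untouched.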
### Class diagram for FastWoe finetuning and related metrics

```mermaid
classDiagram
    class FastWoe {
        +bool is_fitted_
        +bool is_multiclass_target
        +dict mappings_
        +dict binners_
        +dict binning_info_
        +dict encoders_
        +dict feature_stats_
        +float y_prior_
        +float odds_prior_
        +dict encoder_kwargs
        +int random_state
        +list cat_features_
        +fit(X, y, cat_features)
        +transform(X)
        +fit_transform(X, y)
        +finetune(X_new, y_new, update_prior) FastWoe
        +get_mapping(feature, class_label) pd.DataFrame
        +get_feature_stats() pd.DataFrame
        -_calculate_gini(y_true, y_pred) float
        -_calculate_somersd_se(y_true, y_pred) float
        -_calculate_woe_se(good_count, bad_count) float
        -_calculate_woe_ci(woe, se) tuple
        -_calculate_iv_standard_error(mapping_df, total_good, total_bad) float
        -_calculate_iv_confidence_interval(iv_value, iv_se) tuple
        -_calculate_feature_stats(col, X, y, mapping_df) dict
        -_ensure_dataframe(X, use_fitted_names) pd.DataFrame
        -_ensure_series(y) pd.Series
        -_apply_binning_to_column(X, col) pd.Series
        -_recalibrate_feature(col, X_new, y_new, odds_prior) None
        -_bin_with_tree(col, X, y) dict
    }
    class MetricsModule {
        <<utility>>
        +somersd_yx(y, x) SomersDResult
        +somersd_xy(y, x) SomersDResult
        +somersd_pairwise(y_true, y_pred) SomersDResult
        +somersd_clustered_matrix(df, score_col, ...) pd.DataFrame
        +somersd_se(y_true, y_pred) float
        +gini_contributions(scores, labels) tuple
        -_somers_yx_core(y, x) tuple
        -_somers_xy_core(y, x) tuple
        -_concordant_discordant_matrices(CT) tuple
    }
    class PlotsModule {
        <<utility>>
        +plot_performance(y_true, y_pred, weights, ax, figsize, dpi, show_plot, labels, colors, top_p) tuple
    }
    FastWoe ..> MetricsModule : uses somersd_se
    FastWoe ..> MetricsModule : uses somersd_yx
    FastWoe ..> PlotsModule : user-level integration
```
Hey - I've found 5 issues, and left some high level feedback:

- In `plot_performance`, consider validating `top_p` (e.g., `0 < top_p <= 1`) and raising a clear error for invalid values so users don't get confusing axis behavior when passing out-of-range inputs.
- The new `_apply_binning_to_column` duplicates bin-label construction logic that also appears in `fit`/`transform`; factoring this into a shared helper (e.g., `format_bin_labels(bin_edges, method)`) would reduce duplication and make it easier to keep label semantics consistent across code paths (including `finetune`).
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `plot_performance`, consider validating `top_p` (e.g., 0 < top_p <= 1) and raising a clear error for invalid values so users don’t get confusing axis behavior when passing out-of-range inputs.
- The new `_apply_binning_to_column` duplicates bin-label construction logic that also appears in `fit`/`transform`; factoring this into a shared helper (e.g., `format_bin_labels(bin_edges, method)`) would reduce duplication and make it easier to keep label semantics consistent across code paths (including `finetune`).
## Individual Comments
### Comment 1
<location path="fastwoe/metrics.py" line_range="485-494" />
<code_context>
+def somersd_se(
</code_context>
<issue_to_address>
**issue (performance):** somersd_se builds a full contingency table and runs O(a*b) loops, which can explode for near-continuous inputs.
This builds a contingency table of shape `(len(unique(y_true)), len(unique(y_pred)))` and iterates over all cells, so for high-cardinality or continuous-like inputs it can approach O(n^2) time and memory. Since the docstring claims support for continuous targets, callers may hit this path with large arrays. Please either (a) document that this is intended for low-cardinality/binned inputs, and/or (b) guard against large `len(y_uniq) * len(x_uniq)` (e.g., return `np.nan` or similar), or (c) add an alternative path for continuous inputs that avoids quadratic behavior.
</issue_to_address>
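A minimal sketch of option (b), a cardinality guard checked before the contingency table is built (the helper name and default threshold here are made up for illustration, not part of fastwoe):

```python
import numpy as np

def contingency_is_tractable(y_true, y_pred, max_cells: int = 1_000_000) -> bool:
    """Check whether the (unique(y_true) x unique(y_pred)) table is safe to build.

    The SE computation iterates over every cell, so memory and time scale with
    the product of the two cardinalities, not the sample size.
    """
    n_rows = np.unique(np.asarray(y_true)).size
    n_cols = np.unique(np.asarray(y_pred)).size
    return n_rows * n_cols <= max_cells
```

A caller could then return `np.nan` (or fall back to a rank-based path) when this returns `False`, instead of silently attempting a near-quadratic computation.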
### Comment 2
<location path="fastwoe/metrics.py" line_range="378-387" />
<code_context>
+def gini_contributions(
</code_context>
<issue_to_address>
**issue (bug_risk):** gini_contributions accepts arbitrary integer labels but assumes binary 0/1 semantics.
`labels` are cast to `int32` but never validated to be exactly {0,1}. If callers pass values like {−1,1}, booleans, or multi-class labels, the positive/negative logic and pair counts become incorrect while still returning a result. To prevent silent misuse, add an explicit check (e.g., via `np.unique(labels)` and comparing to `[0, 1]`) and raise a `ValueError` or at least emit a warning when the assumption is violated.
</issue_to_address>
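A sketch of the suggested check (hypothetical helper; the real fix would live inside `gini_contributions`):

```python
import numpy as np

def ensure_binary_labels(labels) -> np.ndarray:
    """Validate that labels contain only 0/1 values before casting to int32."""
    arr = np.asarray(labels)
    uniq = np.unique(arr)
    if not np.isin(uniq, [0, 1]).all():
        raise ValueError(f"labels must be binary 0/1, got values {uniq!r}")
    return arr.astype(np.int32)
```

Note that booleans pass this check (they compare equal to 0/1), while `{-1, 1}` or multi-class labels fail loudly instead of producing silently wrong pair counts.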
### Comment 3
<location path="fastwoe/plots.py" line_range="44-47" />
<code_context>
show_plot: bool = True,
labels: Optional[list[str]] = None,
colors: Optional[list[str]] = None,
+ top_p: Optional[float] = None,
) -> tuple:
"""
</code_context>
<issue_to_address>
**issue (bug_risk):** top_p is not validated and can lead to confusing axes or errors for out-of-range values.
The docstring says `top_p` is in (0, 1], but the code doesn’t enforce this. For `top_p <= 0` or `top_p > 1`, the x-axis limits/ticks become misleading and break the CAP interpretation. Please validate `0 < top_p <= 1` and raise `ValueError` otherwise, and consider clamping `xlim` to [0, 1] instead of starting slightly negative to avoid showing a negative population fraction.
</issue_to_address>
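The validation itself is a one-liner; a sketch of what `plot_performance` could run up front (hypothetical helper name):

```python
def validate_top_p(top_p):
    """Enforce the documented (0, 1] range for top_p; None disables zooming."""
    if top_p is not None and not (0.0 < top_p <= 1.0):
        raise ValueError(f"top_p must be in (0, 1], got {top_p}")
    return top_p
```

Failing fast here keeps the x-axis semantics intact: a `top_p` of 0.1 means "zoom to the top 10% of the population", and anything outside (0, 1] has no meaningful CAP interpretation.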
### Comment 4
<location path="fastwoe/fastwoe.py" line_range="1154-1163" />
<code_context>
+ total_new = new_stats["count"].sum() if len(new_stats) > 0 else len(y_new)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Mixed use of old and new sample sizes makes `count_pct` inconsistent across bins after finetune.
For bins present in `new_stats` you recompute `count`/`count_pct` using `total_new`, but for bins missing in the new data you keep the original `count`/`count_pct`. After finetune this breaks consistency: `count` no longer sums to a coherent total, and `count_pct` no longer sums to 100% and mixes baselines. If `count_pct` is meant to reflect the finetune sample, it should be recomputed against `total_new` for all bins (while preserving original WOE/CI where needed), or you should expose separate columns like `orig_count_pct` and `finetune_count_pct`.
Suggested implementation:
```python
rows = []
# Use a single, consistent total for all bins when recomputing count/count_pct
total_new = int(new_stats["count"].sum()) if len(new_stats) > 0 else int(len(y_new))
for cat in old_mapping.index:
if cat in new_stats.index:
# Start from new stats but ensure count_pct is recomputed against total_new
row = new_stats.loc[cat].copy()
count = int(row["count"])
row["count_pct"] = count / total_new if total_new > 0 else 0.0
```
To fully enforce consistency:
1. In the `else` branch for `if cat in new_stats.index:` (the branch that handles bins missing in `new_stats` but present in `old_mapping`), ensure you also recompute `count_pct` using `total_new`:
- Use the preserved `count` for that bin (likely coming from `old_mapping`) and set `row["count_pct"] = count / total_new if total_new > 0 else 0.0` before appending it to `rows`.
2. If `count` for missing bins is currently taken directly from `old_mapping` without converting to `int`, align it with the above pattern (`count = int(...)`) so all `count` values across bins sum coherently to the same universe (the finetune sample or the chosen effective total).
3. If the public API or downstream consumers rely on the original baseline percentages, consider adding separate columns such as `orig_count_pct` (from the original mapping) and `finetune_count_pct` (the recomputed one above) and populating both during mapping construction.
</issue_to_address>
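The inconsistency is easy to reproduce with a toy mapping (synthetic data; the column name `count` mirrors the review comment, the rest is made up for illustration):

```python
import pandas as pd

old_mapping = pd.DataFrame({"count": [50, 30, 20]}, index=["a", "b", "c"])
new_stats = pd.DataFrame({"count": [60, 40]}, index=["a", "b"])  # bin "c" missing

total_new = int(new_stats["count"].sum())  # 100 finetune observations

# Mixed baseline: new counts for seen bins, stale count for the missing bin.
counts = old_mapping["count"].copy()
counts.loc[new_stats.index] = new_stats["count"]
mixed_pct = counts / total_new  # sums to 1.2 -- stale bin "c" inflates the total

# Consistent alternative: recompute every bin against total_new, with
# bins unseen in the finetune sample contributing zero.
finetune_counts = new_stats["count"].reindex(old_mapping.index, fill_value=0)
finetune_pct = finetune_counts / total_new  # sums to exactly 1.0
```

Keeping both series as separate columns (e.g., the suggested `orig_count_pct` / `finetune_count_pct`) lets downstream consumers pick a baseline explicitly instead of inheriting the mixed one.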
### Comment 5
<location path="fastwoe/fast_somersd.py" line_range="13-17" />
<code_context>
+
from .metrics import SomersDResult, somersd_pairwise, somersd_xy, somersd_yx
+warnings.warn(
+ "fastwoe.fast_somersd is deprecated. Import from fastwoe.metrics instead.",
+ DeprecationWarning,
+ stacklevel=2,
+)
+
</code_context>
<issue_to_address>
**suggestion:** Deprecation warning stacklevel is likely too low to point at the user’s import site.
Since this warning is raised at module import time, `stacklevel=2` will usually point at the top of `fast_somersd.py` rather than the caller’s import line. Consider increasing `stacklevel` (e.g., to 3 or 4) so it highlights the user’s `import fastwoe.fast_somersd` call instead.
```suggestion
warnings.warn(
"fastwoe.fast_somersd is deprecated. Import from fastwoe.metrics instead.",
DeprecationWarning,
stacklevel=4,
)
```
</issue_to_address>
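An alternative that sidesteps `stacklevel` tuning entirely is a module-level `__getattr__` (PEP 562), so the warning fires at each attribute access rather than once at import time. A sketch under the assumption that per-symbol warnings are acceptable for this shim (the factory name is made up):

```python
import warnings

def make_deprecated_module_getattr(target_module, names, old_path, new_path):
    """Build a module-level __getattr__ that warns on use of deprecated names."""
    def __getattr__(name):
        if name in names:
            warnings.warn(
                f"{old_path}.{name} is deprecated. Import from {new_path} instead.",
                DeprecationWarning,
                stacklevel=2,  # points at the caller's attribute access
            )
            return getattr(target_module, name)
        raise AttributeError(f"module {old_path!r} has no attribute {name!r}")
    return __getattr__
```

In a shim like `fast_somersd.py` this would be wired up as `__getattr__ = make_deprecated_module_getattr(metrics, {"somersd_yx", ...}, "fastwoe.fast_somersd", "fastwoe.metrics")`, at the cost of no longer warning on a bare `import fastwoe.fast_somersd` that never touches an attribute.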
```python
def somersd_se(
    y_true: np.ndarray,
    y_pred: np.ndarray,
) -> float:
    """Asymptotic SE of Somers' D(X|Y) per Goktas & Oznur (2011).

    Computes the asymptotic standard error from the contingency table of
    (y_true, y_pred) using per-cell concordant/discordant counts and the
    delta method for the ratio statistic.
```

**issue (performance):** `somersd_se` builds a full contingency table and runs O(a*b) loops, which can explode for near-continuous inputs.

This builds a contingency table of shape `(len(unique(y_true)), len(unique(y_pred)))` and iterates over all cells, so for high-cardinality or continuous-like inputs it can approach O(n^2) time and memory. Since the docstring claims support for continuous targets, callers may hit this path with large arrays. Please either (a) document that this is intended for low-cardinality/binned inputs, and/or (b) guard against large `len(y_uniq) * len(x_uniq)` (e.g., return `np.nan` or similar), or (c) add an alternative path for continuous inputs that avoids quadratic behavior.
```python
def gini_contributions(
    scores: np.ndarray,
    labels: np.ndarray,
) -> tuple[np.ndarray, float]:
    """Calculate each observation's contribution to the Gini coefficient.

    Assigns a signed contribution to every observation based on how well it is
    ranked relative to observations of the opposite class (Somers' D definition,
    so ties receive zero credit).
```

**issue (bug_risk):** `gini_contributions` accepts arbitrary integer labels but assumes binary 0/1 semantics.

`labels` are cast to `int32` but never validated to be exactly {0, 1}. If callers pass values like {-1, 1}, booleans, or multi-class labels, the positive/negative logic and pair counts become incorrect while still returning a result. To prevent silent misuse, add an explicit check (e.g., via `np.unique(labels)` and comparing to `[0, 1]`) and raise a `ValueError` or at least emit a warning when the assumption is violated.
```python
    top_p: Optional[float] = None,
) -> tuple:
    """
    Plot model performance curve (CAP for binary, Power curve for continuous).
```

**issue (bug_risk):** `top_p` is not validated and can lead to confusing axes or errors for out-of-range values.

The docstring says `top_p` is in (0, 1], but the code doesn't enforce this. For `top_p <= 0` or `top_p > 1`, the x-axis limits/ticks become misleading and break the CAP interpretation. Please validate `0 < top_p <= 1` and raise `ValueError` otherwise, and consider clamping `xlim` to [0, 1] instead of starting slightly negative to avoid showing a negative population fraction.
```python
total_new = new_stats["count"].sum() if len(new_stats) > 0 else len(y_new)
for cat in old_mapping.index:
    if cat in new_stats.index:
        row = new_stats.loc[cat]
        count = int(row["count"])
        bad_count = int(row["bad_count"])
        good_count = int(row["good_count"])
        event_rate = np.clip(float(row["event_rate"]), 1e-15, 1 - 1e-15)
        odds_cat = event_rate / (1 - event_rate)
        woe = float(np.log(odds_cat / odds_prior))
```

**suggestion (bug_risk):** Mixed use of old and new sample sizes makes `count_pct` inconsistent across bins after finetune.

For bins present in `new_stats` you recompute `count`/`count_pct` using `total_new`, but for bins missing in the new data you keep the original `count`/`count_pct`. After finetune this breaks consistency: `count` no longer sums to a coherent total, and `count_pct` no longer sums to 100% and mixes baselines. If `count_pct` is meant to reflect the finetune sample, it should be recomputed against `total_new` for all bins (while preserving original WOE/CI where needed), or you should expose separate columns like `orig_count_pct` and `finetune_count_pct`.

Suggested implementation:

```python
rows = []
# Use a single, consistent total for all bins when recomputing count/count_pct
total_new = int(new_stats["count"].sum()) if len(new_stats) > 0 else int(len(y_new))
for cat in old_mapping.index:
    if cat in new_stats.index:
        # Start from new stats but ensure count_pct is recomputed against total_new
        row = new_stats.loc[cat].copy()
        count = int(row["count"])
        row["count_pct"] = count / total_new if total_new > 0 else 0.0
```

To fully enforce consistency:

1. In the `else` branch for `if cat in new_stats.index:` (the branch that handles bins missing in `new_stats` but present in `old_mapping`), ensure you also recompute `count_pct` using `total_new`: use the preserved `count` for that bin (likely coming from `old_mapping`) and set `row["count_pct"] = count / total_new if total_new > 0 else 0.0` before appending it to `rows`.
2. If `count` for missing bins is currently taken directly from `old_mapping` without converting to `int`, align it with the above pattern (`count = int(...)`) so all `count` values across bins sum coherently to the same universe (the finetune sample or the chosen effective total).
3. If the public API or downstream consumers rely on the original baseline percentages, consider adding separate columns such as `orig_count_pct` (from the original mapping) and `finetune_count_pct` (the recomputed one above) and populating both during mapping construction.
```python
warnings.warn(
    "fastwoe.fast_somersd is deprecated. Import from fastwoe.metrics instead.",
    DeprecationWarning,
    stacklevel=2,
)
```

**suggestion:** Deprecation warning stacklevel is likely too low to point at the user's import site.

Since this warning is raised at module import time, `stacklevel=2` will usually point at the top of `fast_somersd.py` rather than the caller's import line. Consider increasing `stacklevel` (e.g., to 3 or 4) so it highlights the user's `import fastwoe.fast_somersd` call instead.

```python
warnings.warn(
    "fastwoe.fast_somersd is deprecated. Import from fastwoe.metrics instead.",
    DeprecationWarning,
    stacklevel=4,
)
```
## Summary

- `finetune()` method: WOE recalibration without redevelopment - update one feature's likelihood ratios while keeping bin edges, other features, and the global prior frozen
- `somersd_se()` in `fastwoe.metrics`: works for binary, ordinal, and continuous targets; new `somersd_se`, `somersd_ci_lower`, `somersd_ci_upper` fields in feature stats
- Deprecated `fastwoe.fast_somersd`: import from `fastwoe.metrics` instead

## Test plan
## Summary by Sourcery

Release version 0.1.7 with WOE finetuning, Somers' D standard error support, enhanced CAP plotting, instance-level Gini contributions, and expanded Python/CI compatibility.