Commit 160883a

Release v0.2.7: CatBoost standardization and PyPI workflow
- Standardize CatBoost to use XAddEvidence (matching XGBoost/LightGBM)
- Fix README depth=1 documentation (recommended, not required)
- Add PyPI publish workflow with trusted publishing
- Update version from 0.2.7rc2 to 0.2.7 (stable)
- All 106 tests passing
1 parent 1640044 commit 160883a

11 files changed: +171 -50 lines changed


.github/workflows/publish.yml

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+name: Build and Publish
+
+on:
+  release:
+    types: [published]
+  workflow_dispatch:
+    inputs:
+      publish_to_pypi:
+        description: 'Publish to PyPI'
+        required: true
+        type: boolean
+        default: false
+
+jobs:
+  build:
+    name: Build distribution packages
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Install build dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install build hatch-autorun twine
+
+      - name: Build package
+        run: python -m build
+
+      - name: Check distribution files
+        run: twine check dist/*
+
+      - name: List distribution files
+        run: ls -lh dist/
+
+      - name: Upload artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+          retention-days: 30
+
+  publish-pypi:
+    name: Publish to PyPI
+    needs: build
+    runs-on: ubuntu-latest
+    if: |
+      (github.event_name == 'workflow_dispatch' && github.event.inputs.publish_to_pypi == 'true') ||
+      (github.event_name == 'release' && !github.event.release.prerelease)
+
+    environment:
+      name: pypi
+      url: https://pypi.org/p/xbooster
+
+    permissions:
+      id-token: write
+
+    steps:
+      - name: Download artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
+
+  create-github-release-notes:
+    name: Create GitHub Release Notes
+    needs: build
+    runs-on: ubuntu-latest
+    if: github.event_name == 'release'
+
+    permissions:
+      contents: write
+
+    steps:
+      - name: Download artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: python-package-distributions
+          path: dist/
+
+      - name: Install GitHub CLI
+        run: |
+          type -p curl >/dev/null || (apt update && apt install curl -y)
+          curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \
+          && chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \
+          && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
+          && apt update \
+          && apt install gh -y
+
+      - name: Upload to GitHub Release
+        env:
+          GITHUB_TOKEN: ${{ github.token }}
+        run: |
+          gh release upload '${{ github.ref_name }}' dist/** --repo '${{ github.repository }}'
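
The `if:` expression on the `publish-pypi` job is easier to audit when written as a plain predicate. The sketch below is a Python model of that condition, not part of the workflow; the function name `should_publish` and its parameters are hypothetical. It assumes, as the workflow's comparison against `'true'` does, that `github.event.inputs` delivers the boolean input as a string.

```python
def should_publish(event_name: str, publish_input: str = "", prerelease: bool = False) -> bool:
    """Illustrative model of the publish-pypi job's `if:` condition."""
    # Manual run: publish only when the dispatch input was set to true.
    manual = event_name == "workflow_dispatch" and publish_input == "true"
    # Release event: publish full releases only, never pre-releases.
    released = event_name == "release" and not prerelease
    return manual or released


if __name__ == "__main__":
    scenarios = [
        ("release", "", False),
        ("release", "", True),
        ("workflow_dispatch", "true", False),
        ("workflow_dispatch", "false", False),
    ]
    for event_name, publish_input, prerelease in scenarios:
        decision = should_publish(event_name, publish_input, prerelease)
        print(event_name, repr(publish_input), prerelease, "->", decision)
```

Under this model a published pre-release still runs `build` and `create-github-release-notes` (that job only checks `github.event_name == 'release'`), so its distribution files are attached to the GitHub release but are not uploaded to PyPI.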

CHANGELOG.md

Lines changed: 25 additions & 0 deletions
@@ -1,5 +1,30 @@
 # Changelog

+## [0.2.7] - 2025-12-04
+
+### Changed
+- **CatBoost Naming Standardization**: Replaced `LeafValue` with `XAddEvidence` throughout CatBoost implementation
+  - Standardized naming to match XGBoost and LightGBM implementations
+  - Updated all CatBoost-related files: `catboost_scorecard.py`, `catboost_wrapper.py`, `cb_constructor.py`
+  - Updated all tests and documentation
+
+### Fixed
+- **README Documentation**: Corrected CatBoost depth requirement statement
+  - Changed from "Only supports depth=1" to "depth=1 is recommended for better interpretability"
+  - Code actually supports any tree depth (as long as trees are complete binary)
+  - Updated code examples to use `XAddEvidence` instead of `LeafValue`
+
+### Added
+- **PyPI Publish Workflow**: Added automated PyPI publishing workflow (`.github/workflows/publish.yml`)
+  - Supports both release events and manual workflow dispatch
+  - Uses trusted publishing (OpenID Connect) for secure PyPI uploads
+  - Automatically uploads distribution files to GitHub releases
+
+### Technical Details
+- All 106 tests passing
+- Version updated from 0.2.7rc2 to 0.2.7 (stable release)
+- LightGBM support is now stable (previously release candidate)
+
 ## [0.2.7rc2] - 2025-11-23 (Release Candidate)

 ### Fixed
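
For downstream code that still refers to the old CatBoost column names, the rename recorded in this changelog amounts to a small mapping. The following is a minimal sketch, assuming scorecard rows have been exported as plain dictionaries; the helper `migrate_row` and the constant `LEGACY_TO_CURRENT` are hypothetical and not part of xbooster.

```python
# Hypothetical mapping of pre-0.2.7 CatBoost column names to the 0.2.7 name.
LEGACY_TO_CURRENT = {"LeafValue": "XAddEvidence", "xAddEvidence": "XAddEvidence"}


def migrate_row(row: dict) -> dict:
    """Return a copy of `row` with legacy value columns renamed to XAddEvidence."""
    migrated = {}
    for key, value in row.items():
        new_key = LEGACY_TO_CURRENT.get(key, key)
        # Before 0.2.7, `xAddEvidence` was assigned directly from `LeafValue`.
        # If the two ever disagree, fail loudly instead of silently picking one.
        if new_key in migrated and migrated[new_key] != value:
            raise ValueError(f"conflicting values for {new_key!r}")
        migrated[new_key] = value
    return migrated


if __name__ == "__main__":
    old_row = {"Tree": 0, "LeafIndex": 1, "LeafValue": 0.25, "WOE": 0.1, "xAddEvidence": 0.25}
    print(migrate_row(old_row))
```

Rows that already use `XAddEvidence` pass through unchanged, so the helper can be applied to a mix of old and new exports.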

README.md

Lines changed: 8 additions & 9 deletions
@@ -218,7 +218,7 @@ The `DataPreprocessor` provides:
 3. Generation of interaction constraints for XGBoost
 4. Consistent feature naming for scorecard generation

-### LightGBM Support 💡 (Release Candidate)
+### LightGBM Usage

 xbooster provides support for LightGBM models with scorecard functionality. Here's how to use it:

@@ -296,12 +296,11 @@ print(f"Model Gini: {model_gini:.4f}")
 - **Flexible**: `use_base_score` parameter for optional base score normalization

 **Important Notes:**
-- **Release Candidate**: This feature is in testing phase - feedback welcome!
 - LightGBM's sklearn API handles base_score differently than XGBoost
 - The `use_base_score=True` parameter (default) ensures proper normalization
 - Only `XAddEvidence` score type is supported (WOE not applicable)

-### CatBoost Support 🐱 (Beta)
+### CatBoost Usage

 xbooster provides experimental support for CatBoost models with reduced functionality compared to XGBoost. Here's how to use it:

@@ -340,19 +339,19 @@ model = CatBoostClassifier(
 model.fit(pool)

 # Create and fit the scorecard constructor
-constructor = CatBoostScorecardConstructor(model, pool) # use_woe=False is the default, using raw LeafValue
+constructor = CatBoostScorecardConstructor(model, pool) # use_woe=False is the default, using raw XAddEvidence

-# Alternatively, to use WOE values instead of raw leaf values:
+# Alternatively, to use WOE values instead of raw XAddEvidence:
 # constructor = CatBoostScorecardConstructor(model, pool, use_woe=True)

 # Construct the scorecard
 scorecard = constructor.construct_scorecard()
 print("\nScorecard:")
 print(scorecard.head(3))

-# Print raw leaf values
-print("\nRaw Leaf Values:")
-print(scorecard[["Tree", "LeafIndex", "LeafValue", "WOE"]].head(10))
+# Print raw XAddEvidence values
+print("\nRaw XAddEvidence Values:")
+print(scorecard[["Tree", "LeafIndex", "XAddEvidence", "WOE"]].head(10))

 # Make predictions using different methods - Do this BEFORE creating points
 # Original CatBoost predictions
@@ -410,7 +409,7 @@ visualizer.plot_tree(tree_idx=0, title="CatBoost Tree Visualization")

 The CatBoost implementation has some limitations compared to the XGBoost version:

-1. Only supports depth=1 trees for interpretability
+1. **Depth recommendation**: While the code supports any tree depth (as long as trees are complete binary), `depth=1` is recommended for better interpretability. Deeper trees work but may be harder to interpret.
 2. Limited support for categorical features
 3. No SQL query generation
 4. Reduced visualization options
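
The reworded limitation 1 turns on the phrase "complete binary": a complete binary tree of depth d has 2^d leaves, so each extra level doubles the number of leaves a single tree contributes. A minimal sketch of that growth follows, with each leaf written as the sequence of split outcomes that reaches it. The function `leaf_paths` is hypothetical, and the order in which leaves are listed is an arbitrary choice for illustration, not taken from xbooster or CatBoost.

```python
from itertools import product


def leaf_paths(depth: int) -> list[tuple[bool, ...]]:
    """Enumerate the leaves of a complete binary tree as tuples of split outcomes."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # One True/False outcome per level; a complete tree contains every combination.
    return list(product((False, True), repeat=depth))


if __name__ == "__main__":
    for depth in (1, 2, 3):
        paths = leaf_paths(depth)
        print(f"depth={depth}: {len(paths)} leaves")
```

With `depth=1` every leaf is described by a single condition, which is why that setting reads most naturally as a scorecard; at greater depths each leaf is a conjunction of several conditions.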

tests/test_catboost_scorecard.py

Lines changed: 2 additions & 4 deletions
@@ -60,10 +60,9 @@ def test_trees_to_scorecard(trained_model: CatBoostClassifier, test_pool: Pool):
         "NonEvents",
         "Events",
         "EventRate",
-        "LeafValue",
+        "XAddEvidence",
         "WOE",
         "IV",
-        "xAddEvidence",
         "CountPct",
         "DetailedSplit",
     }
@@ -77,10 +76,9 @@ def test_trees_to_scorecard(trained_model: CatBoostClassifier, test_pool: Pool):
     assert scorecard["NonEvents"].dtype == np.float64
     assert scorecard["Events"].dtype == np.float64
     assert scorecard["EventRate"].dtype == np.float64
-    assert scorecard["LeafValue"].dtype == np.float64
+    assert scorecard["XAddEvidence"].dtype == np.float64
     assert scorecard["WOE"].dtype == np.float64
     assert scorecard["IV"].dtype == np.float64
-    assert scorecard["xAddEvidence"].dtype == np.float64
     assert scorecard["CountPct"].dtype == np.float64

     # Check for valid values

tests/test_catboost_wrapper.py

Lines changed: 3 additions & 3 deletions
@@ -165,19 +165,19 @@ def test_get_binned_feature_table(woe_mapper):
     assert isinstance(table, pd.DataFrame)
     assert not table.empty

-    required_columns = {"Feature", "Condition", "LeafValue", "Weight", "TreeCount"}
+    required_columns = {"Feature", "Condition", "XAddEvidence", "Weight", "TreeCount"}
     assert set(table.columns) >= required_columns

     assert table["Feature"].dtype == object
     assert table["Condition"].dtype == object
-    assert table["LeafValue"].dtype == np.float64
+    assert table["XAddEvidence"].dtype == np.float64
     assert table["Weight"].dtype == np.float64
     assert table["TreeCount"].dtype == np.int64


 def test_get_value_column(woe_mapper):
     """Test the get_value_column method."""
-    assert woe_mapper.get_value_column() == "LeafValue"
+    assert woe_mapper.get_value_column() == "XAddEvidence"

     woe_mapper.points_column = "Points"
     assert woe_mapper.get_value_column() == "Points"
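
The two assertions in `test_get_value_column` pin down a simple fallback rule: report `XAddEvidence` until a points column has been set, then report that column. A minimal stand-in that satisfies the same assertions is sketched below. The class `ValueColumnDemo` is hypothetical and is not the xbooster wrapper, whose implementation is not shown in this diff.

```python
from typing import Optional


class ValueColumnDemo:
    """Hypothetical stand-in reproducing the behavior the test asserts."""

    def __init__(self) -> None:
        # No points column exists until points have been generated.
        self.points_column: Optional[str] = None

    def get_value_column(self) -> str:
        # Prefer the points column once it is set; otherwise use the raw scores.
        return self.points_column or "XAddEvidence"


if __name__ == "__main__":
    mapper = ValueColumnDemo()
    print(mapper.get_value_column())
    mapper.points_column = "Points"
    print(mapper.get_value_column())
```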

tests/test_cb_constructor.py

Lines changed: 3 additions & 3 deletions
@@ -270,7 +270,7 @@ def test_construct_scorecard(scorecard_constructor):
         "IV",
         "CountPct",
         "DetailedSplit",
-        "LeafValue",
+        "XAddEvidence",
     }
     assert set(scorecard.columns).issuperset(required_columns)

@@ -288,7 +288,7 @@ def test_construct_scorecard(scorecard_constructor):
     assert scorecard["IV"].dtype == np.float64
     assert scorecard["CountPct"].dtype == np.float64
     assert scorecard["DetailedSplit"].dtype == object
-    assert scorecard["LeafValue"].dtype == np.float64
+    assert scorecard["XAddEvidence"].dtype == np.float64

     # Check for valid values
     assert scorecard["Count"].min() >= 0
@@ -453,7 +453,7 @@ def test_get_scorecard(trained_model, catboost_pool):
     assert not scorecard.empty
     assert "Tree" in scorecard.columns
     assert "LeafIndex" in scorecard.columns
-    assert "LeafValue" in scorecard.columns
+    assert "XAddEvidence" in scorecard.columns
     assert "DetailedSplit" in scorecard.columns


xbooster/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 from gradient boosted tree models (XGBoost and CatBoost).
 """

-__version__ = "0.2.7rc2"
+__version__ = "0.2.7"
 __author__ = "xRiskLab"
 __email__ = "contact@xrisklab.ai"

xbooster/_utils.py

Lines changed: 1 addition & 1 deletion
@@ -442,7 +442,7 @@ def build_node(path: str, level: int) -> dict:
                 f"count: {int(row['Count'])}\n"
                 f"rate: {row['EventRate']:.3f}\n"
                 f"woe: {row['WOE']:.3f}\n"
-                f"val: {row['LeafValue']:.3f}"
+                f"val: {row['XAddEvidence']:.3f}"
             ),
             "depth": level,
             "is_leaf": True,

xbooster/catboost_scorecard.py

Lines changed: 2 additions & 6 deletions
@@ -225,7 +225,7 @@ def trees_to_scorecard(
                 {
                     "Tree": tree_idx,
                     "LeafIndex": leaf_idx,
-                    "LeafValue": clean_val,
+                    "XAddEvidence": clean_val,
                     "Conditions": conditions,
                     "Feature": feature,
                     "Sign": sign,
@@ -304,9 +304,6 @@ def trees_to_scorecard(
         .round(4)
     )

-    # Calculate xAddEvidence
-    scorecard_df["xAddEvidence"] = scorecard_df["LeafValue"]
-
     # Calculate CountPct
     total_count = scorecard_df["Count"].sum()
     scorecard_df["CountPct"] = (scorecard_df["Count"] / total_count * 100).fillna(0.0)
@@ -329,10 +326,9 @@ def trees_to_scorecard(
             "NonEvents",
             "Events",
             "EventRate",
-            "LeafValue",
+            "XAddEvidence",
             "WOE",
             "IV",
-            "xAddEvidence",
             "DetailedSplit",
         ]
     ]
