
wrong metric reported in sdkv2/ch7/xgb/xgb-dm.py #9

@dipetkov


Chapter 7 shows how to use the XGBoost framework to train an xgb.XGBClassifier by optimizing AUC.

After training, the script prints out the AUC score on the validation data.

auc = cls.score(x_val, y_val)
print("AUC ", auc)

[Snippet on lines 49-50 in sdkv2/ch7/xgb/xgb-dm.py.]

However, xgb.XGBClassifier.score returns the mean accuracy of the predicted class labels (the scikit-learn classifier default), not the evaluation metric set with eval_metric.

So instead of cls.score, it's better to use sklearn.metrics.roc_auc_score on the predicted probabilities of the positive class.

Here is a complete reproducible example:

import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()

x_trn, x_val, y_trn, y_val = train_test_split(
    cancer.data, cancer.target,
    test_size=0.2,
    random_state=1,
)

model = xgb.XGBClassifier(
    objective="binary:logistic",
    eval_metric="auc",
    max_depth=2,
    random_state=2,
)
model.fit(
    x_trn, y_trn,
    verbose=False,
)

p_val = model.predict_proba(x_val)[:, 1]
roc_auc_score(y_val, p_val)  # 0.9861

# Returns the mean accuracy, not the evaluation metric.
model.score(x_val, y_val)  # 0.9561
