This repository was archived by the owner on Jul 16, 2021. It is now read-only.

Predict() Method Always Returns 1 (Binary Classification) #62

@kylejn27

dxgb.XGBClassifier's predict method always returns 1, regardless of the predict_proba (sigmoid) output. In the minimal example below, every target is 0. The model learns to output low probabilities, yet every prediction comes back as 1.

Note: .predict() does not accept a threshold parameter, which is another notable gap.

import dask_xgboost as dxgb
from dask.distributed import Client
import dask.array as da
import numpy as np

client = Client()

X = np.random.randint(1,5,(10,2))
y = np.zeros(10)

X = da.from_array(X)
y = da.from_array(y)

model = dxgb.XGBClassifier(n_estimator=5)
model.fit(X, y)

sigmoids = model.predict_proba(X).compute()
preds = model.predict(X).compute()

print(sigmoids, preds)

Output:
(The first array is the predict_proba output, the second is the predict output.)

[0.10914253 0.10914253 0.10914253 0.10914253 0.10914253 0.10914253
 0.10914253 0.10914253 0.10914253 0.10914253] [1 1 1 1 1 1 1 1 1 1]

The problem stems from line 537 of core.py:

            cidx = (class_probs > 0).astype(np.int64)

When the class probabilities are one-dimensional (the binary case), each one is a sigmoid output and therefore strictly greater than 0, so this comparison evaluates to 1 for every sample. The fix is simple: add a threshold parameter to predict() that replaces the hard-coded 0, with a default of 0.5.
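To make the difference concrete, here is a minimal NumPy-only sketch of the comparison before and after the proposed change. The helper name `labels_from_probs` and its `threshold` argument are hypothetical and not part of dask_xgboost. The probabilities are the values from the output reported above.

```python
import numpy as np


def labels_from_probs(class_probs, threshold=0.5):
    """Hypothetical helper: map positive-class probabilities to 0/1 labels."""
    class_probs = np.asarray(class_probs)
    return (class_probs > threshold).astype(np.int64)


# Probabilities copied from the predict_proba output in this report.
class_probs = np.full(10, 0.10914253)

# Comparison as quoted from core.py: any positive probability becomes 1.
buggy = (class_probs > 0).astype(np.int64)

# Proposed behavior: compare against a threshold that defaults to 0.5.
fixed = labels_from_probs(class_probs)

print(buggy)
print(fixed)
```

Until something like this is available in predict(), the same comparison can be applied by hand to the predict_proba output.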
