Predict() Method Always Returns 1 (Binary Classification)

`dxgb.XGBClassifier`'s `predict` method always returns 1, regardless of the `predict_proba` (sigmoid) output. See the minimal example below, where all targets are 0. The model learns that it should predict 0 (the probabilities are low), yet every prediction comes back as 1.

Note: `.predict()` also does not accept a threshold parameter, which is another notable gap.

```
import dask_xgboost as dxgb
from dask.distributed import Client
import dask.array as da
import numpy as np

client = Client()

X = np.random.randint(1,5,(10,2))
y = np.zeros(10)

X = da.from_array(X)
y = da.from_array(y)

model = dxgb.XGBClassifier(n_estimator=5)
model.fit(X, y)

sigmoids = model.predict_proba(X).compute()
preds = model.predict(X).compute()

print(sigmoids, preds)
```

Output (the first array is the sigmoids, the second is the predictions):

```
[0.10914253 0.10914253 0.10914253 0.10914253 0.10914253 0.10914253
 0.10914253 0.10914253 0.10914253 0.10914253] [1 1 1 1 1 1 1 1 1 1]
```

It stems from line 537 of `core.py`:

```
            cidx = (class_probs > 0).astype(np.int64)
```

Because a sigmoid output is always greater than 0, any one-dimensional array of class probabilities is mapped entirely to 1. It's an easy fix: add a `threshold` parameter that replaces the hard-coded `0` with a float, defaulting to 0.5.
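To illustrate the difference, here is a minimal NumPy-only sketch comparing the current comparison against a thresholded one. The helper name `probs_to_labels` and its `threshold` argument are hypothetical and are not part of `dask_xgboost`; the sketch only shows the proposed behavior, not an actual patch to `core.py`.

```python
import numpy as np


def probs_to_labels(class_probs, threshold=0.5):
    """Hypothetical helper: map binary class probabilities to 0/1 labels.

    `threshold` is an assumed parameter name; it stands in for the
    hard-coded 0 in the comparison quoted above.
    """
    class_probs = np.asarray(class_probs)
    return (class_probs > threshold).astype(np.int64)


# Same probabilities as in the output above.
probs = np.full(10, 0.10914253)

# Current behavior: every positive probability becomes 1.
current = (probs > 0).astype(np.int64)

# Proposed behavior: compare against a threshold of 0.5 by default.
proposed = probs_to_labels(probs)

print(current)
print(proposed)
```

Until something like this lands, the same thresholding can be applied by hand to the result of `model.predict_proba(X)`.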
