
Problems with random forest classifier when using more and deeper learners #80

@GradOpt

Description


Hi, I'm new to ThunderGBM.

I just ran the random forest example:
```python
from thundergbm import TGBMClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

x, y = load_digits(return_X_y=True)
clf = TGBMClassifier(bagging=1, depth=12, n_trees=1, n_parallel_trees=100)
clf.fit(x, y)
y_pred = clf.predict(x)
accuracy = accuracy_score(y, y_pred)
print(accuracy)
```
and several problems have arisen:

First, I watched the verbose output and found that when "n_trees=1" is set, the classifier only uses 1 learner, no matter what value I give "n_parallel_trees", contrary to the claim in issue #42.

Furthermore, I tried more and deeper learners. When "depth" is more than 20, or "n_trees" is more than 70, the program tends to crash: running a Python script ends in a segmentation fault (core dumped), and in a Jupyter notebook the kernel dies. On a large dataset with millions of samples it crashed even while converting CSR to CSC. Since I'm using a workstation with a 32-core CPU, 128 GB of memory, and an RTX 3090 GPU, I don't believe this is a hardware issue. Is ThunderGBM only capable of training really small forests on small datasets? That would be unacceptable. I'm confused and hope to see the power of ThunderGBM.
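For context on why depth matters so much here, a full binary tree of depth d has up to 2^(d+1) − 1 nodes, so memory demand grows exponentially with depth. The sketch below is a back-of-envelope estimate only; the 16-bytes-per-node figure is an assumption for illustration, not ThunderGBM's actual node layout or allocation strategy:

```python
def max_nodes(depth):
    # Upper bound on node count for a full binary tree of the given depth:
    # 2^(depth+1) - 1 nodes (root at depth 0).
    return 2 ** (depth + 1) - 1

trees = 100   # e.g. n_parallel_trees=100
depth = 20    # the depth at which crashes start to appear

total_nodes = trees * max_nodes(depth)
print(total_nodes)  # 209,715,100 nodes for this forest

# Assuming (hypothetically) 16 bytes of state per node, that is already
# several gigabytes before accounting for per-node training buffers.
approx_bytes = total_nodes * 16
print(f"{approx_bytes / 1e9:.2f} GB")
```

If the trainer pre-allocates buffers for the worst-case node count (rather than the nodes actually grown), this could explain why deep forests exhaust GPU memory long before the 24 GB of an RTX 3090 seems like it should run out.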
