Conversation

@mail4umar
Collaborator

A customer pointed out an issue in #1351.

When you load an RFClassifier model, it cannot predict probabilities.

The cause was that the loaded model was not able to get its classes.

I created a specific unit test to capture this. When I ran the test for all ML models, I found a similar issue with XGB. I have implemented a fix for XGBClassifier as well.

Fixed #1357

@mail4umar mail4umar self-assigned this Sep 25, 2025
@mail4umar mail4umar added Bug Something isn't working. Machine Learning - Classification Classification Metrics, Classification Models (RF Classifier, XGBOOST Classifier, Logit...) labels Sep 25, 2025
try:
    first_tree = self._compute_trees_arrays(self.get_tree(0), self.X, True)
    unique_values = set()
    for j in range(len(first_tree[4])):  # first_tree[4] is the value array
Collaborator

Maybe we should have a getter specifically for these classes rather than indexing into the tree to get them. If the index of the array of classes ever changed, it would be easier to change just that one getter function. I am also a bit suspicious that we can count on the array being at index 4 without checking anything here. Thoughts?
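
To make the suggestion concrete, a getter along these lines is one possibility. This is only a sketch: `VALUE_ARRAY_INDEX`, `get_value_array`, and `get_classes` are hypothetical names, and it assumes the helper returns a sequence whose element at index 4 is the value array (as the diff comment says), with `None` for non-leaf nodes and a class label for leaves. The actual layout in ensemble.py may differ.

```python
from typing import Any, List, Sequence

# Assumption: the value array sits at position 4 of the helper's result,
# as stated in the diff comment. Keeping the index in one place means a
# layout change only has to be handled here.
VALUE_ARRAY_INDEX = 4


def get_value_array(tree_arrays: Sequence[Any]) -> Sequence[Any]:
    """Return the value array, raising if the layout is shorter than expected."""
    if len(tree_arrays) <= VALUE_ARRAY_INDEX:
        raise ValueError(
            f"expected at least {VALUE_ARRAY_INDEX + 1} arrays, "
            f"got {len(tree_arrays)}"
        )
    return tree_arrays[VALUE_ARRAY_INDEX]


def get_classes(tree_arrays: Sequence[Any]) -> List[Any]:
    """Collect the distinct non-null entries of the value array.

    Assumption: internal nodes carry None and leaves carry a class label.
    """
    unique_values = {v for v in get_value_array(tree_arrays) if v is not None}
    # Sort by string form so mixed label types still give a stable order.
    return sorted(unique_values, key=str)


# Toy input (not real model output): five arrays, value array last.
example = [
    [1, -1, -1],
    [2, -1, -1],
    [0, None, None],
    [0.5, None, None],
    [None, "a", "b"],
]
print(get_classes(example))  # ['a', 'b']
```

The length check addresses the concern about trusting index 4 blindly: an unexpected layout fails with a clear error instead of silently reading the wrong array.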

Collaborator Author

There are multiple uses of this function, _compute_trees_arrays, inside ensemble.py. Each class of algorithm treats it differently, and all of them rely on this array structure. That is why a getter function would not help much.

For now, I have created a getter function to find the classes. But that still leaves many places where we use the indices, because we rely on this function. It is actually a helper function in itself.

In a later PR, we may want to overhaul the code entirely, but this PR is only for the bug fix.

@mail4umar mail4umar merged commit c74d893 into vertica:master Oct 10, 2025
1 of 9 checks passed

Development

Successfully merging this pull request may close these issues.

predict_proba does not return probabilities after load_model in VerticaPy >= 1.0.5

2 participants