Final model selection #64

gAldeia · 2025-10-09T21:09:13Z

Final Model Selection, Metrics Fixes, and Class Weights

This PR introduces improvements related to the selection of the final individual, metric consistency, and class weights.

By default, Brush selects the individual with the best score on the inner validation partition. However, this may not always be what the user wants.
With this update, users can specify different selection strategies, and the final model will be updated on the Python side according to the chosen criterion.

Currently, users can select:

The individual with the smallest complexity.
The least complex individual whose validation performance falls within the top performer’s confidence interval.

Additionally, users can now provide custom functions for model selection.
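
As a rough illustration of the selection strategies described above, here is a minimal sketch in plain Python. It does not use Brush's actual API: `Candidate`, `smallest_complexity`, `simplest_within_tolerance`, `select_final`, and the `tolerance` parameter are all hypothetical names, and a fixed tolerance around the best validation loss stands in for the "top performer's range".

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """Hypothetical stand-in for an individual from the final population or archive."""
    name: str
    complexity: int
    val_loss: float  # assumption: lower validation loss is better


def smallest_complexity(candidates: Sequence[Candidate]) -> Candidate:
    """Pick the least complex candidate, breaking ties by validation loss."""
    return min(candidates, key=lambda c: (c.complexity, c.val_loss))


def simplest_within_tolerance(
    candidates: Sequence[Candidate], tolerance: float = 0.05
) -> Candidate:
    """Pick the least complex candidate whose loss is close to the best one.

    `tolerance` is an assumed absolute margin, not a Brush parameter.
    """
    best = min(c.val_loss for c in candidates)
    eligible = [c for c in candidates if c.val_loss <= best + tolerance]
    return smallest_complexity(eligible)


def select_final(
    candidates: Sequence[Candidate],
    strategy: Callable[[Sequence[Candidate]], Candidate],
) -> Candidate:
    """Apply any selection function, built-in or user-provided."""
    if not candidates:
        raise ValueError("no candidates to select from")
    return strategy(candidates)


population = [
    Candidate("a", complexity=12, val_loss=0.20),
    Candidate("b", complexity=5, val_loss=0.23),
    Candidate("c", complexity=2, val_loss=0.40),
]

simplest = select_final(population, smallest_complexity)
parsimonious = select_final(population, simplest_within_tolerance)
# A custom function: here, simply the best validation loss.
custom = select_final(population, lambda cs: min(cs, key=lambda c: c.val_loss))
```

With this toy population, the three strategies pick "c", "b", and "a" respectively. Passing the strategy as a callable is one way to let built-in and custom selection share the same code path.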

Other improvements

To support these features, class weights were implemented in Brush. Models can now use:

Unbalanced weights
Class weights balanced by support
Custom class weights provided by the user as a list.

Note that these class weights are also used in the final model selection.
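
To make the three weighting options concrete, here is a small sketch in plain Python. It assumes that "balanced by support" follows the same convention as scikit-learn's `class_weight="balanced"`, that is `n_samples / (n_classes * count)`, and that a user-provided list is indexed by class label. Both are assumptions for illustration, and the function names are hypothetical rather than part of Brush.

```python
from collections import Counter
from typing import Dict, List, Sequence


def unbalanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Every class gets the same weight."""
    return {label: 1.0 for label in set(y)}


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Weight each class inversely to its support.

    Uses n_samples / (n_classes * count), the scikit-learn "balanced" convention.
    """
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def weights_from_list(weights: List[float]) -> Dict[int, float]:
    """Assumption: position i in the list is the weight of class label i."""
    return dict(enumerate(weights))


y = [0, 0, 0, 1]
uniform = unbalanced_class_weights(y)
balanced = balanced_class_weights(y)
custom_weights = weights_from_list([0.3, 0.7])
```

For the labels above, the minority class 1 receives weight 2.0 and the majority class 0 receives 2/3, so each class contributes the same total weight.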

Bug fixes

Several bug fixes were also made. Metrics computed in Brush now exactly match those from scikit-learn for every class weight option. This matters because the model selection logic runs on the Python side, to allow for custom selection functions.
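
To illustrate how class weights change a metric, here is a minimal sketch of a class-weighted accuracy in plain Python. It mirrors what scikit-learn's `accuracy_score` computes when per-sample weights are passed through `sample_weight`. The function `weighted_accuracy` is a hypothetical helper, not Brush's implementation.

```python
from typing import Dict, Sequence


def weighted_accuracy(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    class_weights: Dict[int, float],
) -> float:
    """Accuracy where each sample counts with the weight of its true class."""
    sample_w = [class_weights[t] for t in y_true]
    correct = sum(w for w, t, p in zip(sample_w, y_true, y_pred) if t == p)
    return correct / sum(sample_w)


y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]  # a model that always predicts the majority class

# Unbalanced weights: 3 of 4 samples are correct.
plain = weighted_accuracy(y_true, y_pred, {0: 1.0, 1: 1.0})

# Weights balanced by support: both classes carry the same total weight,
# so missing the whole minority class costs half of the score.
balanced_acc = weighted_accuracy(y_true, y_pred, {0: 2 / 3, 1: 2.0})
```

In this example the unweighted accuracy is 0.75 while the balanced one is 0.5, which shows why the selection step and the metric need to agree on the same weights.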

While implementing these additions, various issues were identified and resolved. The code should now be more stable.

Several improvements. This is a work in progress.
- class_weights can be user-defined, unbalanced, or balanced by support. It is now easier to define these values.
- The archive is no longer serialized, making it easier to predict from the archive. With this change, I removed the hyperparameters related to enabling the archive, as it is easier to just save the entire population and archive at the end of the run.
- Removed predict_archive and predict_proba_archive; just access the individuals directly. I expect the interface to be simpler now.
- A bunch of new test cases. One is still failing; I am working on it.
- Final model selection also respects the specified class weights, so if the user asks for the smallest individual within the confidence interval, it should still work.

Lots of tests for recent functionalities, including heuristics to select final models in python side. Some bugs fixed.

gAldeia added 6 commits September 22, 2025 17:04

New heuristic to pick final model. fixes in AUPRC and archive.

5eeb324

Fixed accuracy and AUPRC. everything works with class_weights.

afed581

Lots of tests for recent functionalities, including heuristics to select final models in python side. Some bugs fixed.

Fixed broken cpp test

e4d4d7f

Merge remote-tracking branch 'origin/master' into final_model_selection

cc46c8d

Merge and rerun tests

7ecedfa

lacava merged commit b285298 into master Oct 9, 2025
4 checks passed

gAldeia deleted the final_model_selection branch October 16, 2025 18:07
