@gAldeia gAldeia commented Oct 9, 2025

Final Model Selection, Metrics Fixes, and Class Weights

This PR introduces improvements related to the selection of the final individual, metric consistency, and class weights.

By default, Brush selects the individual with the best score on the inner validation partition. However, this may not always be what the user wants.
With this update, users can specify different selection strategies, and the final model will be updated on the Python side according to the chosen criterion.

Currently, users can select:

  • The individual with the smallest complexity.
  • The least complex individual whose validation performance falls within the top performer's confidence interval.

Additionally, users can now provide custom functions for model selection.
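
The selection strategies can be pictured as plain functions over a list of candidates. The sketch below is illustrative only and does not use Brush's actual API: the `Candidate` class, its `complexity` and `val_score` fields, the function names, and the fixed `tolerance` that stands in for the top performer's range are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for an individual; not Brush's actual class."""
    name: str
    complexity: int
    val_score: float  # validation score, higher is better in this sketch


def smallest_complexity(candidates: Sequence[Candidate]) -> Candidate:
    """Pick the individual with the smallest complexity."""
    return min(candidates, key=lambda c: c.complexity)


def simplest_within_tolerance(
    candidates: Sequence[Candidate], tolerance: float = 0.02
) -> Candidate:
    """Pick the least complex individual scoring close to the best one.

    `tolerance` is an assumed stand-in for the top performer's range.
    """
    best = max(c.val_score for c in candidates)
    near_best = [c for c in candidates if c.val_score >= best - tolerance]
    return min(near_best, key=lambda c: c.complexity)


def select_final(
    candidates: Sequence[Candidate],
    strategy: Callable[[Sequence[Candidate]], Candidate],
) -> Candidate:
    """Apply any selection function, built-in or user-provided."""
    return strategy(candidates)


population = [
    Candidate("a", complexity=3, val_score=0.80),
    Candidate("b", complexity=7, val_score=0.91),
    Candidate("c", complexity=12, val_score=0.92),
    Candidate("d", complexity=25, val_score=0.925),
]

print(select_final(population, smallest_complexity).name)
print(select_final(population, simplest_within_tolerance).name)
# A custom rule is just another callable, here "best validation score".
print(select_final(population, lambda cs: max(cs, key=lambda c: c.val_score)).name)
```

Treating every strategy as a callable with the same signature is what makes a user-provided function interchangeable with the built-in ones.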

Other improvements

To support these features, class weights were implemented in Brush. Models can now use:

  • Unbalanced weights
  • Class weights balanced by support
  • Custom class weights provided by the user as a list.

Note that these class weights are also used in the final model selection.
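
As a rough illustration of the three options, the sketch below computes per-sample weights in plain Python. It assumes that "balanced by support" follows the same formula as scikit-learn's `class_weight="balanced"`, namely `n_samples / (n_classes * count)`; the function names are hypothetical and are not Brush's API.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weights inversely proportional to class support.

    Assumption: same formula as scikit-learn's "balanced" mode,
    n_samples / (n_classes * count_of_class).
    """
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in sorted(counts.items())
    }


def sample_weights(y, class_weights=None):
    """Expand class weights into one weight per sample.

    `class_weights` may be None (unbalanced, every sample weighs 1.0),
    a dict keyed by label, or a list indexed by an integer label.
    """
    if class_weights is None:
        return [1.0] * len(y)
    return [class_weights[label] for label in y]


y = [0, 0, 0, 1]

unbalanced = sample_weights(y)
balanced = sample_weights(y, balanced_class_weights(y))
custom = sample_weights(y, [1.0, 5.0])  # user-provided list, one entry per class

print(unbalanced)
print(balanced)
print(custom)
```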

Bug fixes

Several bug fixes were also made. Metrics computed in Brush now exactly match those from scikit-learn under every class-weight option. This matters because model selection is now performed on the Python side to allow custom selection functions.
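
To make the consistency requirement concrete, here is a minimal sketch of a weighted accuracy in plain Python. It mirrors the definition scikit-learn uses for `accuracy_score` with `sample_weight` (the weighted fraction of correct predictions); the function name and the example data are illustrative and are not taken from Brush.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of correct predictions.

    With sample_weight=None every sample counts equally, which reduces
    to the ordinary accuracy.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return correct / sum(sample_weight)


y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]

# Unbalanced: three of four samples are correct.
plain = weighted_accuracy(y_true, y_pred)

# Balanced by support: class 0 weighs 2/3 per sample, class 1 weighs 2.
balanced = weighted_accuracy(y_true, y_pred, [2 / 3, 2 / 3, 2 / 3, 2.0])

print(plain, balanced)
```

If scikit-learn is installed, the same inputs can be passed to `sklearn.metrics.accuracy_score(y_true, y_pred, sample_weight=...)` as a cross-check.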

With all these additions, various issues were identified and resolved. The code should now be more stable and fully functional.

Several improvements. This is a work in progress.
- class_weights can be user-defined, unbalanced, or balanced by support.
These values are now easier to define.
- The archive is no longer serialized, which makes it easier to predict
from the archive. With this change, I removed the hyperparameters that
enabled the archive, since it is simpler to save the entire population
and archive at the end of the run.
- Removed predict_archive and predict_proba_archive; access the
individuals directly instead. I expect the interface to be simpler now.
- Added a number of new test cases. One is still failing, and I am
working on it.
- Final model selection also respects the specified class weights,
so if the user asks for the smallest individual within the confidence
interval, it should still work.
Many tests for recent functionality, including the
heuristics that select final models on the Python side.

Some bugs fixed.
@lacava lacava merged commit b285298 into master Oct 9, 2025
4 checks passed
@gAldeia gAldeia deleted the final_model_selection branch October 16, 2025 18:07