Hi,
I'm currently working with a larger dataset which has numerous categorical features, some of them with many categories. I have set up the pipeline with the corresponding decorators for each feature, and using xgb.
I noticed that the CategoricalDomain decorator spends a lot of time in the transform step. I did a bit more digging in the code, and found out that most of the time spend in _compute_masks, specifically in computing the valid mask (_valid_value_mask). I'm using the CategoricalDomain decorator with invalid_value_treatment='as_is' in which case the valid/invalid masks are not really needed as there is no transformation happening.
Would it be possible to skip the step of calculating the valid/invalid mask in case invalid_value_treatment is set to 'as_is'?
Hi,
I'm currently working with a larger dataset which has numerous categorical features, some of them with many categories. I have set up the pipeline with the corresponding decorators for each feature, and using xgb.
I noticed that the CategoricalDomain decorator spends a lot of time in the transform step. I did a bit more digging in the code, and found out that most of the time spend in
_compute_masks, specifically in computing the valid mask (_valid_value_mask). I'm using the CategoricalDomain decorator with invalid_value_treatment='as_is' in which case the valid/invalid masks are not really needed as there is no transformation happening.Would it be possible to skip the step of calculating the valid/invalid mask in case invalid_value_treatment is set to 'as_is'?