CategoricalDomain performance in transform step

Hi,

I'm currently working with a large dataset that has numerous categorical features, some of them with many categories. I have set up the pipeline with the corresponding decorator for each feature, and I'm using XGBoost as the model.

I noticed that the CategoricalDomain decorator spends a lot of time in the transform step. After digging into the code, I found that most of that time is spent in `_compute_masks`, specifically in computing the valid mask (`_valid_value_mask`). I'm using CategoricalDomain with `invalid_value_treatment='as_is'`. In that case the valid/invalid masks are not really needed, because invalid values are passed through without any transformation.
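To illustrate why this step can be expensive, here is a minimal sketch of a valid-value mask, assuming it boils down to a membership test of every cell against the list of categories seen during fit. The function `valid_value_mask` is hypothetical and is not sklearn2pmml's actual `_valid_value_mask`. With object (string) columns, NumPy's `np.isin` falls back to comparing the input against each category in turn, so the cost grows roughly with rows × categories.

```python
import numpy as np

def valid_value_mask(X, data_values, missing_mask):
    """Hypothetical stand-in: True where a non-missing value is a known category."""
    # For object arrays, np.isin compares X against each category one by one,
    # so many rows combined with many categories makes this the dominant cost.
    in_domain = np.isin(X, data_values)
    # Missing values are neither valid nor invalid, so exclude them.
    return in_domain & ~missing_mask

X = np.array(["a", "b", "z", None], dtype=object)
missing = np.array([v is None for v in X])
mask = valid_value_mask(X, np.array(["a", "b", "c"], dtype=object), missing)
print(mask.tolist())
```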

Would it be possible to skip calculating the valid/invalid masks when `invalid_value_treatment` is set to `'as_is'`?
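The proposed short-circuit could look roughly like the sketch below. `SketchCategoricalDomain` is a hypothetical, heavily simplified class, not sklearn2pmml's implementation; it handles only 1-D input, omits missing-value treatment, and uses a `mask_computations` counter purely to show when the costly mask is built. The idea is to check `invalid_value_treatment` before building any masks and return early for `'as_is'`.

```python
import numpy as np

class SketchCategoricalDomain:
    """Hypothetical, simplified domain decorator (not sklearn2pmml's code)."""

    def __init__(self, data_values, invalid_value_treatment="return_invalid"):
        self.data_values = np.asarray(data_values, dtype=object)
        self.invalid_value_treatment = invalid_value_treatment
        self.mask_computations = 0  # counts how often the costly mask is built

    def _valid_mask(self, X, missing_mask):
        self.mask_computations += 1
        return np.isin(X, self.data_values) & ~missing_mask

    def transform(self, X):
        X = np.asarray(X, dtype=object)
        if self.invalid_value_treatment == "as_is":
            # Invalid values pass through unchanged, so the valid/invalid
            # masks would be computed and then ignored: skip them entirely.
            return X
        # Missing-value treatment is omitted from this sketch.
        missing_mask = np.array([v is None for v in X], dtype=bool)
        valid_mask = self._valid_mask(X, missing_mask)
        invalid_mask = ~(valid_mask | missing_mask)
        if self.invalid_value_treatment == "as_missing":
            X = X.copy()
            X[invalid_mask] = None
        elif self.invalid_value_treatment == "return_invalid" and invalid_mask.any():
            raise ValueError("Data contains invalid values")
        return X

as_is = SketchCategoricalDomain(["a", "b"], invalid_value_treatment="as_is")
print(as_is.transform(["a", "z", None]).tolist(), as_is.mask_computations)
```

With `'as_is'` the input comes back untouched and the mask is never computed, while the other treatments still pay for the membership test because they actually use its result.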

CategoricalDomain performance in transform step #431

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

CategoricalDomain performance in transform step #431

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions