When I train a sklearn pipeline containing a `TargetEncoder` and convert it to a PMML file using `sklearn2pmml`, I get an error whenever new data contains a categorical value that was not seen during training. The desired behavior is that the default value is returned. If I instead build the pipeline with a `PMMLPipeline` object and declare the categorical variable with `CategoricalDomain` using `invalid_value_treatment = "as_is"`, unseen categorical values are handled correctly.
Is there a way to avoid this problem when I want to convert an existing trained sklearn pipeline?