Detect invalid value treatment policy based on the "transformer composition" of SkLearn pipeline  #436

@fritshermans

When I train a sklearn pipeline containing a TargetEncoder and convert it to a PMML file using sklearn2pmml, scoring new data fails with an error whenever it contains a categorical value that was not seen during training. The desired behavior is that the default value is returned. If I instead build the pipeline as a PMMLPipeline and declare the categorical variable with CategoricalDomain(invalid_value_treatment = "as_is"), scoring works on unseen categorical values.
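To make the difference between the two behaviors concrete, here is a minimal plain-Python sketch of the policy semantics. It is only a model of the idea, not sklearn2pmml or JPMML code: `treat_invalid`, `target_encode`, `InvalidValueError`, and the toy numbers are all hypothetical names and data introduced for illustration. The policy strings mirror the `invalid_value_treatment` options of CategoricalDomain, and the sketch assumes that a pipeline converted without a domain declaration behaves like "return_invalid".

```python
# Plain-Python model of invalid value treatment; not library code.


class InvalidValueError(ValueError):
    """Raised when a value outside the training categories is rejected."""


def treat_invalid(value, valid_values, policy="return_invalid"):
    """Apply an invalid value treatment policy to one categorical value."""
    if value in valid_values:
        return value
    if policy == "return_invalid":
        # Assumed default: scoring stops with an error.
        raise InvalidValueError(f"unseen category: {value!r}")
    if policy == "as_is":
        return value  # pass the unseen value through unchanged
    if policy == "as_missing":
        return None  # downstream steps see a missing value
    raise ValueError(f"unknown policy: {policy!r}")


def target_encode(value, category_means, default):
    """Toy target encoder: per-category mean, else the default value."""
    return category_means.get(value, default)


# Hypothetical encoder state learned from training data.
category_means = {"red": 0.2, "green": 0.7}
default = 0.45

# With "as_is", the unseen category reaches the encoder,
# which then falls back to its default value.
encoded = target_encode(
    treat_invalid("blue", category_means, policy="as_is"),
    category_means,
    default,
)
print(encoded)  # 0.45
```

With the policy left at "return_invalid", the same call to `treat_invalid("blue", category_means)` raises before the encoder is ever consulted, which matches the error described above.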

Is there a way to avoid this problem when I want to convert an existing trained sklearn pipeline?
