
Failed to rename column names and export PMML #458

@duanzongran


I want to declare the string (character) variables in my data as categorical, impute their missing values, target-encode them, and then rename the resulting columns with the prefix 'cat_' (this renaming is a hard requirement). Data preprocessing works (I get the expected column names) and so does model training, but I can't export the fitted pipeline to a PMML file.
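To make the intended per-column behavior concrete, here is a simplified, dependency-free sketch of the four steps (declare categorical, impute, target-encode, rename with 'cat_'). It is not the reproducer and does not touch sklearn2pmml. It assumes plain in-sample mean target encoding; category_encoders' `TargetEncoder` additionally smooths each category mean toward the global target mean, so its real numbers will differ. The helper names (`impute`, `fit_mean_target_encoding`, `transform_columns`) are hypothetical.

```python
from statistics import mean

MISSING_FILL = "-999"  # same constant fill value as the SimpleImputer below


def impute(values, fill=MISSING_FILL):
    """Replace missing entries (None) with a constant category label."""
    return [fill if v is None else v for v in values]


def fit_mean_target_encoding(values, target):
    """ASSUMPTION: unsmoothed encoding, i.e. each category -> mean target of its rows."""
    groups = {}
    for v, t in zip(values, target):
        groups.setdefault(v, []).append(t)
    return {category: mean(ts) for category, ts in groups.items()}


def transform_columns(columns, target, prefix="cat_"):
    """Impute, target-encode (fitted in-sample), and rename each column to prefix + name."""
    out = {}
    for name, values in columns.items():
        filled = impute(values)
        mapping = fit_mean_target_encoding(filled, target)
        out[f"{prefix}{name}"] = [mapping[v] for v in filled]
    return out


if __name__ == "__main__":
    columns = {
        "cat1": ["A", "B", None, "C", "C", "B"],
        "cat2": ["X", "Y", "X", "Z", "X", "Y"],
    }
    target = [0, 1, 0, 1, 1, 0]
    for name, values in transform_columns(columns, target).items():
        print(name, values)
```

On the toy data used in the issue, this produces columns named `cat_cat1` and `cat_cat2`, the same names the `DataFrameMapper` aliases yield in Python.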

Here is my code:

```python
import pandas as pd
from sklearn_pandas import DataFrameMapper

from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from category_encoders import TargetEncoder

from sklearn2pmml import sklearn2pmml, PMMLPipeline
from sklearn2pmml.decoration import Alias, CategoricalDomain

# Toy data: two string columns (cat1 contains a missing value) and a binary target
data = pd.DataFrame({
    'cat1': ['A', 'B', None, 'C', 'C', 'B'],
    'cat2': ['X', 'Y', 'X', 'Z', 'X', 'Y'],
    'target': [0, 1, 0, 1, 1, 0]
})

X = data[['cat1', 'cat2']]
y = data['target']

cat_origin_list = ['cat1', 'cat2']

# One mapper entry per column: declare categorical -> impute "-999" -> target-encode,
# then rename the output column to "cat_<col>" via the "alias" option
cat_mapper = [
    (
        [col],
        Pipeline([
            ("domain", CategoricalDomain(with_data=False)),
            ("imputer", SimpleImputer(missing_values=None, strategy="constant", fill_value="-999")),
            ("target_encode", TargetEncoder())
        ]),
        {"alias": f"cat_{col}"}
    )
    for col in cat_origin_list
]

mapper = DataFrameMapper(
    cat_mapper,
    df_out=True
)

pipeline = PMMLPipeline([
    ("mapper", mapper),
    ("classifier", DecisionTreeClassifier())
])

pipeline.fit(X, y)
# This works: the transformed DataFrame has the expected column names
X_transformed = pipeline.named_steps['mapper'].transform(X)

# This fails (see the error below)
sklearn2pmml(pipeline, "model-test.pmml", pmml_schema="4.3")
```

The export fails with the following error:

```
Standard output is empty
Standard error:
Exception in thread "main" org.jpmml.sklearn.SkLearnException: User input field cat1 cannot be renamed
	at org.jpmml.sklearn.SkLearnEncoder.renameFeature(SkLearnEncoder.java:284)
	at sklearn_pandas.DataFrameMapper.encodeFeatures(DataFrameMapper.java:79)
	at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:48)
	at sklearn.Initializer.encode(Initializer.java:59)
	at sklearn.Composite.encodeFeatures(Composite.java:112)
	at sklearn.Composite.initFeatures(Composite.java:255)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:113)
	at com.sklearn2pmml.Main.run(Main.java:100)
	at com.sklearn2pmml.Main.main(Main.java:85)
```
