I want to declare the character variables in my data as categorical, impute their missing values, target-encode them, and then rename the resulting columns with a 'cat_' prefix (the renaming is a hard requirement). Preprocessing and modeling both work, and I get the expected column names, but I can't export the pipeline to a PMML file.
Here is my code:
import pandas as pd
from sklearn_pandas import DataFrameMapper
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from category_encoders import TargetEncoder
from sklearn2pmml import sklearn2pmml, PMMLPipeline
from sklearn2pmml.decoration import Alias, CategoricalDomain

data = pd.DataFrame({
    'cat1': ['A', 'B', None, 'C', 'C', 'B'],
    'cat2': ['X', 'Y', 'X', 'Z', 'X', 'Y'],
    'target': [0, 1, 0, 1, 1, 0]
})
X = data[['cat1', 'cat2']]
y = data['target']

cat_origin_list = ['cat1', 'cat2']
cat_mapper = [
    (
        [col],
        Pipeline([
            ("domain", CategoricalDomain(with_data=False)),
            ("imputer", SimpleImputer(missing_values=None, strategy="constant", fill_value="-999")),
            ("target_encode", TargetEncoder())
        ]),
        {"alias": f"cat_{col}"}
    )
    for col in cat_origin_list
]
mapper = DataFrameMapper(
    cat_mapper,
    df_out=True
)
pipeline = PMMLPipeline([
    ("mapper", mapper),
    ("classifier", DecisionTreeClassifier())
])
pipeline.fit(X, y)
X_transformed = pipeline.named_steps['mapper'].transform(X)
sklearn2pmml(pipeline, "model-test.pmml", pmml_schema="4.3")
The export call raised this error:
Standard output is empty
Standard error:
Exception in thread "main" org.jpmml.sklearn.SkLearnException: User input field cat1 cannot be renamed
    at org.jpmml.sklearn.SkLearnEncoder.renameFeature(SkLearnEncoder.java:284)
    at sklearn_pandas.DataFrameMapper.encodeFeatures(DataFrameMapper.java:79)
    at sklearn_pandas.DataFrameMapper.initializeFeatures(DataFrameMapper.java:48)
    at sklearn.Initializer.encode(Initializer.java:59)
    at sklearn.Composite.encodeFeatures(Composite.java:112)
    at sklearn.Composite.initFeatures(Composite.java:255)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:113)
    at com.sklearn2pmml.Main.run(Main.java:100)
    at com.sklearn2pmml.Main.main(Main.java:85)
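
To make the intended preprocessing concrete, here is a rough sketch in plain Python of the columns I expect after the mapper step. It is only an illustration: `encode_and_rename` is a hypothetical helper, not part of any library, and it uses a plain per-category mean of the target, whereas `TargetEncoder` from category_encoders applies smoothing, so the real encoded values will differ. The column names are the point.

```python
from collections import defaultdict


def encode_and_rename(rows, cat_cols, target_col, fill_value="-999", prefix="cat_"):
    """Hypothetical helper: impute, mean-target-encode, and prefix column names."""
    out = [{} for _ in rows]
    for col in cat_cols:
        # Impute: replace missing (None) categories with a constant
        values = [fill_value if r[col] is None else r[col] for r in rows]
        # Unsmoothed target encoding: mean of the target per category
        sums, counts = defaultdict(float), defaultdict(int)
        for value, row in zip(values, rows):
            sums[value] += row[target_col]
            counts[value] += 1
        # Store the encoded value under the renamed column
        for encoded_row, value in zip(out, values):
            encoded_row[prefix + col] = sums[value] / counts[value]
    return out


# Same toy data as in the question
cat1 = ["A", "B", None, "C", "C", "B"]
cat2 = ["X", "Y", "X", "Z", "X", "Y"]
target = [0, 1, 0, 1, 1, 0]
rows = [{"cat1": a, "cat2": b, "target": t} for a, b, t in zip(cat1, cat2, target)]

encoded = encode_and_rename(rows, ["cat1", "cat2"], "target")
print(sorted(encoded[0]))  # ['cat_cat1', 'cat_cat2']
```

These are the same names I see in `X_transformed`, so the renaming itself works on the Python side; only the PMML export rejects it.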