Hello Villu,
it's been a while, and I hope you're doing well. I've come back with more questions.
Let's start with some code:
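import pandas as pd
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.decoration import CategoricalDomain
from sklearn2pmml.preprocessing import ExpressionTransformer, MatchesTransformer

# create some data
X = pd.DataFrame(
    {
        "numbers": [1, 2, 3, 40, 5],
        "colors": ["yellow ", "blue", "BLACK", "green", "red"],
    }
)
# create a simple mapper
mapper = DataFrameMapper(
    [
        (
            ["colors"],
            [
                # CategoricalDomain(dtype=str),
                ExpressionTransformer("X[0].lower()"),
                MatchesTransformer("green"),
            ],
            {"alias": "color_green"},
        )
    ],
    df_out=True,
    default=False,
)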
The following pipeline doesn't make much sense from a machine learning point of view, but it shows the issue very well:
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

pmml_pipe = PMMLPipeline(
    [
        ("mapper", mapper)
    ]
)
# fit and transform
pmml_pipe.fit_transform(X)
# export as PMML
sklearn2pmml(pmml_pipe, "output.pmml", with_repr=True)
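For reference, the result I expect from the mapper can be reproduced with plain pandas. This is only a minimal sketch of the intended logic, not what sklearn2pmml does internally; it assumes that MatchesTransformer("green") behaves like a regular-expression search on the lowercased string.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "numbers": [1, 2, 3, 40, 5],
        "colors": ["yellow ", "blue", "BLACK", "green", "red"],
    }
)

# The "colors" column holds strings, not numbers
colors_are_strings = pd.api.types.is_string_dtype(X["colors"])

# Lowercase each value, then flag the rows that contain "green"
# (assumed equivalent of ExpressionTransformer + MatchesTransformer)
color_green = X["colors"].str.lower().str.contains("green", regex=True)

print(colors_are_strings)
print(color_green.tolist())
```

Only the fourth row ("green") is flagged, and the column is clearly a string column on the Python side.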
In Python, everything works as expected. The issue is in the generated output.pmml file, which contains the following:
<DataDictionary>
    <DataField name="colors" optype="continuous" dataType="double"/>
</DataDictionary>
Given that the input has an infinite number of possible values, how can I set this data type to "string"?