PMML evaluator library for PySpark.
This package is a thin Python wrapper around the JPMML-Evaluator-Spark library.
Prerequisites:

- PySpark 3.0.X through 3.5.X, 4.0.X, or 4.1.X.
- Python 3.8 or newer.
Installing a release version from PyPI:

```
pip install jpmml_evaluator_pyspark
```

Alternatively, installing the latest snapshot version from GitHub:

```
pip install --upgrade git+https://github.com/jpmml/jpmml-evaluator-pyspark.git
```

JPMML-Evaluator-PySpark must be paired with JPMML-Evaluator-Spark based on the following compatibility matrix:
| PySpark version | JPMML-Evaluator-Spark branch | Latest JPMML-Evaluator-Spark version |
|---|---|---|
| 4.0.X and 4.1.X | `master` | 2.1.2 |
| 3.0.X through 3.5.X | `2.0.X` | 2.0.2 |
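The pairing rule in the matrix above can be sketched as a small helper function. This is a hypothetical illustration, not part of the package; the `spark_jars()` and `spark_jars_packages()` utility functions described below perform the real version selection internally:

```python
# Hypothetical sketch of the compatibility matrix above; the actual
# selection is performed internally by jpmml_evaluator_pyspark.
def jpmml_evaluator_spark_version(pyspark_version):
    major, minor = (int(part) for part in pyspark_version.split(".")[:2])
    if (major, minor) >= (4, 0):
        # PySpark 4.0.X and 4.1.X pair with the master branch
        return "2.1.2"
    elif (3, 0) <= (major, minor) <= (3, 5):
        # PySpark 3.0.X through 3.5.X pair with the 2.0.X branch
        return "2.0.2"
    raise ValueError("Unsupported PySpark version: " + pyspark_version)

print(jpmml_evaluator_spark_version("3.5.1"))  # → 2.0.2
print(jpmml_evaluator_spark_version("4.0.0"))  # → 2.1.2
```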
JPMML-Evaluator-PySpark versions 0.3.0 and newer bundle JPMML-Evaluator-Spark JAR files for quick programmatic setup.
Use the `jpmml_evaluator_pyspark.spark_jars()` utility function to obtain a PySpark version-dependent classpath string, and pass it as the `spark.jars` configuration entry when building a Spark session:

```python
from pyspark.sql import SparkSession

import jpmml_evaluator_pyspark

spark = SparkSession.builder \
    .config("spark.jars", jpmml_evaluator_pyspark.spark_jars()) \
    .getOrCreate()
```

Use the `jpmml_evaluator_pyspark.spark_jars_packages()` utility function to obtain a PySpark version-dependent Apache Maven package coordinates string:
```python
import jpmml_evaluator_pyspark

print(jpmml_evaluator_pyspark.spark_jars_packages())
```

Pass this value to `pyspark` or `spark-submit` using the `--packages` command-line option:

```
$SPARK_HOME/bin/pyspark --packages $(python -c "import jpmml_evaluator_pyspark; print(jpmml_evaluator_pyspark.spark_jars_packages())")
```

The "heart" of the PMML transformer is an `org.jpmml.evaluator.Evaluator` object.
Grab a JVM handle, and build the evaluator object from a streamable PMML resource using the `org.jpmml.evaluator.LoadingModelEvaluatorBuilder` builder class as usual.
Building from a PMML file:
```python
from jpmml_evaluator_pyspark import _jvm

jvm = _jvm()

pmmlIs = jvm.java.io.FileInputStream("/path/to/DecisionTreeIris.pmml")

try:
    evaluator = jvm.org.jpmml.evaluator.LoadingModelEvaluatorBuilder() \
        .load(pmmlIs) \
        .build()
finally:
    pmmlIs.close()

evaluator.verify()
```

JPMML-Evaluator-PySpark faithfully mirrors the JPMML-Evaluator-Spark public API.
The only notable change is that the `org.jpmml.evaluator.spark` Java/Scala package name maps to the `jpmml_evaluator_pyspark` Python module name.
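The two transformer flavors differ in how they attach evaluation results to the output DataFrame: judging by the class names, `FlatPMMLTransformer` appends each result field as a top-level column, whereas `NestedPMMLTransformer` keeps all result fields together in a single struct-typed column. A plain-Python illustration of the difference (an assumption for exposition, not the actual API; the `pmml` column name and field names are hypothetical):

```python
# Illustration only (assumption based on the class names, not the actual API):
# how one input row might appear in the output of each transformer flavor.
input_row = {"Sepal_Length": 5.1, "Sepal_Width": 3.5}
results = {"Species": "setosa", "probability(setosa)": 0.98}

# Flat: result fields become top-level columns alongside the input columns
flat_row = {**input_row, **results}

# Nested: result fields are grouped into a single struct-like column
# ("pmml" is a hypothetical column name)
nested_row = {**input_row, "pmml": results}

print(flat_row["Species"])            # → setosa
print(nested_row["pmml"]["Species"])  # → setosa
```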
Constructing a PMML transformer:
```python
from jpmml_evaluator_pyspark import FlatPMMLTransformer, NestedPMMLTransformer

pmmlTransformer = FlatPMMLTransformer(evaluator)
#pmmlTransformer = NestedPMMLTransformer(evaluator)
```

Transforming data using a PMML transformer:
```python
df = spark.read.csv("/path/to/Iris.csv", header = True, inferSchema = True)

pmmlDf = pmmlTransformer.transform(df)
pmmlDf.show()
```

JPMML-Evaluator-PySpark is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0. For a quick summary of your rights ("Can") and obligations ("Cannot" and "Must") under AGPLv3, please refer to TLDRLegal.
If you would like to use JPMML-Evaluator-PySpark in a proprietary software project, then it is possible to enter into a licensing agreement which makes it available under the terms and conditions of the BSD 3-Clause License instead.
JPMML-Evaluator-PySpark is developed and maintained by Openscoring Ltd, Estonia.
Interested in using Java PMML API software in your company? Please contact info@openscoring.io