Implement an LGBM->ONNX model conversion + inferencing #147
Description
The goal of this task is to add another variant to the inferencing benchmark for LightGBM. We are already comparing LightGBM Python, the LightGBM C API, and Treelite; we'd like to try onnxruntime, as it seems applicable.
In particular, we'd like to reproduce the results of this post on Hummingbird and onnxruntime for classical ML models.
Feel free to reach out to the authors of the blog post for collaboration.
The expected impact of this task:
- increase the value of the benchmark for the lightgbm community, in particular for production scenarios
- identify better production inferencing technologies
Learning Goals
By working on this project you'll be able to learn:
- how to use onnxruntime for classical ML models
- how to compare inferencing technologies in a benchmark
- how to write components and pipelines for AzureML (component sdk + shrike)
Expected Deliverable:
To complete this task, you need to deliver:
- two working Python scripts: one to convert LightGBM models into ONNX (possibly using Hummingbird), one to run inferencing with onnxruntime
- their corresponding working AzureML components
- a successful run of the lightgbm inferencing benchmark pipeline
Instructions
Prepare for coding
- Follow the installation process; please report any issues you encounter, that will help!
- Clone this repo and create your own branch `username/onnxruntime` (or similar) for your work (commit often!).
- In `src/scripts/model_transformation`, create a folder `lightgbm_to_onnx/` and copy the content of `src/scripts/samples/` into it.
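Sketched as shell commands from the repo root (the branch name is a placeholder, per the steps above):

```shell
# From the repo root; pick your own branch name.
git checkout -b username/onnxruntime
mkdir -p src/scripts/model_transformation/lightgbm_to_onnx
cp -r src/scripts/samples/. src/scripts/model_transformation/lightgbm_to_onnx/
```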
Local development
Let's start locally first.
To iterate on your python script, you need to consider a couple of constraints:
- Follow the instructions in the sample script to modify and make your own.
- Please consider using inputs and outputs that are provided as directories, not single files. There's a helper function to let you automatically select the unique file contained in a directory (see the function `input_file_path` in `src/common/io.py`).
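To illustrate the idea, here is a hypothetical re-implementation of such a helper; the actual `input_file_path` in `src/common/io.py` is authoritative and its signature may differ:

```python
# Hypothetical sketch of a "unique file in a directory" helper; the real
# helper lives in src/common/io.py and may behave differently.
import os


def input_file_path(path):
    """Return `path` if it is a file, else the unique file inside it."""
    if os.path.isfile(path):
        return path
    entries = [e for e in os.listdir(path) if os.path.isfile(os.path.join(path, e))]
    if len(entries) != 1:
        raise ValueError(f"Expected exactly one file in {path}, found {len(entries)}")
    return os.path.join(path, entries[0])
```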
Here are a couple of pointers to get you started:
Feel free to check out the current Treelite modules (`model_conversion/treelite_compile` and `inferencing/treelite_python`); they behave similarly. You can also adapt some unit tests from `tests/scripts/test_treelite_python.py`.
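For instance, a common pattern in such tests is to exercise a script's entry point by patching `sys.argv`. The `main` below is a stub standing in for your conversion script; the real tests in the repo may be structured differently:

```python
# Illustrative pattern only: unit-testing a CLI entry point by patching
# sys.argv. `main` is a stub standing in for your actual script.
import argparse
import sys
from unittest.mock import patch


def main():
    """Stand-in for the script entry point: parse CLI arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--output_dir", required=True)
    return parser.parse_args()


def test_main_parses_arguments():
    argv = ["prog", "--model_dir", "in", "--output_dir", "out"]
    with patch.object(sys, "argv", argv):
        args = main()
    assert args.model_dir == "in"
    assert args.output_dir == "out"
```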
Develop for AzureML
Component specification
- First, unit tests. Edit `tests/aml/test_components.py` and look for the list of components. Add the relative path to your component spec to this list. You can test your component by running `pytest tests/aml/test_components.py -v -k name_of_component`.
- Edit the file `spec.yaml` in the directory of your component (copied from the sample) and align its arguments with the expected arguments of your component until the unit tests pass.
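For orientation only, a command component spec typically has a shape like the following; the field names follow the AzureML component schema, but the sample `spec.yaml` you copied is authoritative, and the component name and arguments below are placeholders:

```yaml
# Illustrative only — adapt the copied sample rather than this sketch.
$schema: http://azureml/sdk-2-0/CommandComponent.json
name: lightgbm_to_onnx
version: 0.0.1
type: CommandComponent
description: Converts a LightGBM model to ONNX.
inputs:
  model:
    type: path
    description: directory containing the LightGBM model file
outputs:
  output:
    type: path
    description: directory to write the ONNX model into
command: >-
  python convert.py --model {inputs.model} --output {outputs.output}
```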
Integration in the inferencing pipeline
WORK IN PROGRESS