Implement an LGBM->ONNX model conversion + inferencing #147
Description
The goal of this task is to add another variant to the inferencing benchmark for LightGBM. We are already comparing LightGBM Python, the LightGBM C API, and Treelite; we'd like to try onnxruntime, as it seems applicable.
In particular, we'd like to reproduce the results of this post on Hummingbird and onnxruntime for classical ML models.
Feel free to reach out to the authors of the blog post for collaboration.
The expected impact of this task:
- increase the value of the benchmark for the lightgbm community, in particular for production scenarios
- identify better production inferencing technologies
Learning Goals
By working on this project you'll be able to learn:
- how to use onnxruntime for classical ML models
- how to compare inferencing technologies in a benchmark
- how to write components and pipelines for AzureML (component sdk + shrike)
Expected Deliverable:
To complete this task, you need to deliver:
- two working Python scripts: one to convert LightGBM models into ONNX (possibly using Hummingbird), one to run inferencing with onnxruntime
- their corresponding working AzureML components
- a successful run of the lightgbm inferencing benchmark pipeline
Instructions
Prepare for coding
- Follow the installation process; please report any issues you encounter, that will help!
- Clone this repo and create your own branch `username/onnxruntime` (or similar) for your work (commit often!).
- In `src/scripts/model_transformation`, create a folder `lightgbm_to_onnx/` and copy the content of `src/scripts/samples/` into it.
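Sketched as shell commands from the repo root (the branch name is a placeholder, per the steps above):

```shell
# From the repo root; pick your own branch name.
git checkout -b username/onnxruntime
mkdir -p src/scripts/model_transformation/lightgbm_to_onnx
cp -r src/scripts/samples/. src/scripts/model_transformation/lightgbm_to_onnx/
```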
Local development
Let's start locally first.
To iterate on your python script, you need to consider a couple of constraints:
- Follow the instructions in the sample script to modify and make your own.
- Please consider using inputs and outputs that are provided as directories, not single files. There's a helper function to let you automatically select the unique file contained in a directory (see the function `input_file_path` in `src/common/io.py`).
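To illustrate the idea, here is a hypothetical re-implementation of such a helper; the actual `input_file_path` in `src/common/io.py` is authoritative and its signature may differ:

```python
# Hypothetical sketch of a "unique file in a directory" helper; the real
# helper lives in src/common/io.py and may behave differently.
import os


def input_file_path(path):
    """Return `path` if it is a file, else the unique file inside it."""
    if os.path.isfile(path):
        return path
    entries = [e for e in os.listdir(path) if os.path.isfile(os.path.join(path, e))]
    if len(entries) != 1:
        raise ValueError(f"Expected exactly one file in {path}, found {len(entries)}")
    return os.path.join(path, entries[0])
```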
Here are a couple of pointers to get you started:
Feel free to check out the current Treelite modules (`model_conversion/treelite_compile` and `inferencing/treelite_python`); they behave similarly. You can also adapt some unit tests from `tests/scripts/test_treelite_python.py`.
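For instance, a common pattern in such tests is to exercise a script's entry point by patching `sys.argv`. The `main` below is a stub standing in for your conversion script; the real tests in the repo may be structured differently:

```python
# Illustrative pattern only: unit-testing a CLI entry point by patching
# sys.argv. `main` is a stub standing in for your actual script.
import argparse
import sys
from unittest.mock import patch


def main():
    """Stand-in for the script entry point: parse CLI arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--output_dir", required=True)
    return parser.parse_args()


def test_main_parses_arguments():
    argv = ["prog", "--model_dir", "in", "--output_dir", "out"]
    with patch.object(sys, "argv", argv):
        args = main()
    assert args.model_dir == "in"
    assert args.output_dir == "out"
```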
Develop for AzureML
Component specification
- First, unit tests. Edit `tests/aml/test_components.py` and look for the list of components. Add the relative path to your component spec to this list. You can test your component by running `pytest tests/aml/test_components.py -v -k name_of_component`.
- Edit the file `spec.yaml` in the directory of your component (copied from the sample) and align its arguments with the expected arguments of your component until the unit tests pass.
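For orientation only, a command component spec typically has a shape like the following; the field names follow the AzureML component schema, but the sample `spec.yaml` you copied is authoritative, and the component name and arguments below are placeholders:

```yaml
# Illustrative only — adapt the copied sample rather than this sketch.
$schema: http://azureml/sdk-2-0/CommandComponent.json
name: lightgbm_to_onnx
version: 0.0.1
type: CommandComponent
description: Converts a LightGBM model to ONNX.
inputs:
  model:
    type: path
    description: directory containing the LightGBM model file
outputs:
  output:
    type: path
    description: directory to write the ONNX model into
command: >-
  python convert.py --model {inputs.model} --output {outputs.output}
```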
Integration in the inferencing pipeline
WORK IN PROGRESS