Hi all. I have spent quite some time reading and using this awesome code. Converting the model to .onnx and .engine wasn't easy, so I'm sharing how I did it.
Installation
- Create a virtual env using Python 3.7.12. Don't use Python 3.8! Install the Python 3.7.12 tarball [according to this guide](https://opensource.com/article/20/4/install-python-linux).
- Then `cd GazeML`, create a virtual env in the folder `.venv`, and activate it:

  ```
  python3.7 -m venv .venv
  source .venv/bin/activate
  ```

- You can deactivate it any time you want by running `deactivate`.
Ok, time to install everything!
```
pip install --upgrade pip
pip install cython
pip install scipy
python3 setup.py install
pip install tensorflow==1.14
```
For me, tensorflow==1.15 didn't work. You can also install tensorflow-gpu; make sure it's the same version, or check the support matrix on the TensorFlow page. Note that most TF 1.x tooling is deprecated, so it's hard to get support for it. I'm thinking of reimplementing this whole repo in PyTorch or TF2 for that reason.
If `python3 setup.py install` hangs, just install the dependencies by hand, one by one.

Get the pre-trained weights:

```
bash get_trained_weights.bash
```
Running the model
Before converting anything, test the model:

```
cd src
python3 elg_demo.py
```
I got a ton of errors, but the model worked nonetheless.
Saving the model as .onnx
Use this tool: tf2onnx

```
pip install -U tf2onnx
```
Then, we have to modify the code a bit before we can get started.
Save the saved-model in `inference_generator()`

Add these code lines before line 385, `yield outputs`:

```python
# Save saved-model
tf.saved_model.simple_save(self._tensorflow_session, "tmp",
                           inputs=data_source.output_tensors, outputs=fetches)
```

When you run this code again (`python3 elg_demo.py`), it will create a folder `tmp` with the `saved_model.pb` in it. But don't run it yet, because if you try to convert the model now you will get this error:
```
ValueError: Input 0 of node hourglass/pre/BatchNorm/cond_1/AssignMovingAvg/Switch was passed float from hourglass/pre/BatchNorm/moving_mean:0 incompatible with expected float_ref.
```
The error message is actually quite helpful: it tells you where in the graph the problem is. BatchNorm is the culprit. There are many answers on Google about this issue, but I think the easiest fix is to set training to False, since BatchNorm behaves differently during training than during testing. Change at least these lines:
- https://github.com/swook/GazeML/blob/master/src/models/elg.py#L181 -> `is_training=False`
- https://github.com/swook/GazeML/blob/master/src/core/model.py#L327 -> `self.use_batch_statistics: False,`
- https://github.com/swook/GazeML/blob/master/src/core/model.py#L381 -> `self.use_batch_statistics: False,`

and optionally:

- https://github.com/swook/GazeML/blob/master/src/models/dpg.py#L224 -> `is_training=False`

if you're using `dpg`. If you don't know, just change it.
This is a bug in the code: `self.use_batch_statistics` is set to True everywhere but is never set to False at any point. I could create a PR for this.
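To see why this flag matters, here is a minimal numpy illustration (my own sketch, not GazeML code) of the two BatchNorm modes. In training mode the layer normalizes with the current batch's statistics and keeps updating moving averages (the `AssignMovingAvg` node from the error above); in inference mode it uses the frozen moving averages, which is what you want in an exported graph:

```python
import numpy as np

# Toy activations for a single feature channel.
x = np.array([1.0, 2.0, 3.0, 4.0])
eps = 1e-5

# Training mode: normalize with the batch's own mean/variance.
train_out = (x - x.mean()) / np.sqrt(x.var() + eps)

# Inference mode: normalize with moving averages accumulated during
# training (the values here are made up for the sketch).
moving_mean, moving_var = 2.0, 1.5
infer_out = (x - moving_mean) / np.sqrt(moving_var + eps)

print(train_out)  # zero-mean, unit-variance
print(infer_out)  # generally neither zero-mean nor unit-variance
```

Exporting with training-mode statistics is what drags the moving-average update ops (and their `float_ref` variables) into the frozen graph.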
Now all the changes are done. You can convert the saved model to `.onnx` like so:
```
python3 -m tf2onnx.convert --saved-model ./tmp --output gazeml.onnx
```
For most needs that should be enough. You can add `--opset <opset>`, for example `--opset 10`, if you want to target a specific opset. You can also add `--target tensorrt` or similar. Check the tf2onnx repo for more flags if you need them.
There's one more thing you should know.
Converting the model to TensorRT .engine
If you try to convert the model using tools like trtexec or similar, you'll hit a small problem: the model contains uint8, which is not supported by TensorRT. You must remove the uint8s from the model: change uint8 to int64 and it will work.
Then you can convert:
```
trtexec --onnx=gazeml.onnx --saveEngine=gazeml.engine --buildOnly --verbose --best
```
or using onnx2trt:

```
onnx2trt gazeml.onnx -o gazeml.engine
```
That should be it. Thank you! I hope my weeks of grinding help someone. Please ask if you have any questions.