Hi all. I have spent quite some time reading and using this awesome code. Converting the model to .onnx and .engine wasn't easy, so I'm sharing how I did it.
Installation
- Create a virtual env using Python 3.7.12. Don't use Python 3.8! Install the Python 3.7.12 tarball [according to this guide](https://opensource.com/article/20/4/install-python-linux).
- Then `cd GazeML`, create a virtual env in the folder `.venv`, and activate it:

  ```
  python3.7 -m venv .venv
  source .venv/bin/activate
  ```

- You can deactivate it any time you want by running `deactivate`.
Ok, time to install everything!
```
pip install --upgrade pip
pip install cython
pip install scipy
python3 setup.py install
pip install tensorflow==1.14
```
For me, tensorflow==1.15 didn't work. You can also install tensorflow-gpu; make sure it's the same version, or check the support matrix on the TensorFlow page. Note that most TF 1.x tooling is deprecated, so it's hard to get support for it. I'm thinking of reimplementing this whole repo in PyTorch or TF2 for that reason.
If `python3 setup.py install` hangs, just install the dependencies by hand, one by one.

Get the pre-trained weights:

```
bash get_trained_weights.bash
```
Running the model
Before converting anything, test the model:

```
cd src
python3 elg_demo.py
```
I got a ton of errors, but the model worked nonetheless.
Saving the model as .onnx
Use this tool: tf2onnx

```
pip install -U tf2onnx
```
Then, we have to modify the code a bit before we can get started.
Save the saved-model in `inference_generator()`

Add these code lines before line 385, `yield outputs`:

```python
# Save saved-model
tf.saved_model.simple_save(self._tensorflow_session, "tmp",
                           inputs=data_source.output_tensors, outputs=fetches)
```

When you run this code again (`python3 elg_demo.py`), it will create a folder `tmp` with the `saved_model.pb` in it. But don't run it yet, because if you try to convert the model now you will get this error:
```
ValueError: Input 0 of node hourglass/pre/BatchNorm/cond_1/AssignMovingAvg/Switch was passed float from hourglass/pre/BatchNorm/moving_mean:0 incompatible with expected float_ref.
```
The error message is actually quite helpful: it tells you where in the graph the problem is. BatchNorm is the culprit. There are many answers on Google about this issue, but I think the easiest fix is to set training to False, since BatchNorm behaves differently during training than during testing. Change at least these lines:
- https://github.com/swook/GazeML/blob/master/src/models/elg.py#L181 -> `is_training=False`
- https://github.com/swook/GazeML/blob/master/src/core/model.py#L327 -> `self.use_batch_statistics: False,`
- https://github.com/swook/GazeML/blob/master/src/core/model.py#L381 -> `self.use_batch_statistics: False,`

and optionally:

- https://github.com/swook/GazeML/blob/master/src/models/dpg.py#L224 -> `is_training=False`

if you're using `dpg`. If you don't know, just change it.
This is a bug in the code: `self.use_batch_statistics` is set to True everywhere but is never set to False at any point. I could create a PR for this.
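To see why this flag matters, here is a minimal numpy illustration (my own sketch, not GazeML code) of the two BatchNorm modes. In training mode the layer normalizes with the current batch's statistics and keeps updating moving averages (the `AssignMovingAvg` node from the error above); in inference mode it uses the frozen moving averages, which is what you want in an exported graph:

```python
import numpy as np

# Toy activations for a single feature channel.
x = np.array([1.0, 2.0, 3.0, 4.0])
eps = 1e-5

# Training mode: normalize with the batch's own mean/variance.
train_out = (x - x.mean()) / np.sqrt(x.var() + eps)

# Inference mode: normalize with moving averages accumulated during
# training (the values here are made up for the sketch).
moving_mean, moving_var = 2.0, 1.5
infer_out = (x - moving_mean) / np.sqrt(moving_var + eps)

print(train_out)  # zero-mean, unit-variance
print(infer_out)  # generally neither zero-mean nor unit-variance
```

Exporting with training-mode statistics is what drags the moving-average update ops (and their `float_ref` variables) into the frozen graph.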
Now all the changes are done. You can convert the saved model to `.onnx` like so:
```
python3 -m tf2onnx.convert --saved-model ./tmp --output gazeml.onnx
```
For most needs that should be enough. You can add `--opset <opset>`, for example `--opset 10`, if you want to target a specific opset. You can also add `--target tensorrt` or similar. Check the tf2onnx repo for more flags if you need them.
There's one more thing you should know.
Converting the model to TensorRT .engine
If you try to convert the model using tools like trtexec or similar, you'll hit a small problem: the model contains uint8, which is not supported by TensorRT. You must remove the uint8s from the model: change uint8 to int64 and it will work.
Then you can convert:
```
trtexec --onnx=gazeml.onnx --saveEngine=gazeml.engine --buildOnly --verbose --best
```
or using onnx2trt:

```
onnx2trt gazeml.onnx -o gazeml.engine
```
That should be it. Thank you! I hope my weeks of grinding help someone. Please ask if you have any questions.