We evaluated Sayuri with the TensorRT backend and measured a speedup of more than 1.5x when using FP16 precision.
Because the backend uses C++17 features, building it requires a compiler that supports C++17 or later.
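One way to catch a compiler or flag mismatch early is a compile-time check on the language-standard macro. The snippet below is a minimal sketch and is not taken from the repository. The macro `CHECK_CPP_STANDARD` and the function `compiled_cpp_standard` are hypothetical names. MSVC reports the real standard in `_MSVC_LANG` unless `/Zc:__cplusplus` is passed, so the check prefers that macro when it is defined.

```cpp
// Hypothetical helper macro: MSVC keeps __cplusplus at 199711L by default,
// so use _MSVC_LANG when available to get the actual language standard.
#if defined(_MSVC_LANG)
#define CHECK_CPP_STANDARD _MSVC_LANG
#else
#define CHECK_CPP_STANDARD __cplusplus
#endif

// Fail the build immediately if the translation unit is not compiled as C++17+.
static_assert(CHECK_CPP_STANDARD >= 201703L,
              "C++17 or later is required (e.g. -std=c++17 or /std:c++17)");

// Hypothetical accessor returning the standard value this file was compiled with.
constexpr long compiled_cpp_standard() {
    return static_cast<long>(CHECK_CPP_STANDARD);
}
```

With GCC or Clang, compile with `-std=c++17` (or newer). With MSVC, use `/std:c++17`. An older standard setting produces the `static_assert` message instead of a stream of unrelated template errors.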
https://github.com/MAOmao000/Sayuri-TensorRT
I hope this will be helpful for future development.