Description
Status
The current code for raisimpy_gym is slower than the wrapped C++ code for raisimGym.
There are several ways to improve performance.
The problem with threads
Currently, the Python code uses threads; however, due to the Global Interpreter Lock (GIL), the threads are prevented from running truly in parallel. Worse, acquiring and releasing the lock adds overhead, so it is actually better not to use threads.
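To illustrate the point about the GIL, here is a minimal sketch that times the same CPU-bound work run sequentially and run in threads. The function names (busy_work, time_sequential, time_threaded) and the workload are made up for illustration and are not part of raisimpy_gym. On a standard CPython build I would expect the threaded timing to be no better than the sequential one, but the numbers depend on the machine and interpreter, so none are quoted here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy_work(n):
    """Pure-Python CPU-bound loop (hypothetical stand-in for per-environment work)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_sequential(n, repeats):
    """Run busy_work `repeats` times one after another; return (results, seconds)."""
    start = time.perf_counter()
    results = [busy_work(n) for _ in range(repeats)]
    return results, time.perf_counter() - start


def time_threaded(n, repeats):
    """Run busy_work `repeats` times in a thread pool; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=repeats) as pool:
        results = list(pool.map(busy_work, [n] * repeats))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_seconds = time_sequential(200_000, 4)
    _, thr_seconds = time_threaded(200_000, 4)
    print(f"sequential: {seq_seconds:.3f}s, threaded: {thr_seconds:.3f}s")
```

Both variants compute the same results; only the wall-clock time differs.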
There are two solutions (that I can currently think of) that could improve performance:
- using pybind11: use threads in C++ that call an arbitrary Python function. As the documentation is scarce (GIL in pybind11), I will need to test this, and it might not actually work.
- using multiple processes instead of threads: each process has its own GIL, so multiple processes avoid the problems with threads. There are several multiprocessing libraries that could be useful: multiprocessing, pathos.multiprocessing, joblib, concurrent.futures, and ray. For the moment, I quickly tested and could make the current code work with concurrent.futures.ProcessPoolExecutor, which slightly improved performance compared to the single thread/process code.
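The multi-process approach can be sketched roughly as follows. This is not the raisimpy_gym code: rollout is a hypothetical placeholder for stepping one environment, and run_parallel is an assumed helper name. The only library calls used are concurrent.futures.ProcessPoolExecutor and its map method.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def rollout(env_id, n_steps):
    """Hypothetical stand-in for stepping one environment for n_steps."""
    state = float(env_id)
    total_reward = 0.0
    for _ in range(n_steps):
        state = math.sin(state) + 1.0  # placeholder for a physics step
        total_reward += state
    return total_reward


def run_parallel(n_envs, n_steps, max_workers=None):
    """Run one rollout per environment, each task in a worker process."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so results[i] belongs to environment i
        return list(pool.map(rollout, range(n_envs), [n_steps] * n_envs))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    rewards = run_parallel(n_envs=4, n_steps=1000, max_workers=2)
    print(rewards)
```

Arguments and return values are pickled when they cross the process boundary, so the gain depends on how much work each task does relative to the data it sends back.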
Copy vs Reference
Currently, I am passing the numpy arrays and Eigen matrices/vectors by value instead of by reference, so there is a copy every time we convert between these two datatypes, as described here: https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html?highlight=eigen
The solution is to pass them by reference instead and, for matrices, to pay attention to the fact that Eigen uses column-major order by default, while numpy uses row-major order by default.
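On the numpy side, the storage-order difference can be checked and handled as in the sketch below. The helper is_ref_compatible is a hypothetical name, and its conditions (float64, column-major, writeable) reflect my reading of the pybind11 Eigen page linked above for a default double-precision Eigen matrix passed by reference. Treat it as an assumption to verify against the actual bindings.

```python
import numpy as np


def is_ref_compatible(arr):
    """Hypothetical check: could this array be handed to a default
    (column-major, double) Eigen matrix reference without a copy?"""
    return (
        arr.dtype == np.float64
        and arr.flags["F_CONTIGUOUS"]
        and arr.flags["WRITEABLE"]
    )


# numpy allocates in row-major (C) order by default
row_major = np.arange(6, dtype=np.float64).reshape(2, 3)

# asfortranarray copies the data into column-major (Fortran) order
col_major = np.asfortranarray(row_major)

# transposing a row-major array gives a column-major view without copying
transposed_view = row_major.T

# allocating in column-major order up front avoids any later conversion
buffer = np.zeros((2, 3), dtype=np.float64, order="F")

if __name__ == "__main__":
    for name, arr in [("row_major", row_major), ("col_major", col_major),
                      ("transposed_view", transposed_view), ("buffer", buffer)]:
        print(name, arr.shape, arr.strides, is_ref_compatible(arr))
```

Allocating observation and action buffers with order="F" once, and reusing them, is one way to keep the conversion out of the stepping loop.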