The adversarial training code is not working, and the examples need to be updated. Moreover, the current code seems to be defined only for off-policy algorithms. Did the examples ever work? Is the current code a good starting point, or should it be rewritten completely?