Hi.
I used the following command to reproduce the results reported in the paper:
Run Experiments.
python launch.py -alg ppo -curiosity_alg rnd -env jamesbond -lstm -sample_mode gpu -num_gpus 1 -normalize_advantage -normalize_reward -dual_value -normalize_obs -fragmentation -recall -use_feature -use_wandb
However, the results look much more like those of the setting that does not use the FARCuriosity algorithm.
I then tried removing the
-normalize_reward
argument, and with that change the run behaved as expected.
Could you give me a hint about what is wrong with the default run command? Is it the recommended configuration, or is it only meant as an example?