You can configure the behavior of the script with a variety of command-line flags:
- `--verbose`: Set verbosity mode (0: no output, 1: INFO). Choices: [`0`, `1`]. Default is `0`.
- `--seed`: Set the seed for the random number generator. Default is `-1`.
- `--num-exps`: Number of experiments to run. Default is `1`.
- `--num-threads`: Number of threads for PyTorch. Default is `-1`.
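For example, a minimal reproducible run might look like this (a sketch; the entry point is assumed to be `train.py`, the actual script name may differ):

```bash
# Single experiment with INFO-level output and a fixed random seed
python train.py --verbose 1 --seed 42 --num-exps 1 --num-threads 4
```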
- `--env`: Name of the OpenAI Gym environment. Default is `BipedalWalker-v3`.
- `--n-envs`: Number of environments to stack. Default is `1`.
- `--env-kwargs`: Additional environment arguments, given as a dictionary.
- `--vec-env-type`: Type of vectorized environment. Choices: [`dummy`, `subproc`]. Default is `dummy`.
- `--adv-env`: Whether to use an adversarial Gym environment. Default is `False`.
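A sketch of selecting an environment with subprocess-based vectorization; the `key:value` syntax for `--env-kwargs` is an assumption and may differ from the actual parser:

```bash
# Four stacked environments, each running in its own subprocess
python train.py --env BipedalWalker-v3 --n-envs 4 --vec-env-type subproc \
    --env-kwargs hardcore:True
```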
- `--algo`: Reinforcement learning algorithm. Default is `ppo`.
- `--saved-models-path`: Path where models are saved. Default is `saved_models`.
- `--pretrained-model`: Path to a pretrained agent to continue training from.
- `--save-replay-buffer`: Whether to save the replay buffer (when applicable). Default is `False`.
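Continuing training from a saved agent might look as follows (the model path is illustrative, and `--save-replay-buffer` is assumed to act as a switch):

```bash
# Resume training a saved SAC agent and keep its replay buffer
python train.py --algo sac --pretrained-model saved_models/sac_agent \
    --save-replay-buffer
```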
- `--hyperparameter`: Overwrite hyperparameters (e.g. `learning_rate:0.01 train_freq:10`).
- `--optimize-hyperparameters`: Whether to run a hyperparameter search. Default is `False`.
- `--hyperparameter-path`: Path to saved hyperparameters.
- `--storage`: Database storage path if distributed optimization should be used. Default is `None`.
- `--sampler`: Sampler to use when optimizing hyperparameters with Optuna. Choices: [`random`, `tpe`, `skopt`]. Default is `tpe`.
- `--pruner`: Pruner to use when optimizing hyperparameters with Optuna. Choices: [`halving`, `median`, `none`]. Default is `median`.
- `--optimization-log-path`: Path to save the evaluation log. Default is `hyperparam_optimization`.
- `--n-opt-trials`: Number of trials when optimizing hyperparameters with Optuna. Default is `10`.
- `--no-optim-plots`: Do not plot results when performing hyperparameter optimization.
- `--n-jobs`: Number of parallel jobs when optimizing hyperparameters with Optuna. Default is `1`.
- `--n-startup-trials`: Number of trials before using the Optuna sampler. Default is `10`.
- `--n-evaluations-opt`: The training policy is evaluated every n timesteps during hyperparameter optimization. Default is `20`.
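A hyperparameter search over 50 trials with two parallel jobs could be launched like this (a sketch; the entry point and the switch-style use of `--optimize-hyperparameters` are assumptions):

```bash
# Optuna search with TPE sampler and median pruner
python train.py --algo ppo --env BipedalWalker-v3 \
    --optimize-hyperparameters --n-opt-trials 50 --n-jobs 2 \
    --sampler tpe --pruner median \
    --optimization-log-path hyperparam_optimization
```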
- `--n-timesteps`: Number of timesteps for training the RL agent. Default is `-1`.
- `--device`: Device to use. Default is `cpu`.
- `--save-freq`: Save the model every k steps (if negative, no checkpoints are saved). Default is `-1`.
- `--log-interval`: Log results every k steps (if `-1`, the algorithm default is left unchanged). Default is `-1`.
- `--eval-freq`: Evaluate the agent every e steps (if negative, no evaluation). Default is `10000`.
- `--n-eval-envs`: Number of evaluation environments. Default is `1`.
- `--n-eval-episodes`: Number of evaluation episodes. Default is `5`.
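Putting the training and evaluation schedules together, a longer run with periodic checkpoints and evaluation might look like this (entry point assumed as above):

```bash
# Train for 1M steps on GPU, checkpoint every 100k steps,
# evaluate on 2 environments for 10 episodes every 20k steps
python train.py --n-timesteps 1000000 --device cuda \
    --save-freq 100000 --eval-freq 20000 \
    --n-eval-envs 2 --n-eval-episodes 10
```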
- `--tensorboard-log`: TensorBoard log directory. Default is `tb_logging`.
- `--log-folder`: Log folder, e.g. for benchmarking. Default is `logging`.
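With logging enabled, training curves can be inspected by pointing TensorBoard at the configured directory:

```bash
# Launch TensorBoard on the default log directory
tensorboard --logdir tb_logging
```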
- `--protagonist-policy`: Policy of the protagonist. Default is `MlpPolicy`.
- `--adversary-policy`: Policy of the adversary. Default is `MlpPolicy`.
- `--N-mu`: Number of protagonist iterations. Default is `-1`.
- `--N-nu`: Number of adversary iterations. Default is `-1`.
- `--adv-impact`: Define how the adversary impacts the agent. Choices: [`control`, `force`]. If `control`, the adversary impacts the final action command by adding its action onto the protagonist's action command. If `force`, the adversary applies a force to the agent (only possible for MuJoCo environments). Default is `control`.
- `--adv-fraction`: Scaling factor for the adversarial action. Default is `1.0`.
- `--adv-delay`: Postpone optimization of the adversary. Default is `-1`.
- `--adv-index-list`: Contact point for adversarial forces (for MuJoCo environments). Default is `torso`.
- `--adv-force-dim`: Dimension of the adversarial force vector per component. Default is `2`.
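As a sketch, an adversarial training run with a force-based adversary on a MuJoCo agent could be configured as follows (the script name, the switch-style use of `--adv-env`, and the unit of `--adv-delay` are assumptions):

```bash
# Force-based adversary attacking the torso at half strength,
# with adversary optimization postponed via --adv-delay
python train.py --env HalfCheetah-v3 --adv-env \
    --adv-impact force --adv-fraction 0.5 --adv-delay 5 \
    --adv-index-list torso --adv-force-dim 2 \
    --N-mu 10 --N-nu 10
```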