This project implements the Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms to solve continuous-action control tasks in several Gymnasium environments.
Here are some of the results.
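As a quick refresher on the two algorithms: TD3 trains two critics and bootstraps from the smaller of their target estimates to curb Q-value overestimation, while SAC instead adds an entropy bonus to the actor objective. Below is a minimal NumPy sketch of TD3's clipped double-Q target; the function and variable names are illustrative and not taken from this repository:

```python
import numpy as np

def clipped_double_q_target(reward, done, next_q1, next_q2, gamma=0.99):
    # Bootstrap from the minimum of the two target-critic estimates;
    # terminal transitions (done == 1) contribute only the reward.
    next_q = np.minimum(next_q1, next_q2)
    return reward + gamma * (1.0 - done) * next_q

# Terminal step: no bootstrapping, so the target equals the reward.
print(clipped_double_q_target(1.0, 1.0, 5.0, 4.0))  # -> 1.0
```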
Bipedal Walker (TD3):
Humanoid (SAC):
```
.
├── env.yaml
├── SAC
│   ├── agent.py
│   └── configs
├── scripts
│   ├── environments_overview.ipynb
│   ├── main.py
│   └── results.ipynb
├── src
│   ├── environment.py
│   ├── networks.py
│   └── utils.py
└── TD3
    ├── agent.py
    └── configs
```

- The `env.yaml` file allows you to create a new conda environment with the same packages used in this project, which is required to run the code. To create the environment, execute:
```bash
conda env create -f env.yaml
```
- The `SAC` and `TD3` directories contain the implementation of each algorithm (`agent.py`); the parameters and hyperparameters are set up in the `configs` subdirectory for each environment.
- The `main.py` script contains the code to run the `SAC` and `TD3` algorithms.
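The `-env`/`-alg` command-line interface described below could be wired up with `argparse` roughly as follows. This is a hypothetical sketch, not the actual contents of `main.py`, which may name its options and defaults differently:

```python
import argparse

def build_parser():
    # Hypothetical CLI sketch matching the flags described in this README.
    parser = argparse.ArgumentParser(
        description="Train SAC or TD3 on a Gymnasium environment")
    parser.add_argument("-env", required=True,
                        help="Gymnasium environment name")
    parser.add_argument("-alg", required=True, choices=["sac", "td3"],
                        help="algorithm to run")
    return parser

# Example of parsing a hypothetical invocation.
args = build_parser().parse_args(["-env", "Humanoid", "-alg", "sac"])
print(args.env, args.alg)  # -> Humanoid sac
```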
The code can be executed from the parent directory or from the `scripts` directory. From the parent directory, use:

```bash
python scripts/main.py -env <ENVIRONMENT_NAME> -alg <ALGORITHM>
```

The options for `<ENVIRONMENT_NAME>` are:
The options for `<ALGORITHM>` are:
- sac
- td3
From the `scripts` subdirectory, the execution is as follows:

```bash
python main.py -env <ENVIRONMENT_NAME> -alg <ALGORITHM>
```

Three subdirectories will be created after executing this command:
- `checkpoints` to save the models and training data
- `logs` to keep track of the hyperparameters utilized
- `results` to store the final results
Inside these directories, specific subdirectories are created for each environment.
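The layout above can be reproduced with a few lines of standard-library Python. This is an illustrative sketch of the directory structure described in this README, not the repository's actual code:

```python
import os
import tempfile

def make_run_dirs(root, env_name):
    # Create checkpoints/, logs/ and results/, each with a
    # per-environment subdirectory, as described above.
    paths = {}
    for sub in ("checkpoints", "logs", "results"):
        path = os.path.join(root, sub, env_name)
        os.makedirs(path, exist_ok=True)
        paths[sub] = path
    return paths

# Example against a throwaway root directory.
run_dirs = make_run_dirs(tempfile.mkdtemp(), "Humanoid")
print(sorted(run_dirs))  # -> ['checkpoints', 'logs', 'results']
```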
<https://docs.google.com/presentation/d/16EGlFeVgT5UstF_6QOyH6OSu9X48F7HwPjW9v2mAvzM/edit?usp=sharing>

