This is a simplified version of the Perpetual Humanoid Control (PHC) repository. The class inheritance and dependencies have been greatly reduced, and PufferLib is used to train the policy. Thank you to Zhengyi Luo for the awesome research and for open-sourcing the code.
This project was sponsored by Puffer AI. If you like what you see, please contact Puffer AI for priority service.
This repo implements only the single-primitive model, which can perform 99.0% of the 11,313 AMASS motions. See the wandb log for training details. The AMASS checkpoints are shared in the OneDrive folder.
- Clone this repository:

  ```
  git clone https://github.com/kywch/puffer-phc.git
  ```

  Then, go to the cloned directory:

  ```
  cd puffer-phc
  ```
- Using pixi, set up the virtual environment and install the dependencies. Install pixi if you haven't already; see the pixi documentation for more details. The following command is for Linux:

  ```
  curl -fsSL https://pixi.sh/install.sh | bash
  ```

  The following command installs the dependencies and activates the virtual environment:

  ```
  pixi shell
  ```
- Install Isaac Gym and gymtorch. Download and unzip Isaac Gym from here. Then, install isaacgym by running the following commands inside the virtual environment:

  ```
  cd <isaac gym directory>/python
  pip install -e .
  ```

  Also, install gymtorch by running the following command inside the repository directory:

  ```
  pixi run build_gymtorch
  ```

  This gymtorch build allows you to debug the Isaac Gym env inside VSCode.

  You can test whether the installation was successful by running:

  ```
  pixi run test_deps
  ```
- Download the SMPL parameters from SMPL, and unzip them into the `smpl` folder. Rename `basicmodel_neutral_lbs_10_207_0_v1.1.0.pkl`, `basicmodel_m_lbs_10_207_0_v1.1.0.pkl`, and `basicmodel_f_lbs_10_207_0_v1.1.0.pkl` to `SMPL_NEUTRAL.pkl`, `SMPL_MALE.pkl`, and `SMPL_FEMALE.pkl`, respectively.
- Train a policy. In the virtual environment, run:

  ```
  python scripts/train.py --config config.ini -m <MOTION FILE PATH>
  ```

  The script supports wandb logging. To use wandb, log in to wandb, then add `--track` to the command.

  To prepare your own motion data, please see the `convert_amass_data.py` script in the `scripts` folder. After conversion, you can visually inspect the data with the `vis_motion_mj.py` script.
- Play the trained policy. In the virtual environment, run:

  ```
  python scripts/train.py --mode play -c <CHECKPOINT PATH>
  ```

  For batch evaluation (e.g., using 4096 envs to evaluate the 11,313 AMASS motions), run:

  ```
  python scripts/train.py --mode eval -c <CHECKPOINT PATH>
  ```
- Sweep the hyperparameters using CARBS:

  ```
  python scripts/train.py --config config.ini --mode sweep
  ```

  To adjust the sweep parameters and ranges, edit the `config.ini` file. You might need to comment/uncomment some parts of the `sweep_carbs()` function in the `train.py` script to make the sweep work.
`python scripts/train.py --help` shows the full list of options for the environment and training, which allows you to override the defaults in the config file.

- I tested the style discriminator both in the original PHC repo and here, saw that it did not improve the imitation performance, and so turned it off in the config. To enable it, set `use_amp_obs` to True in the config, or add `--use-amp-obs True` to the command.
- I saw several times that "fine-tuning" pretrained weights with `-c <CHECKPOINT PATH>` resulted in much faster learning. The L2 init reg loss is logged in the wandb dashboard, and you can see a greater L2 distance when learning from scratch compared to fine-tuning.
- As mentioned in the PHC paper, I saw that the 6-layer MLP performs better than shallow MLPs, and SiLU activations perform better than ReLUs (see the policy-trunk sketch after this list).
- Adding LayerNorm before the last layer tamed the gradient norm, and I swept the hyperparameters with a max grad norm of 10 (the original PHC repo uses 50).
- Observations are RMS normalized, but the rewards/values are not. The gamma was set to 0.98 to manage the value loss, and the hyperparameter sweeps consistently converged to small lambda values, so I chose 0.2. (See the normalizer and GAE sketches after this list.)
- Also, the sweeps consistently converged to very aggressive clip coefs (0.01) and higher learning rates. I speculate that since the trainer uses the same learning rate for both actor and critic, it's using actor clipping to slow down the actor learning relative to the critic.
- I tried using an LSTM for the actor and for the critic, and neither worked better. These LSTM policies are included in the code, so feel free to try them.
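
The following is a minimal PyTorch sketch of the policy trunk described in the notes above: six hidden layers with SiLU activations and a LayerNorm right before the output layer. The layer sizes and function name are illustrative, not the exact module in this repo.

```
import torch.nn as nn


def make_policy_trunk(obs_dim: int, act_dim: int, hidden_dim: int = 1024) -> nn.Sequential:
    layers = []
    in_dim = obs_dim
    for _ in range(6):  # six hidden Linear + SiLU layers
        layers += [nn.Linear(in_dim, hidden_dim), nn.SiLU()]
        in_dim = hidden_dim
    # LayerNorm just before the output layer, which helped tame the gradient norm
    layers += [nn.LayerNorm(hidden_dim), nn.Linear(hidden_dim, act_dim)]
    return nn.Sequential(*layers)
```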
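Similarly, here is a standalone sketch of running mean/std (RMS) observation normalization, again for illustration rather than the exact normalizer used in this repo. Rewards and values are deliberately left unnormalized, matching the note above.

```
import torch
import torch.nn as nn


class RunningObsNorm(nn.Module):
    """Tracks running mean/variance of observations and normalizes them."""

    def __init__(self, num_features: int, eps: float = 1e-5):
        super().__init__()
        self.register_buffer("mean", torch.zeros(num_features))
        self.register_buffer("var", torch.ones(num_features))
        self.register_buffer("count", torch.tensor(eps))
        self.eps = eps

    @torch.no_grad()
    def update(self, x: torch.Tensor) -> None:
        # Merge the batch statistics into the running statistics.
        batch_mean = x.mean(dim=0)
        batch_var = x.var(dim=0, unbiased=False)
        batch_count = x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta.pow(2) * self.count * batch_count / total) / total
        self.count = total

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / torch.sqrt(self.var + self.eps)
```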
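Finally, for the gamma/lambda note, a generic GAE(lambda) computation (not PufferLib's exact implementation) with the values chosen above. With lambda as small as 0.2, the advantage estimate stays close to a one-step TD error.

```
import torch


def gae_advantages(rewards, values, dones, gamma=0.98, lam=0.2):
    """Generalized Advantage Estimation over one trajectory.

    rewards, dones: float tensors of shape (T,)
    values: float tensor of shape (T + 1,), including the bootstrap value
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    last_gae = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # One-step TD error; with a small lambda, this term dominates the estimate.
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        last_gae = delta + gamma * lam * not_done * last_gae
        advantages[t] = last_gae
    returns = advantages + values[:-1]
    return advantages, returns
```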
This repository is built on top of the following amazing repositories:
- Main PHC code, motion lib, poselib, and data scripts are from: PHC and Isaac Gym
- The PPO and CARBS sweep code is from: PufferLib
- Sample motion data is from: CMU Motion Capture Dataset, Subject 5, Motion 6
Please follow the license of the above repositories for usage.
