
Puffer PHC

This is a simplified version of the Perpetual Humanoid Control repository. The class inheritance and dependencies have been greatly reduced, and PufferLib is used to train the policy. Thank you to Zhengyi Luo for the awesome research and for open-sourcing the code.

This project was sponsored by Puffer AI. If you like what you see, please contact Puffer AI for priority service.

This repo implements only the single-primitive model, which can imitate 99.0% of the 11,313 AMASS motions. See the wandb log for training details. The AMASS checkpoints are shared in the OneDrive folder.

Getting Started

  1. Clone this repository.

    git clone https://github.com/kywch/puffer-phc.git
    

    Then, go to the cloned directory.

    cd puffer-phc
    
  2. Using pixi, set up the virtual environment and install the dependencies. Install pixi if you haven't already; see the pixi documentation for more details. The following command is for Linux.

    curl -fsSL https://pixi.sh/install.sh | bash
    

    The following command installs the dependencies and activates the virtual environment:

    pixi shell
    
  3. Install Isaac Gym and gymtorch. Download and unzip Isaac Gym from here. Then, install Isaac Gym by running the following command inside the virtual environment:

    cd <isaac gym directory>/python
    pip install -e .
    

    Also, install gymtorch by running the following command inside the repository directory:

    pixi run build_gymtorch
    

    Building gymtorch this way lets you debug the Isaac Gym environment inside VSCode.

    You can test if the installation is successful by running:

    pixi run test_deps
    
  4. Download the SMPL parameters from SMPL and unzip them into the smpl folder. Rename basicmodel_neutral_lbs_10_207_0_v1.1.0.pkl, basicmodel_m_lbs_10_207_0_v1.1.0.pkl, and basicmodel_f_lbs_10_207_0_v1.1.0.pkl to SMPL_NEUTRAL.pkl, SMPL_MALE.pkl, and SMPL_FEMALE.pkl, respectively.
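
    For example, assuming the downloaded .pkl files sit in the current directory (adjust the source paths to wherever you actually unzipped them), the renaming can be done as follows:

    # create the folder the repo expects and move the renamed SMPL models into it
    mkdir -p smpl
    mv basicmodel_neutral_lbs_10_207_0_v1.1.0.pkl smpl/SMPL_NEUTRAL.pkl
    mv basicmodel_m_lbs_10_207_0_v1.1.0.pkl smpl/SMPL_MALE.pkl
    mv basicmodel_f_lbs_10_207_0_v1.1.0.pkl smpl/SMPL_FEMALE.pkl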

  5. Train a policy. In the virtual environment, run:

    python scripts/train.py --config config.ini -m <MOTION FILE PATH>
    

    The script supports wandb logging. To use wandb, log in to wandb, then add --track to the command.
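
    For example, after logging in with wandb login, a tracked training run looks like:

    python scripts/train.py --config config.ini -m <MOTION FILE PATH> --track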

    To prepare your own motion data, please see the convert_amass_data.py script in the scripts folder. After conversion, you can visually inspect the data with the vis_motion_mj.py script.

  6. Play the trained policy. In the virtual environment, run:

    python scripts/train.py --mode play -c <CHECKPOINT PATH> 
    

    For batch evaluation (e.g., using 4096 envs to evaluate the 11,313 AMASS motions), run:

    python scripts/train.py --mode eval -c <CHECKPOINT PATH>
    
  7. Sweep the hyperparameters using CARBS.

    python scripts/train.py --config config.ini --mode sweep
    

    To adjust the sweep parameters and their ranges, edit the config.ini file. You might need to comment or uncomment some parts of the sweep_carbs() function in train.py to make the sweep work.

Notes

  • python scripts/train.py --help shows the full list of environment and training options, any of which can be used to override the defaults in the config file.
  • I tested the style discriminator both in the original PHC repo and here, saw that it did not improve imitation performance, and turned it off in the config. To enable it, set use_amp_obs to True in the config or add --use-amp-obs True to the command (see the example command after this list).
  • I saw several times that "fine-tuning" pretrained weights with -c <CHECKPOINT PATH> resulted in much faster learning (see the example after this list). The L2 init regularization loss is logged to the wandb dashboard, and you can see a greater L2 distance when learning from scratch than when fine-tuning.
  • As mentioned in the PHC paper, I saw that the 6-layer MLP performs better than shallower MLPs, and SiLU activations perform better than ReLU.
  • Adding LayerNorm before the last layer tamed the gradient norm, and I swept the hyperparameters with a max grad norm of 10 (the original PHC repo uses 50).
  • Observations are RMS-normalized, but the rewards/values are not. Gamma was set to 0.98 to keep the value loss manageable. The hyperparameter sweeps consistently converged to small GAE lambda values, so I chose 0.2.
  • Also, the sweeps consistently converged to very aggressive clip coefficients (0.01) and higher learning rates. I speculate that, since the trainer uses the same learning rate for both the actor and the critic, the sweep relies on aggressive actor clipping to slow down actor learning relative to the critic.
  • I tried using an LSTM for the actor and for the critic, and neither worked better. These LSTM policies are included in the code, so feel free to try them.
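
For example, to turn the style discriminator back on, combine the training command from step 5 with the --use-amp-obs flag mentioned above:

    python scripts/train.py --config config.ini -m <MOTION FILE PATH> --use-amp-obs True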
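
Likewise, a fine-tuning run that starts from pretrained weights passes a checkpoint alongside the motion file (this assumes -c is accepted in the default training mode, as the fine-tuning note above implies):

    python scripts/train.py --config config.ini -m <MOTION FILE PATH> -c <CHECKPOINT PATH>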

References

This repository is built on top of the following amazing repositories:

  • Perpetual Humanoid Control (PHC)
  • PufferLib
  • CARBS

Please follow the licenses of the above repositories for usage.
