
Residual RL for Humanoid Push Recovery

A simulation pipeline for training a residual reinforcement learning policy on a MuJoCo humanoid that can recover from pushes. The system uses a classical stabilization controller as the base policy and SAC (Soft Actor-Critic) for learning residual actions.

Features

  • Pre-built MuJoCo Models: Uses gymnasium's built-in Humanoid-v5 model (no custom XML files needed)
  • Base Control: Classical PD (Proportional-Derivative) stabilization controller
  • Residual RL: SAC algorithm learns residual actions on top of base control
  • Push Disturbances: Configurable random push forces applied to the humanoid torso
  • Training Pipeline: Complete training script with evaluation and checkpointing

Installation

  1. Install dependencies:
pip install -r requirements.txt

Usage

Training

Train a residual RL agent:

python train.py --total_timesteps 1000000 --base_controller pd --push_probability 0.1

Key arguments:

  • --total_timesteps: Total training timesteps (default: 1,000,000)
  • --base_controller: Base controller type: "pd" or "lqr" (default: "pd")
  • --push_probability: Probability of push per step (default: 0.1)
  • --push_force_min: Minimum push force (default: 50.0)
  • --push_force_max: Maximum push force (default: 200.0)
  • --log_dir: Directory for logs and tensorboard (default: "./logs")
  • --save_dir: Directory for saved models (default: "./models")
  • --eval_freq: Evaluation frequency in timesteps (default: 10000)
  • --render_eval: Render during evaluation episodes (slows training)
  • --vis_freq: Visualize every N timesteps during training (default: 50000, set to 0 to disable)
  • --vis_episodes: Number of episodes to visualize each time (default: 1)

Visualization

Visualize with base controller only (no trained agent):

python visualize.py --base_controller pd --push_probability 0.1

Visualize with trained agent:

python visualize.py --model_path ./models/best_model.zip --speed 1.0

Key arguments:

  • --model_path: Path to trained model (optional; if omitted, only the base controller runs)
  • --base_controller: Base controller type: "pd" or "lqr" (default: "pd")
  • --push_probability: Probability of push per step (default: 0.1)
  • --push_force_min: Minimum push force (default: 50.0)
  • --push_force_max: Maximum push force (default: 200.0)
  • --speed: Simulation speed multiplier (default: 1.0, use 2.0 for 2x speed)
  • --deterministic: Use deterministic policy for trained models (default: True)

Quick test with visualization:

python test_env.py --render

Evaluation

Evaluate a trained model:

python evaluate.py --model_path ./models/best_model.zip --n_episodes 10 --render

Key arguments:

  • --model_path: Path to trained model
  • --vec_normalize_path: Path to normalization stats (optional)
  • --n_episodes: Number of evaluation episodes (default: 10)
  • --render: Render episodes during evaluation
  • --deterministic: Use deterministic policy (default: True)

Architecture

Environment (residual_rl/env.py)

HumanoidPushEnv wraps gymnasium's Humanoid-v5 environment and adds:

  • Push disturbance mechanism (random forces applied to torso)
  • Integration with base controller
  • Residual action space for RL agent
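
The push mechanism can be sketched as a pure sampling step. The function name `sample_push` and its signature are illustrative, not the repository's actual API; the assumption is that the sampled world-frame force is then applied to the torso (in MuJoCo, typically via the torso row of `data.xfrc_applied`):

```python
import numpy as np

def sample_push(rng, push_probability=0.1, force_min=50.0, force_max=200.0):
    """With probability push_probability, sample a horizontal push force
    (world-frame x/y components, zero z). Returns None when no push occurs."""
    if rng.random() >= push_probability:
        return None
    magnitude = rng.uniform(force_min, force_max)
    angle = rng.uniform(0.0, 2.0 * np.pi)  # random horizontal direction
    return np.array([magnitude * np.cos(angle),
                     magnitude * np.sin(angle),
                     0.0])
```

Because the direction is sampled uniformly in the horizontal plane, the z-component is always zero and the force magnitude always lands in [force_min, force_max], matching the `--push_force_min`/`--push_force_max` arguments above.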

Base Controller (residual_rl/base_controller.py)

PDStabilizationController provides classical control:

  • Maintains upright posture
  • Stabilizes base position
  • Provides baseline policy for residual RL to improve upon
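
The PD law itself is standard. A minimal sketch follows; the class name, gains, and target posture here are placeholders, not the repository's tuned values:

```python
import numpy as np

class PDController:
    """Classical PD stabilization: torque pulls joints toward a target
    posture and damps joint velocities."""

    def __init__(self, q_target, kp=100.0, kd=10.0):
        self.q_target = np.asarray(q_target, dtype=float)
        self.kp = kp
        self.kd = kd

    def compute(self, q, qdot):
        # tau = Kp * (q_target - q) - Kd * qdot
        return (self.kp * (self.q_target - np.asarray(q))
                - self.kd * np.asarray(qdot))

# At the target posture with zero velocity the controller outputs zero torque,
# so the residual policy sees an undisturbed baseline when nothing is wrong.
pd = PDController(q_target=np.zeros(17))  # Humanoid-v5 has 17 actuators
```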

Training (train.py)

  • Creates base controller and environment
  • Trains SAC agent with residual actions
  • Evaluates periodically and saves checkpoints
  • Uses VecNormalize for observation/reward normalization

Evaluation (evaluate.py)

  • Loads trained model
  • Evaluates performance across multiple episodes
  • Computes statistics (mean reward, success rate, etc.)
  • Optional rendering for visualization
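
The statistics step reduces to aggregating per-episode results. A sketch, where the success criterion (surviving the full episode) is an assumption about how evaluate.py defines it:

```python
import numpy as np

def summarize(episode_returns, episode_lengths, max_steps=1000):
    """Aggregate evaluation episodes into summary statistics.
    'Success' is assumed to mean the humanoid survived the full episode."""
    returns = np.asarray(episode_returns, dtype=float)
    lengths = np.asarray(episode_lengths)
    return {
        "mean_reward": float(returns.mean()),
        "std_reward": float(returns.std()),
        "success_rate": float(np.mean(lengths >= max_steps)),
    }
```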

How Residual RL Works

  1. Base Control: The PD controller computes base actions to maintain stability
  2. Residual Action: The SAC agent learns to output residual actions (additions to base control)
  3. Combined Action: Total action = base_action + residual_action
  4. Learning: SAC optimizes residual actions to improve recovery from pushes
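
Steps 1–3 can be sketched directly. Clipping the sum to the actuator range is an assumption about how out-of-range actions are handled (gymnasium's Humanoid-v5 accepts controls in [-0.4, 0.4]):

```python
import numpy as np

def combined_action(base_action, residual_action, low=-0.4, high=0.4):
    """Residual RL composition: total = base + residual, clipped to the
    actuator limits so the learned correction cannot exceed them."""
    total = np.asarray(base_action) + np.asarray(residual_action)
    return np.clip(total, low, high)
```

Because the base controller alone is already a reasonable policy, the agent only has to learn the correction term, which typically makes training faster and safer than learning from scratch.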

Output Structure

.
├── logs/              # Tensorboard logs and evaluation results
├── models/            # Saved models and checkpoints
│   ├── best_model.zip
│   ├── final_model.zip
│   ├── checkpoints/
│   └── vec_normalize.pkl
└── ...

Monitoring Training

View training progress with TensorBoard:

tensorboard --logdir ./logs

Customization

Adjusting Push Disturbances

Modify push parameters in training:

python train.py --push_probability 0.2 --push_force_min 100 --push_force_max 300

Changing Base Controller

Use LQR controller instead of PD:

python train.py --base_controller lqr

Custom Reward Function

Edit _compute_reward() in residual_rl/env.py to change the reward shaping.
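
As a starting point, a push-recovery reward often combines an alive bonus, posture terms, and a control cost. The following is illustrative only; the term weights, target height, and inputs are assumptions, not the values inside `_compute_reward()`:

```python
import numpy as np

def compute_reward(torso_height, torso_upright, ctrl,
                   target_height=1.4, alive_bonus=5.0, ctrl_cost_weight=0.1):
    """Illustrative push-recovery reward: stay alive, stay tall and upright,
    and penalize large control effort."""
    height_reward = -abs(torso_height - target_height)  # track nominal standing height
    upright_reward = torso_upright                      # e.g. z-component of the torso's up axis
    ctrl_cost = ctrl_cost_weight * float(np.sum(np.square(ctrl)))
    return alive_bonus + height_reward + upright_reward - ctrl_cost
```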

Requirements

  • Python 3.8+
  • MuJoCo 3.0+
  • PyTorch
  • stable-baselines3
  • gymnasium

Notes

  • The environment uses gymnasium's built-in Humanoid-v5 model
  • Base controller uses PD control with configurable gains
  • SAC hyperparameters can be adjusted in train.py
  • Push disturbances are applied horizontally to the torso

License

MIT License
