This repository documents the Eigenbot team's reinforcement learning (RL) work using legged_gym, a derivative of NVIDIA's Isaac Gym.
It includes environments, robot models, and configurations tailored for training and testing legged robots.
- CMU 10-703: Deep RL - Course Website
- Sim2Real Lecture (Highly Recommended for Application) - Lecture 24
This repo, referred to internally as bio_eigen, consists of four main folders:
- ${\color{green}eigenbot/}$
- ${\color{green}isaacgym/}$
- ${\color{green}legged\_gym/}$ - Core training and inference framework. 📍Most work happens here: ${\color{green}legged\_gym/legged\_gym/envs/base/legged\_robot.py}$
- ${\color{green}rsl\_rl/}$ - RL algorithm implementations, including PPO and other on-policy/off-policy methods
- Defines all core environment functions (a minimal usage sketch follows this list):
  - ${\color{green}step()}$ -> advances the simulation by one step
  - ${\color{green}reset()}$ -> resets the environment/robot state
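A minimal sketch of how an environment is typically created and stepped, assuming the upstream legged_gym ${\color{green}task\_registry}$ API (function and attribute names come from upstream and may differ in this repo):

```python
# Hedged usage sketch; verify the exact API against this repo before relying on it.
import torch
from legged_gym.envs import *                        # registers tasks with task_registry
from legged_gym.utils import get_args, task_registry

args = get_args()
env, env_cfg = task_registry.make_env(name=args.task, args=args)

obs = env.get_observations()
for _ in range(1000):
    # zero actions just to drive the loop; a trained policy would normally produce these
    actions = torch.zeros(env.num_envs, env.num_actions, device=env.device)
    # step() advances the simulation by one control step and resets terminated envs internally
    obs, privileged_obs, rewards, dones, infos = env.step(actions)
```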
- Contains all reward functions at the bottom of the file, following the naming pattern ${\color{green}def \ \_reward\_<reward\_name>(self)}$.
- After a reward function is defined in this format, its corresponding reward scale must be added to ${\color{green}legged\_robot\_config.py}$.
  - Example: ${\color{green}\_reward\_tracking\_lin\_vel()}$ (a hedged sketch of such a reward follows below)
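As a concrete reference, here is a hedged sketch of a reward written in this pattern, modeled on the tracking reward in upstream legged_gym; the attribute names (${\color{green}self.commands}$, ${\color{green}self.base\_lin\_vel}$, ${\color{green}cfg.rewards.tracking\_sigma}$) are assumed from upstream and may differ in this repo:

```python
# Hedged sketch of a method inside legged_robot.py (LeggedRobot class); modeled on
# upstream legged_gym, with attribute names assumed rather than taken from this repo.
import torch

def _reward_tracking_lin_vel(self):
    # Error between commanded and actual base linear velocity in the xy plane
    lin_vel_error = torch.sum(
        torch.square(self.commands[:, :2] - self.base_lin_vel[:, :2]), dim=1
    )
    # Exponential kernel: 1 for perfect tracking, decaying with squared error
    return torch.exp(-lin_vel_error / self.cfg.rewards.tracking_sigma)
```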
- ${\color{green}legged\_robot\_config.py}$ - Centralized configuration file for:
  - Environment setup
  - Terrain generation
  - Reward weights (see the sketch below)
  - Network architecture
  - Command sampling
  - Initial states
  - Control & assets
  - Domain randomization
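A minimal sketch of how a reward scale is registered in this config, following the nested-class layout of upstream legged_gym; the task config class name here is hypothetical:

```python
# Hedged sketch; the config class name EigenbotFlatCfg is hypothetical, and the nested
# rewards/scales layout is assumed from upstream legged_gym.
from legged_gym.envs.base.legged_robot_config import LeggedRobotCfg

class EigenbotFlatCfg(LeggedRobotCfg):
    class rewards(LeggedRobotCfg.rewards):
        class scales(LeggedRobotCfg.rewards.scales):
            # scale applied to _reward_tracking_lin_vel(); setting a scale to 0 disables the reward
            tracking_lin_vel = 1.0
```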
This module defines how terrains are generated, selected, and managed for the legged robot environments.

- ${\color{green}Terrain}$ class
  - Initializes the terrain grid for multiple robots ${\color{green}(num\_rows \times num\_cols)}$
  - Supports different terrain generation modes:
    - ${\color{green}randomized\_terrain()}$ -> randomly generates terrain pieces
    - ${\color{green}curriculum()}$ -> terrains increase in difficulty row by row
    - ${\color{green}selected\_terrain()}$ -> uses a manually chosen terrain type
  - Stores height maps (${\color{green}height\_field\_raw}$) and origins for each sub-terrain.
- ${\color{green}make\_terrain(choice,\ difficulty)}$
  - Builds different terrain types (slopes, stairs, discrete obstacles, stepping stones, gaps, pits, ...) based on proportions and difficulty (see the sketch below for how the pieces fit together)
- ${\color{green}add\_terrain\_to\_map()}$
  - Places generated sub-terrains into the global map
  - Sets each environment's origin (x, y, z)
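A minimal sketch of how ${\color{green}curriculum()}$ typically ties ${\color{green}make\_terrain()}$ and ${\color{green}add\_terrain\_to\_map()}$ together, modeled on upstream legged_gym; the attribute names (${\color{green}self.cfg.num\_rows}$, ${\color{green}self.cfg.num\_cols}$) are assumed:

```python
# Hedged sketch of a Terrain method, modeled on upstream legged_gym.
def curriculum(self):
    for j in range(self.cfg.num_cols):
        for i in range(self.cfg.num_rows):
            # difficulty grows with the row index, so each row is harder than the previous one
            difficulty = i / self.cfg.num_rows
            # choice picks the terrain type according to the configured proportions
            choice = j / self.cfg.num_cols + 0.001
            terrain = self.make_terrain(choice, difficulty)
            self.add_terrain_to_map(terrain, i, j)
```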
Helper functions:

- ${\color{green}gap\_terrain()}$ -> creates a gap in the map
- ${\color{green}pit\_terrain()}$ -> creates a pit with a given depth
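These helpers operate directly on the sub-terrain's height field. A hedged sketch of a pit helper in the style of upstream legged_gym (the sub-terrain fields used here are assumed from upstream):

```python
# Hedged sketch; height_field_raw, vertical_scale, horizontal_scale, length, and width
# are sub-terrain fields assumed from upstream legged_gym and may differ in this repo.
def pit_terrain(terrain, depth, platform_size=1.0):
    # convert depth and platform size from meters to height-field units
    depth = int(depth / terrain.vertical_scale)
    platform_size = int(platform_size / terrain.horizontal_scale / 2)
    # carve a square pit of the given depth around the center of the sub-terrain
    x1 = terrain.length // 2 - platform_size
    x2 = terrain.length // 2 + platform_size
    y1 = terrain.width // 2 - platform_size
    y2 = terrain.width // 2 + platform_size
    terrain.height_field_raw[x1:x2, y1:y2] = -depth
```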
| Argument | Type | Default | Description |
|---|---|---|---|
| --task | str | "anymal_c_flat" | Name of the task/environment. Overrides config file if provided. |
| --resume | flag | False | Resume training from a checkpoint. |
| --experiment_name | str | None | Name of the experiment to run or load. Overrides config file. |
| --run_name | str | "new" | Name of the run (to distinguish runs within the same experiment). Overrides config file. |
| --expt_id | str | "00-001" | Experiment ID tag (useful for structured naming). Overrides config file. |
| --load_run | str | -1 | Run directory to load when --resume=True. If -1, loads the last run. |
| --checkpoint | int | -1 | Model checkpoint to load. If -1, loads the latest checkpoint. |
| --headless | flag | False | Run simulation without GUI (offscreen/headless mode). |
| --horovod | flag | False | Enable Horovod for distributed (multi-GPU) training. |
| --rl_device | str | "cuda:0" | Device used by the RL algorithm (cpu, cuda:0, etc.). |
| --num_envs | int | Config default | Number of environments to create. Overrides config file. |
| --seed | int | Config default | Random seed for reproducibility. Overrides config file. |
| --max_iterations | int | Config default | Maximum number of training iterations. Overrides config file. |
| --show_heading | flag | False | Visualize robot’s heading direction in the viewer. |
| --rough_terrain | flag | False | Enable rough terrain (instead of flat ground). |
| --debug | flag | False | Disable Weights & Biases (wandb) logging (debug mode). |
| --no_wandb | flag | False | Run without wandb logging entirely. |
Example usage:

- Train eigenbot on flat terrain with 4096 envs:
  python train.py --task eigenbot_flat --num_envs 4096 --experiment_name locomotion_flat
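- Resume the latest run of the same task, headless and without wandb logging (hedged example; with --load_run and --checkpoint left at their defaults of -1, the most recent run and checkpoint are loaded):
  python train.py --task eigenbot_flat --resume --headless --no_wandb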
- Introduces encoder modules to process observation states.
- Includes variants such as:
  - History Encoder → encodes past states for temporal context (a hedged sketch follows below).
  - Privileged Encoder → leverages extra simulation-only information during training.
- How to follow the flow:
  1. Observation states are defined in ${\color{green}legged\_gym/envs/base/legged\_robot.py}$.
  2. They are passed into the encoder modules.
  3. They are integrated into training via ${\color{green}rsl\_rl}$ (${\color{green}on\_policy\_runner.py}$, ${\color{green}ppo.py}$, ${\color{green}vec\_env.py}$, and ${\color{green}actor\_critic.py}$).
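A minimal sketch of what a history encoder can look like: an MLP over a stack of past observations whose latent is appended to the current observation before the actor-critic. The dimensions and fusion scheme are illustrative assumptions, not this repo's exact architecture:

```python
# Hedged sketch of a history encoder; sizes and fusion are assumptions for illustration.
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Encodes the last `history_len` observations into a temporal-context latent."""
    def __init__(self, obs_dim=48, history_len=10, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * history_len, 128), nn.ELU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, obs_history):
        # obs_history: (num_envs, history_len, obs_dim) -> flatten time into the feature axis
        return self.net(obs_history.flatten(start_dim=1))

# The latent is concatenated with the current observation before the rsl_rl actor-critic.
encoder = HistoryEncoder()
obs_history = torch.rand(4, 10, 48)   # dummy batch of stacked past observations
current_obs = torch.rand(4, 48)
policy_input = torch.cat([current_obs, encoder(obs_history)], dim=-1)   # (4, 48 + 16)
```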
- Extends the encoder framework by adding a Depth Encoder module.
- Depth Encoder:
  - Processes simulated depth maps (from sensors or rendering).
  - Outputs latent features that are concatenated with the outputs of the standard encoders (a hedged sketch follows below).
- Integrated into the PPO pipeline in ${\color{green}rsl\_rl}$.
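A minimal sketch of a depth encoder and how its latent can be fused with proprioceptive observations; the module, input sizes, and fusion scheme are illustrative assumptions, not this repo's exact code:

```python
# Hedged sketch; architecture, sizes, and fusion are assumptions for illustration only.
import torch
import torch.nn as nn

class DepthEncoder(nn.Module):
    """Encodes a (1, H, W) depth image into a small latent vector."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(32, latent_dim)

    def forward(self, depth):
        # depth: (num_envs, 1, H, W) simulated depth maps
        return self.fc(self.conv(depth))

# The depth latent is concatenated with proprioceptive observations before the actor-critic.
depth_encoder = DepthEncoder(latent_dim=32)
depth = torch.rand(4, 1, 64, 64)     # dummy batch of depth maps
proprio_obs = torch.rand(4, 48)      # dummy proprioceptive observations
actor_input = torch.cat([proprio_obs, depth_encoder(depth)], dim=-1)   # (4, 48 + 32)
```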
👉 These two branches build upon the legged_robot observation space and connect into rsl_rl training pipelines, but add new ways of representing or enriching observations.