28 changes: 28 additions & 0 deletions Makefile
@@ -0,0 +1,28 @@
SHELL=/bin/bash
LINT_PATHS=frasa_env/ *.py

mypy:
mypy ${LINT_PATHS}

lint:
# stop the build if there are Python syntax errors or undefined names
# see https://www.flake8rules.com/
ruff check ${LINT_PATHS} --select=E9,F63,F7,F82 --output-format=full
# exit-zero treats all errors as warnings.
ruff check ${LINT_PATHS} --exit-zero --output-format=concise

format:
# Sort imports
ruff check --select I ${LINT_PATHS} --fix
# Reformat using black
black ${LINT_PATHS}

check-codestyle:
# Sort imports
ruff check --select I ${LINT_PATHS}
# Reformat using black
black --check ${LINT_PATHS}

commit-checks: format mypy lint

.PHONY: lint format check-codestyle commit-checks
41 changes: 20 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
@@ -30,13 +30,13 @@ Please note that the RL Baselines3 Zoo creates a conflict on the gymnasium module

### Issue with MuJoCo using Wayland

If you are using Wayland (instead of X11), you may encounter issues with the MuJoCo viewer,
such as frozen windows or buttons not working. To solve this, you can build
GLFW from source with shared libraries and set the `LD_LIBRARY_PATH` and
`PYGLFW_LIBRARY` environment variables to point to the built libraries.

1. Download the source package and unzip it: https://www.glfw.org/download.html

2. Install dependencies to build GLFW for Wayland and X11

```
@@ -58,7 +58,7 @@ make
```

5. Change `LD_LIBRARY_PATH` and `PYGLFW_LIBRARY` to match the GLFW version you built, and add them to your `.bashrc`:

```
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:path/to/glfw/build/src/"
export PYGLFW_LIBRARY="path/to/glfw/build/src/libglfw.so"
@@ -68,18 +68,18 @@

### Generating initial positions

Pre-generating initial positions for the standup environment is recommended,
as it can be time-consuming to generate them during training. To do so, you can
use the `standup_generate_initial.py` script:

```bash
python standup_generate_initial.py
```

It will generate initial positions by letting the robot fall from random positions
and store them in `frasa_env/env/standup_initial_configurations.pkl`.
Let the script run until you have collected enough initial positions
(typically a few thousand). You can stop the script at any time using Ctrl+C;
the generated positions will be saved automatically.
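
As a quick sanity check before training, the cache can be inspected. A minimal sketch, assuming the pickle holds a sequence of configurations (its exact structure is not documented here):

```python
import pickle

# Assumption: the cache is a sequence of initial configurations.
with open("frasa_env/env/standup_initial_configurations.pkl", "rb") as f:
    initial_configs = pickle.load(f)

print(f"{len(initial_configs)} initial configurations collected")
```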

<img src="https://github.com/user-attachments/assets/876ea026-487d-4c27-8c95-98c2bbacc8ee" width="40%" align="right">
@@ -104,7 +104,6 @@ You can train an agent using:
python train_sbx.py \
--algo crossq \
--env frasa-standup-v0 \
--gym-packages frasa_env \
--conf hyperparams/crossq.yml
```

@@ -150,14 +149,14 @@ Where the arguments are:

<br>

The Sigmaban robot is a small humanoid developed by the Rhoban team to
compete in the RoboCup KidSize league. The robot is 70 cm tall and weighs
7.5 kg. It has 20 degrees of freedom, a camera, and is equipped with pressure
sensors and an IMU.

The MuJoCo XML model of this robot is located in the `frasa_env/mujoco_simulator/model` folder.

For a detailed description of the process to convert URDF to MuJoCo XML,
see the [README](frasa_env/mujoco_simulator/model/README.md) in the model directory.

<br>
@@ -172,15 +171,15 @@ The stand up environment can be found in `frasa_env/env/standup_env.py`.

The environment simplifies the learning process by controlling only the 5 DoFs presented on the right (elbow, shoulder_pitch, hip_pitch, knee, ankle_pitch). These are the primary joints involved in recovery and standing movements in the sagittal plane $(x, z)$. The actions are symmetrically applied to both sides of the robot.
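
To illustrate the symmetric mapping, here is a hypothetical sketch; the `left_`/`right_` prefixes are illustrative assumptions, not the model's actual actuator names:

```python
# One action value per sagittal joint, mirrored to both sides of the robot.
SAGITTAL_DOFS = ["elbow", "shoulder_pitch", "hip_pitch", "knee", "ankle_pitch"]

def mirror_action(action: list[float]) -> dict[str, float]:
    targets = {}
    for name, value in zip(SAGITTAL_DOFS, action):
        targets[f"left_{name}"] = value   # hypothetical actuator naming
        targets[f"right_{name}"] = value
    return targets
```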

The robot's state is characterized by the **pose vector** $\psi$, which includes the joint angles $q$ and the trunk pitch $\theta$. The target pose for recovery is defined as:

$$ \psi_{\text{target}} = [q_{\text{target}}, \theta_{\text{target}}] $$

This target pose represents the upright position the robot should achieve after recovery.
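
As a minimal numeric sketch of this definition (the upright values below are placeholders, not the repository's actual targets):

```python
import numpy as np

# psi = [q, theta]: the 5 controlled joint angles plus the trunk pitch.
q_target = np.zeros(5)  # placeholder upright joint angles [rad]
theta_target = 0.0      # placeholder upright trunk pitch [rad]
psi_target = np.concatenate([q_target, [theta_target]])  # shape (6,)
```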

At the start of each episode, the robot’s joint angles $q$ and trunk pitch $\theta$ are set to random values within their
physical limits. The robot is then released above the ground, simulating various fallen states. The physical simulation is
then stepped, using the current joint configuration as targets, until stability is reached, at which point
the learning process begins.
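
A minimal sketch of this stabilization phase, mirroring the loop in `randomize_fall` from the `standup_env.py` diff below (`sim` stands in for the environment's `Simulator` instance):

```python
def stabilize(sim, stabilization_time: float) -> None:
    # Step the physics until gravity settles the robot; the episode
    # (and learning) only starts afterwards.
    for _ in range(round(stabilization_time / sim.dt)):
        sim.step()
```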

An episode terminates when the robot reaches an unsafe or irreversible state:
@@ -206,7 +205,7 @@ The observation space for the **StandupEnv** captures key elements of the robot'

### Action Space

The action space specifies control commands for the robot's 5 joints. Its structure depends on the control mode (`position`, `velocity`, or `error`) specified in `options["control"]`.

| **Mode** | **Description** | **Range** | **Notes** |
|---|---|---|---|
@@ -226,7 +225,7 @@ $$ R = R_{state} + R_{variation} + R_{collision} $$

The state proximity reward $R_{state}$ represents the proximity to the desired state $\psi_{target}$:

$$ R_{state} = \text{exp}\left(-20 \cdot \left| \psi_{current} - \psi_{target} \right|^2 \right) $$
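
This term can be read directly off the `step()` implementation in the `standup_env.py` diff below; as a standalone sketch:

```python
import numpy as np

def state_proximity_reward(psi_current: np.ndarray, psi_target: np.ndarray) -> float:
    # exp(-20 * ||psi_current - psi_target||^2), as in StandupEnv.step()
    return float(np.exp(-20 * np.linalg.norm(psi_current - psi_target) ** 2))
```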

2. Variation Penalty Reward

@@ -294,12 +293,12 @@ To cite this repository in publications:

```bibtex
@article{frasa2024,
title={FRASA: An End-to-End Reinforcement Learning Agent for Fall Recovery and Stand Up of Humanoid Robots},
author={Clément Gaspard and Marc Duclusaud and Grégoire Passault and Mélodie Daniel and Olivier Ly},
year={2024},
eprint={2410.08655},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2410.08655},
}
```
7 changes: 6 additions & 1 deletion enjoy_sbx.py
@@ -1,8 +1,13 @@
import gymnasium as gym
import rl_zoo3
import rl_zoo3.enjoy
from rl_zoo3.enjoy import enjoy
from sbx import DDPG, DQN, PPO, SAC, TD3, TQC, CrossQ

import frasa_env

gym.register_envs(frasa_env)

rl_zoo3.ALGOS["ddpg"] = DDPG
rl_zoo3.ALGOS["dqn"] = DQN
# See SBX readme to use DroQ configuration
@@ -17,4 +22,4 @@


if __name__ == "__main__":
enjoy()
2 changes: 1 addition & 1 deletion frasa_env/__init__.py
@@ -2,5 +2,5 @@

register(
id="frasa-standup-v0",
entry_point="frasa_env.env:FRASAEnv"
entry_point="frasa_env.env:FRASAEnv",
)
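
A minimal usage sketch based on this registration: importing `frasa_env` runs the `register` call above, after which the environment can be created through Gymnasium.

```python
import gymnasium as gym

import frasa_env  # noqa: F401 -- importing runs the register() call above

env = gym.make("frasa-standup-v0")
obs, info = env.reset()
```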
3 changes: 2 additions & 1 deletion frasa_env/env/__init__.py
@@ -1,3 +1,4 @@
from frasa_env.env.standup import FRASAEnv
from frasa_env.env.standup_env import StandupEnv

from frasa_env.env.standup import FRASAEnv
__all__ = ["FRASAEnv", "StandupEnv"]
9 changes: 6 additions & 3 deletions frasa_env/env/standup.py
@@ -1,12 +1,15 @@
from .standup_env import StandupEnv
import numpy as np

from .standup_env import StandupEnv


class FRASAEnv(StandupEnv):
def __init__(self, evaluation="none", render_mode="none", options={}):
def __init__(self, evaluation=False, render_mode="none", options=None):
options = options or {}
options["stabilization_time"] = 2.0
options["truncate_duration"] = 5.0
options["dt"] = 0.05
options["vmax"] = 2 * np.pi
options["reset_final_p"] = 0.1

super().__init__(evaluation=evaluation, render_mode=render_mode, options=options)
49 changes: 30 additions & 19 deletions frasa_env/env/standup_env.py
@@ -1,17 +1,20 @@
import numpy as np
import os
import pickle
import random
import os
import time
import warnings
from typing import Optional

import gymnasium
import numpy as np
from gymnasium import spaces

from frasa_env.mujoco_simulator.simulator import Simulator, tf


class StandupEnv(gymnasium.Env):
metadata = {"render_modes": ["human", "none"]}
metadata = {"render_modes": ["human", "none"], "render_fps": 30}

def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = False):
def __init__(self, render_mode="none", options: Optional[dict] = None, evaluation: bool = False):
self.options = {
# Duration of the stabilization pre-simulation (waiting for gravity to stabilize the robot) [s]
"stabilization_time": 2.0,
@@ -52,7 +55,7 @@ def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = Fa
# Previous actions
"previous_actions": 1,
}
self.options.update(options)
self.options.update(options or {})

self.render_mode = render_mode
self.sim = Simulator()
@@ -80,14 +83,14 @@ def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = Fa
)
elif self.options["control"] == "velocity":
self.action_space = spaces.Box(
np.array([-self.options["vmax"]] * len(self.dofs)),
np.array([self.options["vmax"]] * len(self.dofs)),
np.array([-self.options["vmax"]] * len(self.dofs), dtype=np.float32),
np.array([self.options["vmax"]] * len(self.dofs), dtype=np.float32),
dtype=np.float32,
)
elif self.options["control"] == "error":
self.action_space = spaces.Box(
np.array([-np.pi / 4] * len(self.dofs)),
np.array([np.pi / 4] * len(self.dofs)),
np.array([-np.pi / 4] * len(self.dofs), dtype=np.float32),
np.array([np.pi / 4] * len(self.dofs), dtype=np.float32),
dtype=np.float32,
)
else:
@@ -109,7 +112,8 @@ def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = Fa
-10,
# Previous action
*(list(self.action_space.low) * self.options["previous_actions"]),
]
],
dtype=np.float32,
),
np.array(
[
Expand All @@ -125,7 +129,8 @@ def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = Fa
10,
# Previous action
*(list(self.action_space.high) * self.options["previous_actions"]),
]
],
dtype=np.float32,
),
dtype=np.float32,
)
@@ -156,7 +161,7 @@ def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = Fa
initial_config_path = self.get_initial_config_filename()
self.initial_config = None
if os.path.exists(initial_config_path):
print(f"Loading initial configurations from {initial_config_path}")
# print(f"Loading initial configurations from {initial_config_path}")
with open(initial_config_path, "rb") as f:
self.initial_config = pickle.load(f)

@@ -285,7 +290,7 @@ def step(self, action):

state_current = [*self.q_history[-1], self.tilt_history[-1]]

reward = np.exp(-20*(np.linalg.norm(np.array(state_current) - np.array(self.options["desired_state"]))**2))
reward = np.exp(-20 * (np.linalg.norm(np.array(state_current) - np.array(self.options["desired_state"])) ** 2))

action_variation = np.abs(action - self.previous_actions[-1])
self.previous_actions.append(action)
@@ -368,7 +373,7 @@ def apply_randomization(self):
def randomize_fall(self, target: bool = False):
# Decide if we will use the target
my_target = np.copy(self.options["desired_state"])
if target == False:
if target is False:
target = self.np_random.random() < self.options["reset_final_p"]

# Selecting a random configuration
@@ -396,18 +401,24 @@ def randomize_fall(self, target: bool = False):
self.sim.set_T_world_site("trunk", T_world_trunk)

# Wait for the robot to stabilize
for k in range(round(self.options["stabilization_time"] / self.sim.dt)):
for _ in range(round(self.options["stabilization_time"] / self.sim.dt)):
self.sim.step()

def reset(
self,
seed: Optional[int] = None,
target: bool = False,
use_cache: bool = True,
options: dict = {},
options: Optional[dict] = None,
):
super().reset(seed=seed)
self.sim.reset()
options = options or {}
target = options.get("target", False)
use_cache = options.get("use_cache", True)

if use_cache and self.initial_config is None:
warnings.warn(
"use_cache=True but no initial config file could be loaded. Did you run standup_generate_initial.py?"
)

# Initial robot configuration
if use_cache and self.initial_config is not None:
19 changes: 8 additions & 11 deletions frasa_env/mujoco_simulator/simulator.py
@@ -1,22 +1,22 @@
import os
import numpy as np
import time
from typing import Optional

import meshcat.transformations as tf
import mujoco
import mujoco.viewer
import meshcat.transformations as tf
import numpy as np


class Simulator:
def __init__(self, model_dir: str | None = None):
def __init__(self, model_dir: Optional[str] = None):
# If model_dir is not provided, use the current directory
if model_dir is None:
model_dir = os.path.join(os.path.dirname(__file__) + "/model/")
self.model_dir = model_dir

# Load the model and data
self.model: mujoco.MjModel = mujoco.MjModel.from_xml_path(
f"{model_dir}/scene.xml"
)
self.model: mujoco.MjModel = mujoco.MjModel.from_xml_path(f"{model_dir}/scene.xml")
self.data: mujoco.MjData = mujoco.MjData(self.model)

# Retrieve the degrees of freedom id/name pairs
@@ -47,12 +47,9 @@ def self_collisions(self) -> float:

def centroidal_force(self) -> float:
return np.linalg.norm(self.data.qfrc_constraint[3:])

def dof_names(self) -> list:
return [
mujoco.mj_id2name(self.model, mujoco.mjtObj.mjOBJ_ACTUATOR, i)
for i in range(self.model.nu)
]
return [mujoco.mj_id2name(self.model, mujoco.mjtObj.mjOBJ_ACTUATOR, i) for i in range(self.model.nu)]

def reset(self) -> None:
mujoco.mj_resetData(self.model, self.data)