28 changes: 28 additions & 0 deletions Makefile
@@ -0,0 +1,28 @@
SHELL=/bin/bash
LINT_PATHS=frasa_env/ *.py

mypy:
mypy ${LINT_PATHS}

lint:
# stop the build if there are Python syntax errors or undefined names
# see https://www.flake8rules.com/
ruff check ${LINT_PATHS} --select=E9,F63,F7,F82 --output-format=full
# exit-zero treats all errors as warnings.
ruff check ${LINT_PATHS} --exit-zero --output-format=concise

format:
# Sort imports
ruff check --select I ${LINT_PATHS} --fix
# Reformat using black
black ${LINT_PATHS}

check-codestyle:
# Sort imports
ruff check --select I ${LINT_PATHS}
# Reformat using black
black --check ${LINT_PATHS}

commit-checks: format mypy lint

.PHONY: lint format check-codestyle commit-checks
41 changes: 20 additions & 21 deletions README.md
Original file line number Diff line number Diff line change
@@ -30,13 +30,13 @@ Please note that the RL Baselines3 Zoo creates a conflict on the gymnasium module

### Issue with MuJoCo using Wayland

If you are using Wayland (instead of X11), you may encounter issues with the MuJoCo viewer,
such as frozen windows or buttons not working. To solve this, you can build
GLFW from source with shared libraries and set the `LD_LIBRARY_PATH` and
`PYGLFW_LIBRARY` environment variables to point to the built libraries.

1. Download the source package and unzip it: https://www.glfw.org/download.html

2. Install dependencies to build GLFW for Wayland and X11

```
@@ -58,7 +58,7 @@ make
```

5. Change `LD_LIBRARY_PATH` and `PYGLFW_LIBRARY` to match the GLFW version you built, and add them to your `.bashrc`:

```
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:path/to/glfw/build/src/"
export PYGLFW_LIBRARY="path/to/glfw/build/src/libglfw.so"
@@ -68,18 +68,18 @@

### Generating initial positions

Pre-generating initial positions for the standup environment is recommended,
as it can be time-consuming to generate them during training. To do so, you can
use the `standup_generate_initial.py` script:

```bash
python standup_generate_initial.py
```

It will generate initial positions by letting the robot fall from random positions
and store them in `frasa_env/env/standup_initial_configurations.pkl`.
Let the script run until you have collected enough initial positions
(typically a few thousand). You can stop the script at any time using Ctrl+C;
the generated positions will be saved automatically.
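
As a quick sanity check before training, the cache can be inspected. A minimal sketch, assuming the pickle holds a sequence of configurations (its exact structure is not documented here):

```python
import pickle

# Assumption: the cache is a sequence of initial configurations.
with open("frasa_env/env/standup_initial_configurations.pkl", "rb") as f:
    initial_configs = pickle.load(f)

print(f"{len(initial_configs)} initial configurations collected")
```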

<img src="https://github.com/user-attachments/assets/876ea026-487d-4c27-8c95-98c2bbacc8ee" width="40%" align="right">
@@ -104,7 +104,6 @@ You can train an agent using:
python train_sbx.py \
--algo crossq \
--env frasa-standup-v0 \
--gym-packages frasa_env \
--conf hyperparams/crossq.yml
```

@@ -150,14 +149,14 @@ Where the arguments are:

<br>

The Sigmaban robot is a small humanoid developed by the Rhoban team to
compete in the RoboCup KidSize league. The robot is 70 cm tall and weighs
7.5 kg. It has 20 degrees of freedom, a camera, and is equipped with pressure
sensors and an IMU.

The MuJoCo XML model of this robot is located in the `frasa_env/mujoco_simulator/model` folder.

For a detailed description of the process to convert URDF to MuJoCo XML,
see the [README](frasa_env/mujoco_simulator/model/README.md) in the model directory.

<br>
@@ -172,15 +171,15 @@ The stand up environment can be found in `frasa_env/env/standup_env.py`.

The environment simplifies the learning process by controlling only the 5 DoFs presented on the right (elbow, shoulder_pitch, hip_pitch, knee, ankle_pitch). These are the primary joints involved in recovery and standing movements in the sagittal plane $(x, z)$. The actions are symmetrically applied to both sides of the robot.
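
To illustrate the symmetric mapping, here is a hypothetical sketch; the `left_`/`right_` prefixes are illustrative assumptions, not the model's actual actuator names:

```python
# One action value per sagittal joint, mirrored to both sides of the robot.
SAGITTAL_DOFS = ["elbow", "shoulder_pitch", "hip_pitch", "knee", "ankle_pitch"]

def mirror_action(action: list[float]) -> dict[str, float]:
    targets = {}
    for name, value in zip(SAGITTAL_DOFS, action):
        targets[f"left_{name}"] = value   # hypothetical actuator naming
        targets[f"right_{name}"] = value
    return targets
```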

The robot's state is characterized by the **pose vector** $\psi$, which includes the joint angles $q$ and the trunk pitch $\theta$. The target pose for recovery is defined as:

$$ \psi_{\text{target}} = [q_{\text{target}}, \theta_{\text{target}}] $$

This target pose represents the upright position the robot should achieve after recovery.
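
As a minimal numeric sketch of this definition (the upright values below are placeholders, not the repository's actual targets):

```python
import numpy as np

# psi = [q, theta]: the 5 controlled joint angles plus the trunk pitch.
q_target = np.zeros(5)  # placeholder upright joint angles [rad]
theta_target = 0.0      # placeholder upright trunk pitch [rad]
psi_target = np.concatenate([q_target, [theta_target]])  # shape (6,)
```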

At the start of each episode, the robot’s joint angles $q$ and trunk pitch $\theta$ are set to random values within their
physical limits. The robot is then released above the ground, simulating various fallen states. The physical simulation is
then stepped, using the current joint configuration as targets, until stability is reached, at which point
the learning process begins.
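
A minimal sketch of this stabilization phase, mirroring the loop in `randomize_fall` from the `standup_env.py` diff below (`sim` stands in for the environment's `Simulator` instance):

```python
def stabilize(sim, stabilization_time: float) -> None:
    # Step the physics until gravity settles the robot; the episode
    # (and learning) only starts afterwards.
    for _ in range(round(stabilization_time / sim.dt)):
        sim.step()
```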

An episode terminates when the robot reaches an unsafe or irreversible state:
@@ -206,7 +205,7 @@ The observation space for the **StandupEnv** captures key elements of the robot'

### Action Space

The action space specifies control commands for the robot's 5 joints. Its structure depends on the control mode (`position`, `velocity`, or `error`) specified in `options["control"]`.

| **Mode** | **Description** | **Range** | **Notes** |
|---|---|---|---|
@@ -226,7 +225,7 @@ $$ R = R_{state} + R_{variation} + R_{collision} $$

The state proximity reward $R_{state}$ represents the proximity to the desired state $\psi_{target}$:

$$ R_{state} = \text{exp}\left(-20 \cdot \left| \psi_{current} - \psi_{target} \right|^2 \right) $$
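
This term can be read directly off the `step()` implementation in the `standup_env.py` diff below; as a standalone sketch:

```python
import numpy as np

def state_proximity_reward(psi_current: np.ndarray, psi_target: np.ndarray) -> float:
    # exp(-20 * ||psi_current - psi_target||^2), as in StandupEnv.step()
    return float(np.exp(-20 * np.linalg.norm(psi_current - psi_target) ** 2))
```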

2. Variation Penalty Reward

@@ -294,12 +293,12 @@ To cite this repository in publications:

```bibtex
@article{frasa2024,
title={FRASA: An End-to-End Reinforcement Learning Agent for Fall Recovery and Stand Up of Humanoid Robots},
author={Clément Gaspard and Marc Duclusaud and Grégoire Passault and Mélodie Daniel and Olivier Ly},
year={2024},
eprint={2410.08655},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2410.08655},
}
```
7 changes: 6 additions & 1 deletion enjoy_sbx.py
@@ -1,8 +1,13 @@
import gymnasium as gym
import rl_zoo3
import rl_zoo3.enjoy
from rl_zoo3.enjoy import enjoy
from sbx import DDPG, DQN, PPO, SAC, TD3, TQC, CrossQ

import frasa_env

gym.register_envs(frasa_env)

rl_zoo3.ALGOS["ddpg"] = DDPG
rl_zoo3.ALGOS["dqn"] = DQN
# See SBX readme to use DroQ configuration
@@ -17,4 +22,4 @@


if __name__ == "__main__":
enjoy()
2 changes: 1 addition & 1 deletion frasa_env/__init__.py
@@ -2,5 +2,5 @@

register(
id="frasa-standup-v0",
entry_point="frasa_env.env:FRASAEnv"
entry_point="frasa_env.env:FRASAEnv",
)
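
A minimal usage sketch based on this registration: importing `frasa_env` runs the `register` call above, after which the environment can be created through Gymnasium.

```python
import gymnasium as gym

import frasa_env  # noqa: F401 -- importing runs the register() call above

env = gym.make("frasa-standup-v0")
obs, info = env.reset()
```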
3 changes: 2 additions & 1 deletion frasa_env/env/__init__.py
@@ -1,3 +1,4 @@
from frasa_env.env.standup import FRASAEnv
from frasa_env.env.standup_env import StandupEnv

from frasa_env.env.standup import FRASAEnv
__all__ = ["FRASAEnv", "StandupEnv"]
9 changes: 6 additions & 3 deletions frasa_env/env/standup.py
@@ -1,12 +1,15 @@
from .standup_env import StandupEnv
import numpy as np

from .standup_env import StandupEnv


class FRASAEnv(StandupEnv):
def __init__(self, evaluation="none", render_mode="none", options={}):
def __init__(self, evaluation=False, render_mode="none", options=None):
options = options or {}
options["stabilization_time"] = 2.0
options["truncate_duration"] = 5.0
options["dt"] = 0.05
options["vmax"] = 2 * np.pi
options["reset_final_p"] = 0.1

super().__init__(evaluation=evaluation, render_mode=render_mode, options=options)
49 changes: 30 additions & 19 deletions frasa_env/env/standup_env.py
@@ -1,17 +1,20 @@
import numpy as np
import os
import pickle
import random
import os
import time
import warnings
from typing import Optional

import gymnasium
import numpy as np
from gymnasium import spaces

from frasa_env.mujoco_simulator.simulator import Simulator, tf


class StandupEnv(gymnasium.Env):
metadata = {"render_modes": ["human", "none"]}
metadata = {"render_modes": ["human", "none"], "render_fps": 30}

def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = False):
def __init__(self, render_mode="none", options: Optional[dict] = None, evaluation: bool = False):
self.options = {
# Duration of the stabilization pre-simulation (waiting for gravity to stabilize the robot) [s]
"stabilization_time": 2.0,
@@ -52,7 +55,7 @@ def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = Fa
# Previous actions
"previous_actions": 1,
}
self.options.update(options)
self.options.update(options or {})

self.render_mode = render_mode
self.sim = Simulator()
@@ -80,14 +83,14 @@ def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = Fa
)
elif self.options["control"] == "velocity":
self.action_space = spaces.Box(
np.array([-self.options["vmax"]] * len(self.dofs)),
np.array([self.options["vmax"]] * len(self.dofs)),
np.array([-self.options["vmax"]] * len(self.dofs), dtype=np.float32),
np.array([self.options["vmax"]] * len(self.dofs), dtype=np.float32),
dtype=np.float32,
)
elif self.options["control"] == "error":
self.action_space = spaces.Box(
np.array([-np.pi / 4] * len(self.dofs)),
np.array([np.pi / 4] * len(self.dofs)),
np.array([-np.pi / 4] * len(self.dofs), dtype=np.float32),
np.array([np.pi / 4] * len(self.dofs), dtype=np.float32),
dtype=np.float32,
)
else:
@@ -109,7 +112,8 @@ def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = Fa
-10,
# Previous action
*(list(self.action_space.low) * self.options["previous_actions"]),
]
],
dtype=np.float32,
),
np.array(
[
Expand All @@ -125,7 +129,8 @@ def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = Fa
10,
# Previous action
*(list(self.action_space.high) * self.options["previous_actions"]),
]
],
dtype=np.float32,
),
dtype=np.float32,
)
@@ -156,7 +161,7 @@ def __init__(self, render_mode="none", options: dict = {}, evaluation: bool = Fa
initial_config_path = self.get_initial_config_filename()
self.initial_config = None
if os.path.exists(initial_config_path):
print(f"Loading initial configurations from {initial_config_path}")
# print(f"Loading initial configurations from {initial_config_path}")
with open(initial_config_path, "rb") as f:
self.initial_config = pickle.load(f)

@@ -285,7 +290,7 @@ def step(self, action):

state_current = [*self.q_history[-1], self.tilt_history[-1]]

reward = np.exp(-20*(np.linalg.norm(np.array(state_current) - np.array(self.options["desired_state"]))**2))
reward = np.exp(-20 * (np.linalg.norm(np.array(state_current) - np.array(self.options["desired_state"])) ** 2))

action_variation = np.abs(action - self.previous_actions[-1])
self.previous_actions.append(action)
@@ -368,7 +373,7 @@ def apply_randomization(self):
def randomize_fall(self, target: bool = False):
# Decide if we will use the target
my_target = np.copy(self.options["desired_state"])
if target == False:
if target is False:
target = self.np_random.random() < self.options["reset_final_p"]

# Selecting a random configuration
@@ -396,18 +401,24 @@ def randomize_fall(self, target: bool = False):
self.sim.set_T_world_site("trunk", T_world_trunk)

# Wait for the robot to stabilize
for k in range(round(self.options["stabilization_time"] / self.sim.dt)):
for _ in range(round(self.options["stabilization_time"] / self.sim.dt)):
self.sim.step()

def reset(
self,
seed: Optional[int] = None,
target: bool = False,
use_cache: bool = True,
options: dict = {},
options: Optional[dict] = None,
):
super().reset(seed=seed)
self.sim.reset()
options = options or {}
target = options.get("target", False)
use_cache = options.get("use_cache", True)

if use_cache and self.initial_config is None:
warnings.warn(
"use_cache=True but no initial config file could be loaded. Did you run standup_generate_initial.py?"
)

# Initial robot configuration
if use_cache and self.initial_config is not None:
19 changes: 8 additions & 11 deletions frasa_env/mujoco_simulator/simulator.py
@@ -1,22 +1,22 @@
import os
import numpy as np
import time
from typing import Optional

import meshcat.transformations as tf
import mujoco
import mujoco.viewer
import meshcat.transformations as tf
import numpy as np


class Simulator:
def __init__(self, model_dir: str | None = None):
def __init__(self, model_dir: Optional[str] = None):
# If model_dir is not provided, use the current directory
if model_dir is None:
model_dir = os.path.join(os.path.dirname(__file__) + "/model/")
self.model_dir = model_dir

# Load the model and data
self.model: mujoco.MjModel = mujoco.MjModel.from_xml_path(
f"{model_dir}/scene.xml"
)
self.model: mujoco.MjModel = mujoco.MjModel.from_xml_path(f"{model_dir}/scene.xml")
self.data: mujoco.MjData = mujoco.MjData(self.model)

# Retrieve the degrees of freedom id/name pairs
@@ -47,12 +47,9 @@ def self_collisions(self) -> float:

def centroidal_force(self) -> float:
return np.linalg.norm(self.data.qfrc_constraint[3:])

def dof_names(self) -> list:
return [
mujoco.mj_id2name(self.model, mujoco.mjtObj.mjOBJ_ACTUATOR, i)
for i in range(self.model.nu)
]
return [mujoco.mj_id2name(self.model, mujoco.mjtObj.mjOBJ_ACTUATOR, i) for i in range(self.model.nu)]

def reset(self) -> None:
mujoco.mj_resetData(self.model, self.data)