Conversation

@sheim (Owner) commented Mar 19, 2024

Run:

python script/train.py --task=humanoid

See file mit_humanoid.py.

Checklist before requesting a review

  • This is expected to break regression tests.
  • I have assigned a reviewer
  • I have added the PR to the project and tagged it with a priority
  • If it is a core feature, I have added tests.
  • I have set up pre-commit hooks with ruff, or run `ruff format .` manually

sheim added 22 commits February 16, 2024 13:31
Same up to 1e-5 on individual values. Because of differencing, this can end up in much larger differences (1e0) occasionally, after normalization.
Left without boot-strapping on the last step; it seems to make no difference.
Boot-strapping on time-outs made a big difference, though (see the sketch after this commit list).
…d history length.

Implemented for dof_pos_target, dof_pos, dof_vel.
in preparation for domain randomization of gains
…f_pos_history` (used for smoothness reward).

Some tuning.
Change stance/swing reward to use the smoothed square wave.
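Two of the commits above are worth a sketch. First, the time-out bootstrapping that "made a big difference": the usual trick (a minimal sketch with hypothetical buffer names, not the code in this PR) is to add the discounted value estimate back into the reward on steps that ended by time-out rather than by failure, so the agent isn't punished for an artificial cut-off:

```python
import torch

def bootstrap_on_timeout(rewards, values, time_out_buf, gamma):
    # time_out_buf marks envs cut off by the episode-length limit;
    # true failures (falls, illegal contact) get no bootstrap
    return rewards + gamma * values * time_out_buf.float()
```

Second, the "smoothed square wave" for the stance/swing reward. One common construction (an assumption, not necessarily the one used here) is a sigmoid-sharpened sine of the oscillator phase:

```python
def smoothed_square_wave(phase, sharpness=10.0):
    # ~1 for half the cycle (stance), ~0 for the other half (swing),
    # with smooth transitions instead of a hard step
    return torch.sigmoid(sharpness * torch.sin(phase))
```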
@sheim self-assigned this Mar 19, 2024
 class runner:
     policy_class_name = "ActorCritic"
-    algorithm_class_name = "PPO"
+    algorithm_class_name = "PPO2"
sheim (Owner Author):

PPO2 splits up the critic update and the actor update, and is otherwise pretty much the same. It also uses the new tensorDict storage. You don't really need to look under the hood unless you want to.
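For orientation, a minimal sketch of what "splitting up the critic update and the actor update" can look like (hypothetical names; the actual PPO2 implementation lives in this repo):

```python
def update(self):
    for batch in self.storage.mini_batch_generator(self.num_mini_batches):
        # critic step: regress values onto the computed returns,
        # with its own optimizer
        value_loss = (self.critic(batch["obs"]) - batch["returns"]).pow(2).mean()
        self.critic_optimizer.zero_grad()
        value_loss.backward()
        self.critic_optimizer.step()

        # actor step: the usual PPO clipped surrogate objective,
        # stepped separately
        dist = self.actor(batch["obs"])
        log_prob = dist.log_prob(batch["actions"]).sum(dim=-1)
        ratio = torch.exp(log_prob - batch["old_log_prob"])
        adv = batch["advantages"]
        surrogate = torch.min(
            ratio * adv,
            torch.clamp(ratio, 1.0 - self.clip_param, 1.0 + self.clip_param) * adv,
        )
        actor_loss = -surrogate.mean()
        self.actor_optimizer.zero_grad()
        actor_loss.backward()
        self.actor_optimizer.step()
```

Vanilla PPO would instead sum value_loss and actor_loss into one objective and take a single optimizer step.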

     self._resample_commands(env_ids)
     # * reset buffers
-    self.dof_pos_history[env_ids] = 0.0
+    self.dof_pos_history[env_ids] = (
sheim (Owner Author):

Changed to reset to the current state rather than 0, so the first 3 time-steps don't produce bogus time-differences.
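Roughly (a sketch with assumed shapes: history stored as 3 stacked frames of dof_pos, matching the "3 time-steps" above):

```python
# before: zeroed history makes the first finite differences after a
# reset look like large jumps away from zero
self.dof_pos_history[env_ids] = 0.0

# after: seed every history slot with the current joint positions, so
# the first differences evaluate to zero
self.dof_pos_history[env_ids] = self.dof_pos[env_ids].repeat(1, 3)
```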

 )
 self.p_gains = torch.zeros(
-    self.num_actuators, dtype=torch.float, device=self.device
+    self.num_envs, self.num_actuators, dtype=torch.float, device=self.device
sheim (Owner Author):

This is in preparation for randomizing gains.
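With per-env gains, the randomization itself becomes a one-liner later on; a hypothetical sketch (the actual ranges are not part of this PR):

```python
# draw a per-env scaling factor around the nominal gains, e.g. +/-10%
scale = torch.empty_like(self.p_gains).uniform_(0.9, 1.1)
self.p_gains *= scale
```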

-    self.num_actuators,
-    dtype=torch.float,
-    device=self.device,
+    self.num_envs, self.num_actuators, dtype=torch.float, device=self.device
sheim (Owner Author):

ignore

self.num_envs, 1, dtype=torch.float, device=self.device
)

# # * get the body_name to body_index dict
sheim (Owner Author):

wtf is this? Pretty sure this was already there; not sure why it's popping up in the diff.


def _init_buffers(self):
    super()._init_buffers()
    self.oscillators = torch.zeros(self.num_envs, 2, device=self.device)
sheim (Owner Author):

I have an oscillator for each leg. They are initialized with a pi phase shift, so the result is the same, but we don't have to shift manually.
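That is, something like (a sketch; the pi offset is the point):

```python
import math

# one phase per leg; the second leg starts half a cycle ahead, so
# swing/stance alternate without any manual shifting downstream
self.oscillators[:, 0] = 0.0
self.oscillators[:, 1] = math.pi
```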

Contributor:

So there's a sin and cos for each leg?

sheim (Owner Author):

For the observations, yes.
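So each oscillator contributes two observation entries (a sketch, assuming the phases are kept in radians):

```python
# sin/cos encoding keeps the observation continuous across the 2*pi wrap
osc_obs = torch.cat(
    (torch.sin(self.oscillators), torch.cos(self.oscillators)), dim=-1
)  # shape: (num_envs, 4)
```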

    return rew_vel + rew_pos - rew_base_vel * self._switch("stand")

def _reward_stance(self):
    # phase = torch.maximum(
sheim (Owner Author):

will zap later

tracking_sigma = 0.5

# a smooth switch based on |cmd| (commanded velocity).
switch_scale = 0.5
sheim (Owner Author):

See the switch function. I used this in Jenny's project to smoothly turn on/off rewards that we only want active when standing still or when moving. switch_scale controls how quickly it moves from 0 to 1 (plot the function to see), and switch_threshold is where it starts to transition, based on |command|.
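A sketch of a switch with those semantics (the exact functional form in the repo may differ, and the command indexing is an assumption): it is 0 at or below switch_threshold and ramps smoothly toward 1 at a rate set by switch_scale.

```python
def _switch_sketch(self, switch_threshold=0.1, switch_scale=0.5):
    # norm of the commanded planar velocity, one value per env
    cmd_norm = torch.norm(self.commands[:, :2], dim=-1)
    # 0 while standing still, smooth ramp toward 1 as |cmd| grows;
    # a smaller switch_scale gives a sharper transition
    x = (cmd_norm - switch_threshold).clamp(min=0.0)
    return 1.0 - torch.exp(-x / switch_scale)
```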

 critic_hidden_dims = [256, 256, 128]
 # * can be elu, relu, selu, crelu, lrelu, tanh, sigmoid
-activation = "elu"
+activation = "tanh"
sheim (Owner Author):

Did a sweep the other day; tanh had high rewards 8 times, compared to 4 for elu... very hand-wavy.

Contributor:

Spicy

@elijahstangerjones (Contributor) left a comment:

Looks good, thanks for porting this, Steve.

base_lin_vel = 1.5
commands = 1
base_height = BASE_HEIGHT_REF
dof_pos = [
Contributor:

Is this useful if we have the autonormalization? My current policy just depends on the autonorm, and it's kinda nice not to have to worry about the scaling when sim2sim-ing.

sheim (Owner Author):

Try it. My "feeling" from Jenny's project is: absolutely yes. But it might also be that it's very important for the action scale (which was shared with this), but not for observations. That would make sense to me.

"""
Returns the normalized version of the input.
"""
def forward(self, input):
Contributor:

Have we tested how small these variances can get? I had some issues with sim2sim with normalizations with very small variances.

For safety in sim2sim I clamped the variance to be no smaller than 0.1 (as per usual I was debugging 90 things at once, so maybe this wasn't that important, but it still seems like a good idea for safety).
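For reference, the clamp would sit in the normalizer's forward pass, something like this (a sketch; the buffer names and the 0.1 floor are taken from the comment above, not from this PR's code):

```python
def forward(self, input):
    # floor the running variance so channels that barely move in sim
    # don't get amplified into huge values in another sim or on hardware
    var = torch.clamp(self.running_var, min=0.1)
    return (input - self.running_mean) / torch.sqrt(var + 1e-8)
```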

sheim (Owner Author):

I have not. I'll put it on the stack.



class MITHumanoidRunnerCfg(LeggedRobotRunnerCfg):
    seed = -1
    runner_class_name = "OnPolicyRunner"
Contributor:

Is this the right runner now? What's my_runner?

sheim (Owner Author):

On this branch, no difference; on a different one I added the PBRS into my runner, but left the base on "vanilla". Basically, "my_runner" is where you play with whatever you want.

class env(LeggedRobotCfg.env):
    num_envs = 4096
    num_observations = 49 + 3 * 18  # 121
    num_actuators = 18
Contributor:

I think right now the learnt urdf only has 10 joints, so this should crash? But we should start using the arms now, so we should turn those joints back on in the urdf.

sheim (Owner Author):

huh.

sheim (Owner Author):

Arms are enabled again.

sheim added 4 commits March 23, 2024 12:29
diff --git a/gym/envs/mit_humanoid/mit_humanoid_config.py b/gym/envs/mit_humanoid/mit_humanoid_config.py
index 54813f7..368f2c3 100644
--- a/gym/envs/mit_humanoid/mit_humanoid_config.py
+++ b/gym/envs/mit_humanoid/mit_humanoid_config.py
@@ -142,7 +142,7 @@ class MITHumanoidCfg(LeggedRobotCfg):
     class asset(LeggedRobotCfg.asset):
         file = (
             "{LEGGED_GYM_ROOT_DIR}/resources/robots/"
-            + "mit_humanoid/urdf/humanoid_R_sf.urdf"
+            + "mit_humanoid/urdf/humanoid_F_sf_learnt.urdf"
         )
         # foot_collisionbox_names = ["foot"]
         foot_name = "foot"
@@ -289,12 +289,12 @@ class MITHumanoidRunnerCfg(LeggedRobotRunnerCfg):
         use_clipped_value_loss = True
         clip_param = 0.2
         entropy_coef = 0.01
-        num_learning_epochs = 5
+        num_learning_epochs = 4
         # * mini batch size = num_envs*nsteps / nminibatches
         num_mini_batches = 4
-        learning_rate = 5.0e-5
+        learning_rate = 1.0e-6
         schedule = "adaptive"  # could be adaptive, fixed
-        gamma = 0.999
+        gamma = 0.99
         lam = 0.95
         desired_kl = 0.01
         max_grad_norm = 1.0
diff --git a/resources/robots/mit_humanoid/urdf/humanoid_F_sf_learnt.urdf b/resources/robots/mit_humanoid/urdf/humanoid_F_sf_learnt.urdf
index 5e76f34..98a0773 100644
--- a/resources/robots/mit_humanoid/urdf/humanoid_F_sf_learnt.urdf
+++ b/resources/robots/mit_humanoid/urdf/humanoid_F_sf_learnt.urdf
@@ -570,7 +570,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="15_left_shoulder_pitch"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0.01346 0.17608 0.24657"
       rpy="0 0 0" />
@@ -615,7 +615,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="16_left_shoulder_abad"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 .05760 0"
       rpy="0.0 0 0" />
@@ -668,7 +668,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="17_left_shoulder_yaw"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 0 -.10250"
       rpy="0.0 0 0" />
@@ -719,7 +719,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="18_left_elbow"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 0 -.15750"
       rpy="0 0 0.0" />
@@ -798,7 +798,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="11_right_shoulder_pitch"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0.01346 -0.17608 0.24657"
       rpy="0 0 0" />
@@ -843,7 +843,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="12_right_shoulder_abad"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 -.05760 0"
       rpy="0.0 0 0" />
@@ -896,7 +896,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="13_right_shoulder_yaw"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 0 -.10250"
       rpy="0.0 0 0" />
@@ -947,7 +947,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="14_right_elbow"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 0 -.15750"
       rpy="0 0 0.0" />
sheim added 30 commits August 6, 2024 12:05
put jacobian into MIT_humanoid base environment, rather than import; importing breaks when running --original_config
only for lander: needs to overload _check_terminations_and_timeouts to not actually reset
so that reset can be handled only by runner (without breaking backwards compatibility)

Also keeps a list of rewards, for finer reward integration, and can also do traj stats
Adjust some reward weights of action_rate (mini-cheetah) to work decently.
…ffer. Doesn't seem to make a difference in speed.
…n the runner.

Seems to make a tiny difference in speed.
More importantly, it removes the `eval` hack, which means we can profile individual reward functions.
THIS BREAKS PBRS