Humanoid #11
base: main
Conversation
Tested, 100% the same as PPO.
Same up to 1e-5 on individual values. Because of the differencing, this can occasionally end up as much larger differences (1e0) after normalization.
Left without bootstrapping on the last step; it seems to make no difference. Bootstrapping on time-outs made a big difference, though.
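For reference, a minimal sketch of what bootstrapping on time-outs means here (tensor names like `time_outs` are assumptions, not necessarily what the storage uses): when an episode ends on the time limit rather than a failure, gamma * V(s) is folded back into the reward so the return isn't artificially truncated.

```python
import torch

def bootstrap_timeouts(rewards, values, time_outs, gamma=0.99):
    """Sketch: add the critic's value back into the reward on time-out steps.

    rewards, values, time_outs: (num_envs,) tensors for one sim step;
    time_outs is 1 where the episode ended by hitting the time limit
    rather than by failing, so the return should not be cut off there.
    """
    return rewards + gamma * values * time_outs.float()
```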
Tensor dict
…d history length. Implemented for dof_pos_target, dof_pos, dof_vel.
in preparation for domain randomization of gains
…f_pos_history` (used for smoothness reward). Some tuning. Change stance/swing reward to use the smoothed square wave.
    class runner:
        policy_class_name = "ActorCritic"
-       algorithm_class_name = "PPO"
+       algorithm_class_name = "PPO2"
PPO2 splits up the critic update and the actor update, and is otherwise pretty much the same. It also uses the new tensorDict storage. You don't really need to look under the hood unless you want to.
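Roughly what the split looks like, as a sketch with assumed names (the real PPO2 class and its tensorDict storage will differ in detail): the critic is regressed on stored returns in its own pass, and the actor gets the standard clipped surrogate in a separate pass.

```python
import torch

def ppo2_update(actor, critic, actor_opt, critic_opt, batches, clip_param=0.2):
    """Sketch of a split PPO update; `batches` yields dicts of tensors."""
    # Critic pass: plain value regression on the stored returns.
    for batch in batches:
        value = critic(batch["critic_obs"])
        critic_loss = (batch["returns"] - value).pow(2).mean()
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

    # Actor pass: clipped PPO surrogate on the stored advantages.
    # (actor.log_prob is an assumed interface for this sketch.)
    for batch in batches:
        log_prob = actor.log_prob(batch["obs"], batch["actions"])
        ratio = torch.exp(log_prob - batch["old_log_prob"])
        adv = batch["advantages"]
        surrogate = torch.min(
            ratio * adv,
            torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * adv,
        )
        actor_opt.zero_grad()
        (-surrogate.mean()).backward()
        actor_opt.step()
```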
gym/envs/base/legged_robot.py
        self._resample_commands(env_ids)
        # * reset buffers
-       self.dof_pos_history[env_ids] = 0.0
+       self.dof_pos_history[env_ids] = (
Changed to reset to the current state rather than 0, so the first 3 time-steps don't produce bogus time-differences.
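Something like this sketch (the buffer shape is an assumption): on reset, the history is filled with the current joint positions, so the first finite differences come out as zero instead of a jump from the zero-initialized buffer.

```python
import torch

def reset_dof_pos_history(dof_pos_history, dof_pos, env_ids):
    """Sketch: fill the history of the reset envs with their current pose.

    Assumed shapes: dof_pos_history (num_envs, history_len, num_dof),
    dof_pos (num_envs, num_dof); broadcasting copies the current pose
    into every history slot of the selected envs.
    """
    dof_pos_history[env_ids] = dof_pos[env_ids].unsqueeze(1)
```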
        )
        self.p_gains = torch.zeros(
-           self.num_actuators, dtype=torch.float, device=self.device
+           self.num_envs, self.num_actuators, dtype=torch.float, device=self.device
this is in prep for randomizing gains
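With per-env gain tensors, the randomization this is preparing for could look something like the sketch below (the ranges and helper name are assumptions):

```python
import torch

def randomize_gains(p_gains, d_gains, nominal_p, nominal_d, env_ids, rng=(0.8, 1.2)):
    """Sketch: resample PD gains for the envs being reset.

    p_gains, d_gains: (num_envs, num_actuators) per-env gain tensors;
    nominal_p, nominal_d: (num_actuators,) nominal gains from the config.
    Each reset env gets the nominal gains scaled by a uniform factor.
    """
    shape = (len(env_ids), nominal_p.shape[-1])
    p_gains[env_ids] = nominal_p * torch.empty(shape, device=p_gains.device).uniform_(*rng)
    d_gains[env_ids] = nominal_d * torch.empty(shape, device=d_gains.device).uniform_(*rng)
```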
-           self.num_actuators,
-           dtype=torch.float,
-           device=self.device,
+           self.num_envs, self.num_actuators, dtype=torch.float, device=self.device
ignore
            self.num_envs, 1, dtype=torch.float, device=self.device
        )

        # # * get the body_name to body_index dict
wtf is this? Pretty sure this was already there, not sure why it is popping up in the diff
    def _init_buffers(self):
        super()._init_buffers()
        self.oscillators = torch.zeros(self.num_envs, 2, device=self.device)
I have an oscillator for each leg. They are initialized with a pi phase-shift, so it's the same as before, and we don't have to manually shift.
So there's a sin and cos for each leg?
for the observations, yes
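As a sketch of that (the phase update rate and exact observation layout are assumptions): two phases start pi apart, advance at the gait frequency, and the observation gets the sin and cos of each.

```python
import torch

num_envs, device = 4096, "cpu"
dt = 0.02
omega = 2.0 * torch.pi * 2.0  # assumed gait frequency of 2 Hz

# One phase per leg, initialized pi apart so the legs alternate.
oscillators = torch.zeros(num_envs, 2, device=device)
oscillators[:, 1] = torch.pi

def step_oscillators(phases, dt, omega):
    # Constant-rate phase advance, wrapped to [0, 2*pi).
    return torch.remainder(phases + omega * dt, 2.0 * torch.pi)

def oscillator_obs(phases):
    # sin and cos of each leg's phase -> 4 observation entries per env.
    return torch.cat([torch.sin(phases), torch.cos(phases)], dim=-1)

oscillators = step_oscillators(oscillators, dt, omega)
obs_slice = oscillator_obs(oscillators)  # shape (num_envs, 4)
```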
        return rew_vel + rew_pos - rew_base_vel * self._switch("stand")

    def _reward_stance(self):
        # phase = torch.maximum(
will zap later
        tracking_sigma = 0.5

        # a smooth switch based on |cmd| (commanded velocity).
        switch_scale = 0.5
See the switch function. Used this in Jenny's project to smoothly turn on/off rewards that we only want active when standing still or when moving. switch_scale controls how quickly it moves from 0 to 1 (plot the function to see), and switch_threshold is where the transition starts, based on |command|.
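A sketch of what such a switch might look like (the exact functional form is an assumption; only the roles of switch_scale and switch_threshold follow the description above):

```python
import torch

def smooth_switch(commands, switch_scale=0.5, switch_threshold=0.1):
    """Sketch: smooth 0-to-1 switch driven by the commanded velocity norm.

    Returns ~0 when |command| is well below switch_threshold (standing)
    and ~1 once it is well above; switch_scale sets how fast the
    transition happens. A "stand" switch would be 1 minus this.
    """
    cmd_norm = torch.norm(commands, dim=-1)
    return torch.sigmoid((cmd_norm - switch_threshold) / switch_scale)
```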
        critic_hidden_dims = [256, 256, 128]
        # * can be elu, relu, selu, crelu, lrelu, tanh, sigmoid
-       activation = "elu"
+       activation = "tanh"
Did a sweep the other day; tanh had high rewards 8 times, compared to 4 for elu... very hand-wavy.
Spicy
elijahstangerjones left a comment
Looks good, thanks for porting this, Steve.
        base_lin_vel = 1.5
        commands = 1
        base_height = BASE_HEIGHT_REF
        dof_pos = [
Is this useful if we have the autonormalization? My current policy just depends on the autonorm, and it's kinda nice not to have to worry about the scaling when sim2sim-ing.
Try it. My "feeling" from Jenny's project is: absolutely yes. But it might also be that it's very important for the action scale (which was shared with this), but not for observations. That would make sense to me.
| """ | ||
| Returns the normalized version of the input. | ||
| """ | ||
| def forward(self, input): |
Have we tested how small these variances can get? I had some issues with sim2sim normalization with very small variances.
For safety in sim2sim I had clamped the variance to be no smaller than 0.1 (as per usual I was debugging 90 things at once, so maybe this wasn't that important, but it still seems like a good idea for safety).
I have not. I'll put it on the stack
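For the record, the safeguard being suggested would look roughly like this (a sketch; the running-statistics update and the 0.1 floor are assumptions taken from the comment above, not the repo's actual normalizer):

```python
import torch
import torch.nn as nn

class ClampedRunningNorm(nn.Module):
    """Toy running normalizer whose variance is floored before use."""

    def __init__(self, num_features, min_var=0.1):
        super().__init__()
        self.register_buffer("mean", torch.zeros(num_features))
        self.register_buffer("var", torch.ones(num_features))
        self.register_buffer("count", torch.tensor(1e-5))
        self.min_var = min_var

    @torch.no_grad()
    def update(self, x):
        # Parallel (Chan) update of the running mean and variance.
        batch_mean, batch_var = x.mean(dim=0), x.var(dim=0, unbiased=False)
        batch_count = x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean += delta * batch_count / total
        m2 = self.var * self.count + batch_var * batch_count
        self.var = (m2 + delta.pow(2) * self.count * batch_count / total) / total
        self.count = total

    def forward(self, x):
        # Clamp the variance so sim2sim never divides by a near-zero std.
        return (x - self.mean) / torch.sqrt(self.var.clamp(min=self.min_var))
```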
    class MITHumanoidRunnerCfg(LeggedRobotRunnerCfg):
        seed = -1
        runner_class_name = "OnPolicyRunner"
Is this the right runner now? What's my_runner?
On this branch, no difference; on a different one I added the PBRS into my runner, but left the base on "vanilla". Basically "my runner" is where you play with whatever you want.
    class env(LeggedRobotCfg.env):
        num_envs = 4096
        num_observations = 49 + 3 * 18  # 121
        num_actuators = 18
I think right now the learnt urdf only has 10 joints, so this should crash, I think? But I think we should start using the arms now, so we should turn those joints back on in the urdf.
huh.
arms are enabled again.
diff --git a/gym/envs/mit_humanoid/mit_humanoid_config.py b/gym/envs/mit_humanoid/mit_humanoid_config.py
index 54813f7..368f2c3 100644
--- a/gym/envs/mit_humanoid/mit_humanoid_config.py
+++ b/gym/envs/mit_humanoid/mit_humanoid_config.py
@@ -142,7 +142,7 @@ class MITHumanoidCfg(LeggedRobotCfg):
     class asset(LeggedRobotCfg.asset):
         file = (
             "{LEGGED_GYM_ROOT_DIR}/resources/robots/"
-            + "mit_humanoid/urdf/humanoid_R_sf.urdf"
+            + "mit_humanoid/urdf/humanoid_F_sf_learnt.urdf"
         )
         # foot_collisionbox_names = ["foot"]
         foot_name = "foot"
@@ -289,12 +289,12 @@ class MITHumanoidRunnerCfg(LeggedRobotRunnerCfg):
         use_clipped_value_loss = True
         clip_param = 0.2
         entropy_coef = 0.01
-        num_learning_epochs = 5
+        num_learning_epochs = 4
         # * mini batch size = num_envs*nsteps / nminibatches
         num_mini_batches = 4
-        learning_rate = 5.0e-5
+        learning_rate = 1.0e-6
         schedule = "adaptive"  # could be adaptive, fixed
-        gamma = 0.999
+        gamma = 0.99
         lam = 0.95
         desired_kl = 0.01
         max_grad_norm = 1.0
diff --git a/resources/robots/mit_humanoid/urdf/humanoid_F_sf_learnt.urdf b/resources/robots/mit_humanoid/urdf/humanoid_F_sf_learnt.urdf
index 5e76f34..98a0773 100644
--- a/resources/robots/mit_humanoid/urdf/humanoid_F_sf_learnt.urdf
+++ b/resources/robots/mit_humanoid/urdf/humanoid_F_sf_learnt.urdf
@@ -570,7 +570,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="15_left_shoulder_pitch"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0.01346 0.17608 0.24657"
       rpy="0 0 0" />
@@ -615,7 +615,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="16_left_shoulder_abad"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 .05760 0"
       rpy="0.0 0 0" />
@@ -668,7 +668,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="17_left_shoulder_yaw"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 0 -.10250"
       rpy="0.0 0 0" />
@@ -719,7 +719,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="18_left_elbow"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 0 -.15750"
       rpy="0 0 0.0" />
@@ -798,7 +798,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="11_right_shoulder_pitch"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0.01346 -0.17608 0.24657"
       rpy="0 0 0" />
@@ -843,7 +843,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="12_right_shoulder_abad"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 -.05760 0"
       rpy="0.0 0 0" />
@@ -896,7 +896,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="13_right_shoulder_yaw"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 0 -.10250"
       rpy="0.0 0 0" />
@@ -947,7 +947,7 @@ Simple Foot: foot approximated as single box-contact -->
   </link>
   <joint
     name="14_right_elbow"
-    type="fixed">
+    type="revolute">
     <origin
       xyz="0 0 -.15750"
       rpy="0 0 0.0" />
…erically unstable for humanoid
put jacobian into MIT_humanoid base environment, rather than import; importing breaks when running --original_config
only for lander: needs to overload _check_terminations_and_timeouts to not actually reset, so that resets can be handled only by the runner (without breaking backwards compatibility). Also keeps a list of rewards, for finer reward integration, and can also do traj stats.
Adjust some reward weights of action_rate (mini-cheetah) to work decently.
…ffer. Doesn't seem to make a difference in speed.
…n the runner. Seems to make a tiny difference in speed. More importantly, it removes the `eval` hack, which means we can profile individual reward functions. THIS BREAKS PBRS
Run
See file mit_humanoid.py.

Checklist before requesting a review
- Run `ruff format .` manually