110 commits
e19ec21
add new PPO2, for now just split up actor and critic.
sheim Feb 16, 2024
1a7977e
PPO2: split optimizers for critic and actor
sheim Feb 16, 2024
3253a65
cleaning
sheim Feb 20, 2024
e3c2589
Basic internals of dict_storage in place, fills correctly
sheim Feb 20, 2024
7c1166b
WIP: change storage to be (n_steps, n_envs, var_dim), among other things
sheim Feb 20, 2024
fa05a31
implemented compute advantages (returns), validated against existing.
sheim Feb 21, 2024
5c5d38b
critic update runs, but is very slow. Not clear why
sheim Feb 21, 2024
afa0be2
Learning critic with dict storage now working.
sheim Feb 22, 2024
12661f0
add tensordict to reqs
sheim Feb 25, 2024
679a729
WIP: works now, but not with normalization turned on
sheim Feb 26, 2024
6dc86fd
fix normalization, make it work for (n, m, var_dim) shaped data
sheim Feb 26, 2024
9fc831e
some cleanup and fixes on normalization
sheim Feb 26, 2024
9524c81
Merge branch 'dev' into tensorDict
sheim Feb 26, 2024
8129e1d
Merge branch 'dev' into tensorDict
sheim Mar 6, 2024
6c5aab4
Merge pull request #5 from mit-biomimetics/tensorDict
sheim Mar 13, 2024
6b9a2c9
add coupling to humanoid torques
sheim Mar 16, 2024
e52208f
add exp avg filtering
sheim Mar 16, 2024
54b826d
add oscillators and phase-based rewards, some tuning
sheim Mar 16, 2024
c9411cd
some tuning
sheim Mar 18, 2024
90ff50c
implement history with (approximated) arbitrary sampling frequency an…
sheim Mar 18, 2024
5020fd0
make pd gains specific for each robot
sheim Mar 19, 2024
82968b1
refactor sampled history to be more explicit and not overlap with `do…
sheim Mar 19, 2024
3ae04a0
properly roll histories, retune action_rate for mc
sheim Mar 23, 2024
ee05de4
enable arms in learnt urdf
sheim Mar 24, 2024
0aab100
hotfix old runner, and set as default for humanoid. New runner is num…
sheim Mar 24, 2024
de64f6c
WIP: more clean post actor/critic, exporting
sheim Mar 25, 2024
8dc7b40
Updating all envs and configs to have split actor and critic
sheim Mar 25, 2024
53e92bc
put critic loss_fn into critic module, remove value clipping
sheim Mar 25, 2024
06e62e3
several missing small fixes
sheim Mar 25, 2024
ac6cab1
initial commit for off-policy (but it's still PPO2)
sheim Mar 25, 2024
df6a655
pendulum baseline
sheim Mar 25, 2024
0a20324
quick fix on storage config
sheim Mar 26, 2024
6fd4ba9
refactor GAE computation to be explicit
sheim Mar 26, 2024
890d562
set up off-policy, and SAC value function update.
sheim Mar 26, 2024
200a147
store next_obs explicitly, use for last_values
sheim Mar 26, 2024
96d5abb
refactor critic loss function to consume obs and self-evaluate
sheim Mar 26, 2024
bf05612
WIP: should be SAC now, with a lot of rough edges, and missing the ta…
sheim Mar 27, 2024
692502c
zap useless code
sheim Mar 27, 2024
5c7d898
WIP: SAC implementation (based off rsl_rl) runs
sheim Mar 28, 2024
c542214
update fixed_robot to match legged_robot refactor
sheim Mar 28, 2024
a5ca4fd
WIP: separate config for pendulum SAC, comments on missing components
sheim Mar 28, 2024
d86162e
WIP: tuning, critic loss higher than expected
sheim Mar 29, 2024
cf5cecd
WIP: partly fill replay buffer before training, bugfix on generator, …
sheim Apr 1, 2024
82e2179
switch back to elu, remove grad clipping
sheim Apr 1, 2024
7da0fab
match URDF, obs, ctrl freq, etc. to gymnasium
sheim Apr 4, 2024
1feafde
uniform sampling during initial fill, proper inference actions for play
sheim Apr 4, 2024
1969e0d
WIP: fix polyak bug, some more tuning
sheim Apr 5, 2024
b69f8c0
normalize advantages only after computing returns
sheim Apr 8, 2024
7db3f14
WIP: some tuning and fixes
sheim Apr 11, 2024
fe0a489
WIP: some more reasonable tuning for single env
sheim Apr 11, 2024
3808741
refactor chimera actor hidden layers
sheim Apr 12, 2024
5636bfc
quickfix of init_fill printout
sheim Apr 12, 2024
7448aa6
update setup info
sheim Apr 18, 2024
cb51752
fix replay buffer
sheim Apr 18, 2024
eda4827
update configs to be closer to SB3
sheim Apr 18, 2024
67f75cc
derp, min instead of max -_- derp
sheim Apr 18, 2024
4eb0966
split environment config for SAC pendulum
sheim Apr 18, 2024
5ee7674
WIP: random tweaks and tuning before switching to other stuff
sheim Apr 22, 2024
e10c836
burn in normalization before learning
sheim Apr 26, 2024
acedd97
fix configs wrt normalization, and fixed_base
sheim Apr 26, 2024
4f87735
Merge pull request #17 from mit-biomimetics/runner_burnin
sheim Apr 29, 2024
a79fae1
ppo2 works again on pendulum (lqr/testing config)
lukasmolnar May 31, 2024
3f81326
revert some pendulum changes, SAC runs now but doesn't learn
lukasmolnar May 31, 2024
9278807
sac pendulum converges somewhat (bugfixes + new rewards)
lukasmolnar Jun 5, 2024
a6d4adf
increase LR (all pendulums balance now)
lukasmolnar Jun 5, 2024
02611d0
increase LR and episode len
lukasmolnar Jun 6, 2024
dd9704e
added SAC mini cheetah and refactorings
lukasmolnar Jun 7, 2024
65eb253
PPO and SAC work on same rewards and config
lukasmolnar Jun 12, 2024
2ccfd8f
refactorings
lukasmolnar Jun 12, 2024
433b2f5
Merge branch 'dev' into SAC
sheim Jun 14, 2024
49ddec2
fixes
lukasmolnar Jun 14, 2024
f61f6b5
Merge remote-tracking branch 'origin/SAC' into lm/SAC_dev
lukasmolnar Jun 14, 2024
1721d06
Merge pull request #21 from mit-biomimetics/lm/SAC_dev
sheim Jun 14, 2024
34e0892
fix shape of pendulum reward
sheim Jul 9, 2024
948f2af
merge in smooth exploration, working, failed unit test
sheim Jul 10, 2024
ce7ab90
fix export vector size
sheim Jul 10, 2024
c4069ba
switch back export vector size for Robot-Software compatibility
sheim Jul 10, 2024
9feaafb
encapsulate smooth exploration into SmoothActor
sheim Jul 10, 2024
273538e
remove debug for actor, and get_std logging
lukasmolnar Jul 10, 2024
710ee9b
default to white noise exploration
sheim Jul 11, 2024
816a18b
Merge branch 'dev' into humanoid
sheim Aug 6, 2024
098989a
fix hips_forward dimensionality
sheim Aug 6, 2024
0611bcd
lander env
sheim Aug 6, 2024
fd30865
unify humanoid and lander
sheim Aug 7, 2024
6248559
everything integrated and works
sheim Aug 27, 2024
1540cd2
fix faster training configs for benchmarking
sheim Aug 28, 2024
842b39e
just tweaks
sheim Aug 29, 2024
33efaf5
SAC: typo and params that work for pendulum
sheim Aug 29, 2024
fd06555
update ruff settings
sheim Aug 30, 2024
3550dfd
address deprecation in torch.load and add lr to trackables for SAC
sheim Aug 30, 2024
4247ea7
allow randomize_episode_counters for reset_to_uniform in pendulum
sheim Aug 30, 2024
66c9e1b
some slimming and separation of utils
sheim Sep 3, 2024
dbd48c9
add layer norm to create_MLP, +kwargs flexibility
sheim Sep 3, 2024
7d97081
update for layernorm
sheim Sep 3, 2024
032ac8e
add layer norm to create_MLP, +kwargs flexibility
sheim Sep 3, 2024
f084aa1
Merge pull request #29 from mit-biomimetics/layer_norm
sheim Sep 3, 2024
be2e990
wip
sheim Sep 4, 2024
3cd76b3
Merge branch 'dev' into humanoid
sheim Sep 4, 2024
2799110
Refactor for action frequency handled in runner
sheim Sep 4, 2024
91395f4
update torch.load with weights_only=True for deprecation
sheim Sep 5, 2024
0979fab
working SAC on pendulum, ready for merge
sheim Sep 6, 2024
54e7a8d
Merge branch 'psdNetworks' into dev
sheim Sep 8, 2024
12e8327
Merge branch 'dev' into humanoid
sheim Sep 8, 2024
f105c0e
update nn_params
sheim Sep 9, 2024
c04b563
apply stash w/ sim-freq reward sampling.
sheim Sep 9, 2024
e395673
refactor skeleton with a super_init fix, and pre-initialize reward bu…
sheim Sep 9, 2024
ea5cdff
refactor: redo rewards computation, with a dict of reward functions i…
sheim Sep 9, 2024
7296c6c
compute switch once per decimation (speedup ~10%)
sheim Sep 9, 2024
69218dd
fixed logging bug that scaled with wrong dt
sheim Sep 10, 2024
b1bd4f2
hardcode Jacobian (halves time for _apply_coupling)
sheim Sep 11, 2024
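
Several commits above ("implemented compute advantages (returns)", "refactor GAE computation to be explicit", "normalize advantages only after computing returns") revolve around the GAE/returns recursion. For reference only, a generic sketch of that recursion is shown below, using the (n_steps, n_envs) storage layout the commits mention; it is not this repository's implementation, and the function name is a placeholder.

import torch

def compute_gae(rewards, values, dones, last_values, gamma=0.99, lam=0.95):
    # rewards, values, dones: (n_steps, n_envs); last_values: (n_envs,)
    advantages = torch.zeros_like(rewards)
    next_adv = torch.zeros_like(last_values)
    next_values = last_values
    for t in reversed(range(rewards.shape[0])):
        not_done = 1.0 - dones[t].float()
        delta = rewards[t] + gamma * next_values * not_done - values[t]
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_values = values[t]
    returns = advantages + values  # targets for the critic
    return advantages, returns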
4 changes: 4 additions & 0 deletions .gitignore
@@ -81,3 +81,7 @@ ipython_config.py
venv/
env.bak/
venv.bak/

# Smooth exploration
gym/smooth_exploration/data*
gym/smooth_exploration/figures*
22 changes: 21 additions & 1 deletion gym/envs/__init__.py
@@ -19,28 +19,40 @@
"Anymal": ".anymal_c.anymal",
"A1": ".a1.a1",
"HumanoidRunning": ".mit_humanoid.humanoid_running",
"Pendulum": ".pendulum.pendulum",
"Lander": ".mit_humanoid.lander",
}

config_dict = {
"CartpoleCfg": ".cartpole.cartpole_config",
"MiniCheetahCfg": ".mini_cheetah.mini_cheetah_config",
"MiniCheetahRefCfg": ".mini_cheetah.mini_cheetah_ref_config",
"MiniCheetahOscCfg": ".mini_cheetah.mini_cheetah_osc_config",
"MiniCheetahSACCfg": ".mini_cheetah.mini_cheetah_SAC_config",
"MITHumanoidCfg": ".mit_humanoid.mit_humanoid_config",
"A1Cfg": ".a1.a1_config",
"AnymalCFlatCfg": ".anymal_c.flat.anymal_c_flat_config",
"HumanoidRunningCfg": ".mit_humanoid.humanoid_running_config",
"PendulumCfg": ".pendulum.pendulum_config",
"PendulumSACCfg": ".pendulum.pendulum_SAC_config",
"LanderCfg": ".mit_humanoid.lander_config",
"PendulumPSDCfg": ".pendulum.pendulum_PSD_config",
}

runner_config_dict = {
"CartpoleRunnerCfg": ".cartpole.cartpole_config",
"MiniCheetahRunnerCfg": ".mini_cheetah.mini_cheetah_config",
"MiniCheetahRefRunnerCfg": ".mini_cheetah.mini_cheetah_ref_config",
"MiniCheetahOscRunnerCfg": ".mini_cheetah.mini_cheetah_osc_config",
"MiniCheetahSACRunnerCfg": ".mini_cheetah.mini_cheetah_SAC_config",
"MITHumanoidRunnerCfg": ".mit_humanoid.mit_humanoid_config",
"A1RunnerCfg": ".a1.a1_config",
"AnymalCFlatRunnerCfg": ".anymal_c.flat.anymal_c_flat_config",
"HumanoidRunningRunnerCfg": ".mit_humanoid.humanoid_running_config",
"PendulumRunnerCfg": ".pendulum.pendulum_config",
"PendulumSACRunnerCfg": ".pendulum.pendulum_SAC_config",
"LanderRunnerCfg": ".mit_humanoid.lander_config",
"PendulumPSDRunnerCfg": ".pendulum.pendulum_PSD_config",
}

task_dict = {
@@ -56,14 +68,22 @@
"MiniCheetahOscCfg",
"MiniCheetahOscRunnerCfg",
],
"sac_mini_cheetah": [
"MiniCheetahRef",
"MiniCheetahSACCfg",
"MiniCheetahSACRunnerCfg"
],
"humanoid": ["MIT_Humanoid", "MITHumanoidCfg", "MITHumanoidRunnerCfg"],
"humanoid_running": [
"HumanoidRunning",
"HumanoidRunningCfg",
"HumanoidRunningRunnerCfg",
],
"a1": ["A1", "A1Cfg", "A1RunnerCfg"],
"flat_anymal_c": ["Anymal", "AnymalCFlatCfg", "AnymalCFlatRunnerCfg"],
"pendulum": ["Pendulum", "PendulumCfg", "PendulumRunnerCfg"],
"sac_pendulum": ["Pendulum", "PendulumSACCfg", "PendulumSACRunnerCfg"],
"lander": ["Lander", "LanderCfg", "LanderRunnerCfg"],
"psd_pendulum": ["Pendulum", "PendulumPSDCfg", "PendulumPSDRunnerCfg"],
}

for class_name, class_location in class_dict.items():
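The last visible line of this file hints at how these dictionaries are consumed: each entry maps a class name to a relative module path, and task_dict groups an environment class with its env config and runner config. The resolution loop itself is cut off above; a hedged sketch of what it plausibly does follows (the resolve helper and package name are assumptions, not code from this PR).

import importlib

def resolve(class_name, class_location, package="gym.envs"):
    # class_location is a relative module path such as ".pendulum.pendulum_SAC_config"
    module = importlib.import_module(class_location, package=package)
    return getattr(module, class_name)

# e.g. resolve("PendulumSACCfg", ".pendulum.pendulum_SAC_config") yields the config class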
34 changes: 19 additions & 15 deletions gym/envs/a1/a1_config.py
@@ -137,13 +137,13 @@ class scaling(LeggedRobotCfg.scaling):

class A1RunnerCfg(LeggedRobotRunnerCfg):
seed = -1
runner_class_name = "OldPolicyRunner"

class policy(LeggedRobotRunnerCfg.policy):
actor_hidden_dims = [256, 256, 256]
critic_hidden_dims = [256, 256, 256]
class actor(LeggedRobotRunnerCfg.actor):
hidden_dims = [256, 256, 256]
# activation can be elu, relu, selu, crelu, lrelu, tanh, sigmoid
activation = "elu"
actor_obs = [
obs = [
"base_height",
"base_lin_vel",
"base_ang_vel",
@@ -154,7 +154,19 @@ class policy(LeggedRobotRunnerCfg.policy):
"commands",
]

critic_obs = [
actions = ["dof_pos_target"]

class noise:
dof_pos_obs = 0.005
dof_vel = 0.005
base_ang_vel = 0.05
projected_gravity = 0.02

class critic:
hidden_dims = [256, 256, 256]
# activation can be elu, relu, selu, crelu, lrelu, tanh, sigmoid
activation = "elu"
obs = [
"base_height",
"base_lin_vel",
"base_ang_vel",
@@ -165,16 +177,8 @@ class policy(LeggedRobotRunnerCfg.policy):
"commands",
]

actions = ["dof_pos_target"]

class noise:
dof_pos_obs = 0.005 # can be made very low
dof_vel = 0.005
base_ang_vel = 0.05
projected_gravity = 0.02

class reward(LeggedRobotRunnerCfg.policy.reward):
class weights(LeggedRobotRunnerCfg.policy.reward.weights):
class reward:
class weights:
tracking_lin_vel = 1.0
tracking_ang_vel = 1.0
lin_vel_z = 0.0
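The main structural change in this config (and in the other robot configs in this PR) is the split of the old policy block, which held actor_obs/critic_obs and actor_hidden_dims/critic_hidden_dims, into separate actor and critic blocks that each carry their own hidden_dims, activation, and obs list. A minimal sketch of how two such blocks could be turned into networks is shown below; build_mlp and the dimension numbers are illustrative placeholders, not this repository's create_MLP or real observation sizes.

import torch.nn as nn

def build_mlp(num_in, hidden_dims, num_out, activation="elu"):
    act = {"elu": nn.ELU, "relu": nn.ReLU, "tanh": nn.Tanh}[activation]
    layers, last = [], num_in
    for h in hidden_dims:
        layers += [nn.Linear(last, h), act()]
        last = h
    layers.append(nn.Linear(last, num_out))
    return nn.Sequential(*layers)

# Dimensions are made up; in practice they follow from the obs and actions lists.
actor = build_mlp(45, [256, 256, 256], 12, "elu")
critic = build_mlp(48, [256, 256, 256], 1, "elu")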
35 changes: 20 additions & 15 deletions gym/envs/anymal_c/flat/anymal_c_flat_config.py
@@ -85,6 +85,7 @@ class push_robots:
toggle = True
interval_s = 1
max_push_vel_xy = 0.5
push_box_dims = [0.2, 0.2, 0.2]

class domain_rand(LeggedRobotCfg.domain_rand):
randomize_base_mass = True
@@ -120,14 +121,14 @@ class scaling(LeggedRobotCfg.scaling):

class AnymalCFlatRunnerCfg(LeggedRobotRunnerCfg):
seed = -1
runner_class_name = "OldPolicyRunner"

class policy(LeggedRobotRunnerCfg.policy):
actor_hidden_dims = [256, 256, 256]
critic_hidden_dims = [256, 256, 256]
class actor(LeggedRobotRunnerCfg.actor):
hidden_dims = [256, 256, 256]
# can be elu, relu, selu, crelu, lrelu, tanh, sigmoid
activation = "elu"

actor_obs = [
obs = [
"base_height",
"base_lin_vel",
"base_ang_vel",
@@ -138,7 +139,19 @@ class policy(LeggedRobotRunnerCfg.policy):
"dof_pos_history",
]

critic_obs = [
actions = ["dof_pos_target"]

class noise:
dof_pos_obs = 0.005
dof_vel = 0.005
base_ang_vel = 0.05 # 0.027, 0.14, 0.37
projected_gravity = 0.02

class critic(LeggedRobotRunnerCfg.critic):
hidden_dims = [256, 256, 256]
# can be elu, relu, selu, crelu, lrelu, tanh, sigmoid
activation = "elu"
obs = [
"base_height",
"base_lin_vel",
"base_ang_vel",
@@ -149,16 +162,8 @@ class policy(LeggedRobotRunnerCfg.policy):
"dof_pos_history",
]

actions = ["dof_pos_target"]

class noise:
dof_pos_obs = 0.005 # can be made very low
dof_vel = 0.005
base_ang_vel = 0.05 # 0.027, 0.14, 0.37
projected_gravity = 0.02

class reward(LeggedRobotRunnerCfg.policy.reward):
class weights(LeggedRobotRunnerCfg.policy.reward.weights):
class reward:
class weights:
tracking_lin_vel = 3.0
tracking_ang_vel = 1.0
lin_vel_z = 0.0
7 changes: 5 additions & 2 deletions gym/envs/anymal_c/mixed_terrains/anymal_c_rough_config.py
@@ -56,12 +56,15 @@ class domain_rand(AnymalCFlatCfg.domain_rand):


class AnymalCRoughCCfgPPO(AnymalCFlatCfgPPO):
class policy(AnymalCFlatCfgPPO.policy):
actor_hidden_dims = [128, 64, 32]
class actor(AnymalCFlatCfgPPO.actor):
hidden_dims = [128, 64, 32]
critic_hidden_dims = [128, 64, 32]
# can be elu, relu, selu, crelu, lrelu, tanh, sigmoid
activation = "elu"

class critic(AnymalCFlatCfgPPO.critic):
pass

class algorithm(AnymalCFlatCfgPPO.algorithm):
entropy_coef = 0.01

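The empty class critic(AnymalCFlatCfgPPO.critic): pass above relies on nested-class config inheritance: an empty subclass keeps every field of the parent config unchanged. A standalone illustration with hypothetical classes:

class BaseCriticCfg:
    hidden_dims = [256, 256, 256]
    activation = "elu"

class critic(BaseCriticCfg):
    pass  # inherits hidden_dims and activation unchanged

assert critic.hidden_dims == [256, 256, 256]
assert critic.activation == "elu"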
19 changes: 3 additions & 16 deletions gym/envs/base/base_task.py
@@ -18,33 +18,20 @@ def __init__(self, gym, sim, cfg, sim_params, sim_device, headless):
# * env device is GPU only if sim is on GPU and use_gpu_pipeline=True,
# * otherwise returned tensors are copied to CPU by physX.
if sim_device_type == "cuda" and sim_params.use_gpu_pipeline:
self.device = self.sim_device
device = self.sim_device
else:
self.device = "cpu"
device = "cpu"

# * graphics device for rendering, -1 for no rendering
self.graphics_device_id = self.sim_device_id

self.num_envs = cfg.env.num_envs
self.num_actuators = cfg.env.num_actuators

# * optimization flags for pytorch JIT
torch._C._jit_set_profiling_mode(False)
torch._C._jit_set_profiling_executor(False)

# allocate buffers
self.to_be_reset = torch.ones(
self.num_envs, device=self.device, dtype=torch.bool
)
self.terminated = torch.ones(
self.num_envs, device=self.device, dtype=torch.bool
)
self.episode_length_buf = torch.zeros(
self.num_envs, device=self.device, dtype=torch.long
)
self.timed_out = torch.zeros(
self.num_envs, device=self.device, dtype=torch.bool
)
super().__init__(num_envs=cfg.env.num_envs, device=device)

# todo: read from config
self.enable_viewer_sync = True
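The buffer allocations removed above are replaced by a single super().__init__(num_envs=cfg.env.num_envs, device=device) call. The new parent class is not shown in this diff; a plausible sketch of what it allocates, reconstructed only from the removed lines, is:

import torch

class _BufferedBase:  # hypothetical stand-in for the actual parent class
    def __init__(self, num_envs, device):
        self.num_envs = num_envs
        self.device = device
        self.to_be_reset = torch.ones(num_envs, device=device, dtype=torch.bool)
        self.terminated = torch.ones(num_envs, device=device, dtype=torch.bool)
        self.episode_length_buf = torch.zeros(num_envs, device=device, dtype=torch.long)
        self.timed_out = torch.zeros(num_envs, device=device, dtype=torch.bool)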