GePPO #23

lukasmolnar · 2024-06-19T17:50:21Z

Issue ticket number and link

Fixes # (issue)

Describe your changes

Please include a summary of the change, including why you did this, and the desired effect.

Instructions for reviewers

Indicate anything in particular that you would like a code-reviewer to pay particular attention to.
Indicate steps to actually test code, including CLI instructions if different than usual.
Point out the desired behavior, and not just the "check that this appears" (otherwise the code reviewer will be lazy and just verify what you've already verified).

Checklist before requesting a review

This is expected to break regression tests.
I have assigned a reviewer
I have added the PR to the project, and tagged it with priority
If it is a core feature, I have added tests.
I have set up pre-commit hooks with ruff, or run ruff format . manually

learning/algorithms/geppo.py

sheim

Overall looks good. I haven't checked the GePPO implementation in detail; is there something specific you want me to look at closely?

sheim · 2024-07-02T20:15:29Z

learning/modules/actor.py

        activation="elu",
        init_noise_std=1.0,
        normalize_obs=True,
+        store_pik=False,


Use readable variable names. What is "pik"?

I think it would be easy enough to create a new actor class that inherits from the vanilla actor. What do you think?
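
A minimal sketch of the inheritance approach suggested above, so the base actor does not need an extra flag. `Actor` here is a stand-in carrying only the constructor arguments visible in the diff, and `ReferencePolicyActor`, `reference_params`, and `snapshot_reference` are hypothetical names. What `store_pik` actually stores is not shown in this thread, so keeping a frozen copy of the parameters is an assumption.

```python
import copy


class Actor:
    """Stand-in for the vanilla actor; only the arguments visible in the diff."""

    def __init__(self, activation="elu", init_noise_std=1.0, normalize_obs=True):
        self.activation = activation
        self.normalize_obs = normalize_obs
        # Placeholder for learnable parameters (a real actor would hold a network).
        self.params = {"noise_std": init_noise_std}


class ReferencePolicyActor(Actor):
    """Hypothetical subclass that keeps a frozen copy of the parameters.

    Assumption: this is the behavior the store_pik flag would switch on.
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.reference_params = copy.deepcopy(self.params)

    def snapshot_reference(self):
        # Call before each policy update so the pre-update policy stays available.
        self.reference_params = copy.deepcopy(self.params)


actor = ReferencePolicyActor(init_noise_std=0.5)
actor.params["noise_std"] = 0.4  # simulated policy update
```

With this layout the vanilla `Actor` signature stays unchanged, and only the runner that needs the reference policy has to construct the subclass.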

sheim · 2024-07-02T21:14:25Z

learning/modules/utils/normalize.py

-            mean = input.mean(tuple(range(input.dim() - 1)))
-            var = input.var(tuple(range(input.dim() - 1)))
+            # TODO: check this, it got rid of NaN values in first iteration
+            dim = tuple(range(input.dim() - 1))


simpler, use torch.nan_to_num()
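
A rough sketch of what that could look like, assuming the NaNs come from taking the unbiased variance over a single sample (for example, one environment in the first iteration). `batch_mean_var` is a hypothetical helper name, not a function from the repository.

```python
import warnings

import torch


def batch_mean_var(input):
    """Mean and variance over all but the last dimension, with NaNs replaced by 0."""
    dim = tuple(range(input.dim() - 1))
    mean = input.mean(dim)
    with warnings.catch_warnings():
        # torch warns when the unbiased variance has zero degrees of freedom.
        warnings.simplefilter("ignore")
        var = input.var(dim)
    return mean, torch.nan_to_num(var, nan=0.0)


# One sample with three features: the raw variance would be NaN for each feature.
mean, var = batch_mean_var(torch.tensor([[1.0, 2.0, 3.0]]))
```

Note that `torch.nan_to_num` also replaces infinities by default, so it hides NaNs from any other source too; whether that is acceptable here is a judgment call for the author.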

sheim · 2024-07-02T21:17:31Z

learning/utils/dict_utils.py


+# Implementation based on GePPO repo: https://github.com/jqueeney/geppo
+@torch.no_grad
+def compute_gae_vtrace(data, gamma, lam, is_trunc, actor, critic, rec=False):


rule of thumb, don't abbreviate (rec --> recursive)

sheim · 2024-07-02T21:36:34Z

learning/algorithms/geppo.py

+            offpol_ratio = torch.exp(log_prob_pik - batch["log_prob"])
+
+            advantages = batch["advantages"]
+            if self.normalize_advantages:


this is currently set to False, which surprises me, that seemed to be quite important in PPO...
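
For reference, the usual PPO-style normalization that this flag would presumably enable looks roughly like the sketch below. The function name and the `eps` default are assumptions, not taken from the PR.

```python
import torch


def normalize_advantages(advantages, eps=1e-8):
    """Shift to zero mean and scale to unit standard deviation."""
    return (advantages - advantages.mean()) / (advantages.std() + eps)


advantages = torch.tensor([1.0, 2.0, 3.0, 6.0])
normalized = normalize_advantages(advantages)
```

Whether this should be applied per mini-batch or over the full rollout is a separate choice that this sketch leaves open.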

sheim · 2024-07-02T21:37:19Z

learning/algorithms/geppo.py

+            counter += 1
+        self.mean_surrogate_loss /= counter
+
+        # Compute TV, add to self for logging


what is TV?
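
One possible reading, stated as an assumption: TV is the total variation distance between the updated policy and the current policy pi_k, which the GePPO paper uses in its trust-region bound. Under that reading, a sample estimate from data collected by an older behavior policy could be sketched as follows. `total_variation_estimate` and its argument names are hypothetical; only `log_prob_pik` and the stored `log_prob` appear in the diff.

```python
import torch


def total_variation_estimate(log_prob_new, log_prob_pik, log_prob_behavior):
    """Importance-sampled estimate of 0.5 * E|pi_new - pi_k| under the behavior policy."""
    ratio = torch.exp(log_prob_new - log_prob_behavior)
    offpol_ratio = torch.exp(log_prob_pik - log_prob_behavior)
    return 0.5 * (ratio - offpol_ratio).abs().mean()


log_prob_behavior = torch.zeros(4)               # behavior probability 1.0
log_prob_pik = torch.zeros(4)                    # pi_k probability 1.0
log_prob_new = torch.log(torch.full((4,), 0.5))  # updated policy probability 0.5
tv = total_variation_estimate(log_prob_new, log_prob_pik, log_prob_behavior)
```

If this is the intended meaning, spelling it out in the comment (for example "total variation distance") would address the question.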

lukasmolnar added 6 commits June 19, 2024 13:48

Integrated GAE with Vtrace into PPO2

617742b

new GePPO class and log advantages, returns

0ad0e48

added HybridPolicyRunner and GePPO actor update

ded68ef

bugfixes in V-trace GAE, pendulum learns now

6133f3b

adapt LR based on GePPO paper

08b2b77

GePPO for mini cheetah ref, and constant eps_geppo param

6845f0f

lukasmolnar commented Jun 24, 2024

learning/algorithms/geppo.py (resolved)

lukasmolnar and others added 4 commits June 25, 2024 11:45

added recursive GAE vtrace, and split GAE by policy

ed751e3

fine tune by loadining PPO run and training with noise multiplied

0329ae5

handle value size at source

b4e94f6

fixed "handle one env" at the source in criitic

8f0680b

sheim reviewed Jul 2, 2024

Base automatically changed from SAC to dev July 11, 2024 00:16
