Conversation

@lukasmolnar (Contributor) commented Jun 19, 2024

Issue ticket number and link

Fixes # (issue)

Describe your changes

Please include a summary of the change: what you changed, why you changed it, and the desired effect.

Instructions for reviewers

Indicate anything in particular that you would like a code reviewer to pay close attention to.
Indicate the steps needed to actually test the code, including CLI instructions if they differ from the usual ones.
Point out the desired behavior, not just "check that this appears" (otherwise the reviewer will simply verify what you have already verified).

Checklist before requesting a review

  • This is expected to break regression tests.
  • I have assigned a reviewer
  • I have added the PR to the project and tagged it with a priority
  • If it is a core feature, I have added tests.
  • I have set up pre-commit hooks with ruff, or run ruff format . manually

@sheim (Owner) left a comment


Overall this looks good. I haven't checked the GePPO implementation in detail; is there something specific you want me to look at closely?

activation="elu",
init_noise_std=1.0,
normalize_obs=True,
store_pik=False,
@sheim (Owner):

use readable variable names. What is "pik"?
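For context, a plausible reading (judging from the importance ratio computed later in the diff): "pik" presumably abbreviates π_k, the policy snapshot under which the samples were collected, whose log-probabilities are stored so an off-policy ratio can be formed during the update. A minimal numpy sketch of that idea (all names hypothetical, not the repo's API):

```python
import numpy as np

# Hypothetical illustration: keeping log pi_k(a|s) from collection time lets a
# later update form an importance ratio between two policies directly from
# their stored log-probabilities.
log_prob_pik = np.array([-1.2, -0.8, -2.0])       # log-probs under pi_k
log_prob_behavior = np.array([-1.0, -0.9, -1.5])  # log-probs stored at collection

offpol_ratio = np.exp(log_prob_pik - log_prob_behavior)
```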

@sheim (Owner):

I think it would be easy enough to create a new actor class that inherits from the vanilla actor; what do you think?

mean = input.mean(tuple(range(input.dim() - 1)))
var = input.var(tuple(range(input.dim() - 1)))
# TODO: check this, it got rid of NaN values in first iteration
dim = tuple(range(input.dim() - 1))
@sheim (Owner):

Simpler: use torch.nan_to_num().
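For illustration, a minimal sketch of the suggested call (assuming a recent PyTorch where torch.nan_to_num is available):

```python
import torch

# Replace NaNs produced by a degenerate variance estimate (e.g. var() over a
# single sample) directly, rather than reworking the reduction dimensions.
x = torch.tensor([1.0, float("nan"), 3.0])
cleaned = torch.nan_to_num(x, nan=0.0)  # NaN -> 0.0; +/-inf are also clamped by default
```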


# Implementation based on GePPO repo: https://github.com/jqueeney/geppo
@torch.no_grad
def compute_gae_vtrace(data, gamma, lam, is_trunc, actor, critic, rec=False):
@sheim (Owner):

rule of thumb, don't abbreviate (rec --> recursive)
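For reference, the plain on-policy recursive GAE that a V-trace-style variant generalizes can be sketched as follows (a hypothetical standalone helper for illustration, not the repo's compute_gae_vtrace):

```python
import numpy as np

def compute_gae_recursive(rewards, values, gamma=0.99, lam=0.95):
    """Recursive GAE; `values` carries one extra bootstrap entry at the end."""
    num_steps = len(rewards)
    advantages = np.zeros(num_steps)
    gae = 0.0
    for t in reversed(range(num_steps)):
        # One-step TD error, then the exponentially weighted recursion.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```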

offpol_ratio = torch.exp(log_prob_pik - batch["log_prob"])

advantages = batch["advantages"]
if self.normalize_advantages:
@sheim (Owner):

This is currently set to False, which surprises me; advantage normalization seemed to be quite important in PPO...
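For comparison, the standard PPO-style per-batch advantage normalization reads roughly as follows (a sketch in numpy for brevity; the epsilon value is an assumption):

```python
import numpy as np

def normalize_advantages(advantages, eps=1e-8):
    # Zero-mean, unit-std normalization per batch; eps guards against a
    # zero standard deviation.
    return (advantages - advantages.mean()) / (advantages.std() + eps)
```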

counter += 1
self.mean_surrogate_loss /= counter

# Compute TV, add to self for logging
@sheim (Owner):

what is TV?
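A plausible reading, worth a clarifying comment in the code: in the GePPO setting, policy change is often monitored via the total variation distance between the new and old policies, which can be estimated from the importance ratios already computed. A hedged sketch of one common sample-based estimator (function name hypothetical):

```python
import numpy as np

def estimate_tv_distance(ratios):
    # Sample-based estimate: TV(pi_new, pi_old) ~= 0.5 * E[|r - 1|],
    # where r = pi_new(a|s) / pi_old(a|s) over the sampled actions.
    return 0.5 * np.mean(np.abs(np.asarray(ratios) - 1.0))
```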

Base automatically changed from SAC to dev July 11, 2024 00:16