@lukasmolnar lukasmolnar commented Jul 12, 2024

Issue ticket number and link

Fixes # (issue)

Describe your changes

Semester thesis project dev branch. So far:

  • Smooth exploration (gSDE): Uses the SmoothActor class and resamples the noise matrices every n steps (default 16).
  • Interpolated Policy Gradient (IPG): On- and off-policy buffers; V, Q, and Q-target critics. The on- and off-policy losses are interpolated using inter_nu. The control variate is not implemented yet; the paper also reports good results without it, so I am postponing it.
  • Finetuning using Robot-Software logs: Collect LCM data and do a gradient-step update on this batch. Custom class MinimalistCheetah for computing rewards. NOTE: Rewards are very noisy, especially lin vel (added plotting to analyze rewards).
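
For reviewers unfamiliar with the first two items, here is a minimal standalone sketch of the ideas. It is not the code in this branch. `SmoothNoise`, `interpolated_loss`, and all parameter names except `inter_nu` are hypothetical, and it uses plain Python lists rather than tensors. The convention that `inter_nu = 0` means purely on-policy follows the IPG paper and is assumed here rather than taken from the branch.

```python
import random


class SmoothNoise:
    """State-dependent exploration noise in the spirit of gSDE (illustrative only)."""

    def __init__(self, feature_dim, action_dim, std=0.5, resample_every=16, seed=0):
        self.feature_dim = feature_dim
        self.action_dim = action_dim
        self.std = std
        self.resample_every = resample_every  # the "n steps" from the description
        self.rng = random.Random(seed)
        self.steps = 0
        self.matrix = None
        self.resample()

    def resample(self):
        # Draw a fresh exploration matrix with i.i.d. zero-mean Gaussian entries.
        self.matrix = [
            [self.rng.gauss(0.0, self.std) for _ in range(self.action_dim)]
            for _ in range(self.feature_dim)
        ]

    def __call__(self, features):
        # Resample only every `resample_every` calls. In between, the noise is
        # a deterministic linear function of the features, hence "smooth".
        if self.steps > 0 and self.steps % self.resample_every == 0:
            self.resample()
        self.steps += 1
        return [
            sum(f * row[j] for f, row in zip(features, self.matrix))
            for j in range(self.action_dim)
        ]


def interpolated_loss(on_policy_loss, off_policy_loss, inter_nu):
    """Blend the two actor losses; inter_nu=0 is assumed to be purely on-policy."""
    if not 0.0 <= inter_nu <= 1.0:
        raise ValueError("inter_nu must lie in [0, 1]")
    return (1.0 - inter_nu) * on_policy_loss + inter_nu * off_policy_loss


noise = SmoothNoise(feature_dim=3, action_dim=2)
features = [0.1, -0.2, 0.3]
first = noise(features)
second = noise(features)  # same matrix and same features, so same noise
blended = interpolated_loss(on_policy_loss=2.0, off_policy_loss=4.0, inter_nu=0.25)
```

With per-step Gaussian noise the two calls above would differ; holding the matrix fixed between resamples is what keeps the exploration signal consistent over a window of steps.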

Instructions for reviewers

Indicate anything in particular that you would like a code-reviewer to pay particular attention to.
Indicate steps to actually test code, including CLI instructions if different than usual.
Point out the desired behavior, and not just the "check that this appears" (otherwise the code reviewer will be lazy and just verify what you've already verified).

Checklist before requesting a review

  • This is expected to break regression tests.
  • I have assigned a reviewer
  • I have added the PR to the project, and tagged it with priority
  • If it is a core feature, I have added tests.
  • I have set up pre-commit hooks with ruff, or run ruff format . manually

lukasmolnar and others added 30 commits February 28, 2024 13:30
run 200 iterations before starting training, to burn in normalization
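
As a rough illustration of what this commit message describes, the sketch below updates running normalization statistics for a fixed number of iterations before any training step happens. It is an assumption about the general technique, not the code in the commit: `RunningNormalizer`, `burn_in`, and `sample_observation` are hypothetical names, and scalar observations are used for brevity.

```python
import math


class RunningNormalizer:
    """Tracks a running mean and variance of scalar observations (Welford's method)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        # Population variance; falls back to 1.0 before any data is seen.
        var = self.m2 / self.count if self.count > 0 else 1.0
        return (x - self.mean) / math.sqrt(var + self.eps)


def burn_in(normalizer, sample_observation, iterations=200):
    """Update statistics only; no gradient steps are taken here."""
    for _ in range(iterations):
        normalizer.update(sample_observation())
```

Without a burn-in phase, the first gradient steps would see inputs scaled by statistics estimated from only a handful of samples.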
@lukasmolnar

Note: This branch was checked out from lm/smooth-exploration, but some functionality, such as plotting and Fourier analysis, was removed so that only the essentials are included here.
