Thanks for putting this together! Before merging, I'd like to run this through a few tests to make sure everything behaves as expected. One thought for the longer term: it might be slightly cleaner to expose `adamw` as a first-class optimizer option rather than modifying Adam via a flag. In the meantime, if anyone wants to experiment with AdamW behavior right away, you can add a local drop-in patch (for example `pyperch_adamw_patch.py`) in your working directory, then import it before constructing the Trainer: `import pyperch_adamw_patch`. I'll circle back once I've had a chance to test this more thoroughly. Thanks again for the contribution!
That's fair. I'll revisit the implementation as I have time over the next few days.
Thanks for offering this!
Add an AdamW optimizer with the same options as Adam.
I've modified the PR to expose `adamw` as a first-class optimizer option.
See https://docs.pytorch.org/docs/stable/generated/torch.optim.AdamW.html.
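For reference, the behavioral difference the linked docs describe can be sketched in a few lines: Adam folds `weight_decay` into the gradient, while AdamW applies the decay directly to the parameter before the moment-based update. The function below is an illustrative pure-Python single step on a scalar parameter, not pyperch's API:

```python
import math

def adamw_step(p, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update on a scalar parameter p (illustrative sketch)."""
    # Decoupled weight decay: shrink the parameter directly, instead of
    # adding weight_decay * p to the gradient as Adam does.
    p = p - lr * weight_decay * p
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

# First step (t=1) from p=1.0 with gradient 0.5 and zero-initialized moments.
p, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, t=1)
```

Because the decay term never enters `m` or `v`, the regularization strength stays independent of the adaptive learning-rate scaling, which is the point of exposing AdamW separately.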