Open
Labels: bug (Something isn't working)
Description
`policy_loss` in `SAC_CQL` is significantly higher than in the official implementation when tested on `hopper-expert-v0` from D4RL.
Line 261 in af3f511:
`policy_loss = ((self.alpha * log_pi) - min_qf_pi).mean()`
With the author's implementation, the policy loss drops below -350, whereas with accel it does not even reach -300, which leads to slower and less stable learning.
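To narrow down where the gap comes from, it may help to log the two parts of the loss separately in both codebases. The sketch below is only an illustration of that idea: it splits the expression from line 261 into its entropy part (`alpha * log_pi`) and its Q-value part (`-min_qf_pi`) using plain Python lists. The helper name `policy_loss_terms` and the sample numbers are hypothetical and are not taken from accel or from the official implementation.

```python
from statistics import fmean


def policy_loss_terms(alpha, log_pi, min_qf_pi):
    """Split mean(alpha * log_pi - min_qf_pi) into its two components.

    Hypothetical diagnostic helper; the inputs stand in for one batch of
    per-sample values of log_pi and min_qf_pi.
    """
    if len(log_pi) != len(min_qf_pi):
        raise ValueError("log_pi and min_qf_pi must have the same length")
    entropy_term = alpha * fmean(log_pi)  # contribution of alpha * log_pi
    q_term = -fmean(min_qf_pi)            # contribution of -min_qf_pi
    return {
        "entropy_term": entropy_term,
        "q_term": q_term,
        "policy_loss": entropy_term + q_term,
    }


# Invented toy batch, for illustration only (not measured values).
terms = policy_loss_terms(
    alpha=0.2,
    log_pi=[-1.0, -2.0, -3.0],
    min_qf_pi=[290.0, 300.0, 310.0],
)
print(terms)
```

If the two implementations agree on the entropy part but differ on the Q-value part, the discrepancy more likely originates in the critic update than in the policy loss line itself.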