High Policy Loss in SAC_CQL #19

@waffoo

policy_loss in SAC_CQL is significantly higher than in the official implementation when tested on hopper-expert-v0 from d4rl.

policy_loss = ((self.alpha * log_pi) - min_qf_pi).mean()

With the author's implementation, the loss drops below -350, whereas with accel it never even reaches -300, which leads to slower and less stable learning.
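To make the comparison concrete, here is a minimal plain-Python sketch of the expression above. It assumes `min_qf_pi` is the elementwise minimum of two critic estimates for actions sampled from the current policy, as is common in SAC implementations; the function name `sac_policy_loss` and all numbers are hypothetical and do not come from accel or the official code. It shows that the loss is roughly the negative of the mean Q-value, so a gap in policy_loss mostly reflects a gap in the critics' Q estimates.

```python
def sac_policy_loss(alpha, log_pi, qf1_pi, qf2_pi):
    """Batch mean of alpha * log_pi - min(Q1, Q2)."""
    # Assumption: min_qf_pi is the elementwise minimum of two critics.
    min_qf_pi = [min(q1, q2) for q1, q2 in zip(qf1_pi, qf2_pi)]
    terms = [alpha * lp - q for lp, q in zip(log_pi, min_qf_pi)]
    return sum(terms) / len(terms)


# Hypothetical batch of three samples (made-up values, not from any run).
log_pi = [-1.0, -2.0, -3.0]
qf1_pi = [350.0, 360.0, 340.0]
qf2_pi = [355.0, 350.0, 345.0]

loss = sac_policy_loss(alpha=0.5, log_pi=log_pi, qf1_pi=qf1_pi, qf2_pi=qf2_pi)
print(loss)
```

Because the entropy term `alpha * log_pi` is small next to Q-values of this size, logging `min_qf_pi.mean()` and `alpha` alongside policy_loss in both implementations may help narrow down where the two diverge.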

Labels: bug