improve MC returns accuracy, avoid for loop #18

@sheim

compute_MC_returns currently loops in reverse over the time steps, accumulating the return and discounting it at each step.
Comparing its output for discount gamma=1.0 against simply taking data["rewards"].sum(dim=0) shows a discrepancy:

(data["rewards"].sum(dim=0)-compute_MC_returns(data, 1.0, test_critic)[0, :]).abs().max()
out: tensor(1.9073e-06, device='cuda:0')

So the discrepancy is not very big, but it is still there.

Describe the solution you'd like
Pre-compute the discounting vector, multiply it with the rewards, then call .sum().
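
A minimal sketch of what this could look like, not the repository's actual implementation. The names discount_matrix and mc_returns_vectorized are hypothetical, rewards is assumed to have shape [T, N] (time steps by environments), and the sketch ignores whatever compute_MC_returns does with its critic argument and with episode terminations. It stacks the discounting vectors for every start step into one upper-triangular matrix, so the returns for all steps come from a single matrix product instead of a reverse loop. Accumulating in float64 is an extra assumption made here to reduce rounding error; it is not part of the original proposal.

```python
import torch


def discount_matrix(num_steps, gamma, dtype=torch.float64, device=None):
    """Upper-triangular D with D[t, k] = gamma**(k - t) for k >= t, else 0."""
    steps = torch.arange(num_steps, device=device)
    # exponents[t, k] = k - t; negative entries are clamped and zeroed by triu.
    exponents = (steps.unsqueeze(0) - steps.unsqueeze(1)).clamp(min=0).to(dtype)
    return torch.triu(gamma ** exponents)


def mc_returns_vectorized(rewards, gamma):
    """Discounted returns G_t = sum_{k >= t} gamma**(k - t) * r_k, no Python loop.

    rewards: tensor of shape [T, N]. Returns a tensor of the same shape and dtype.
    Hypothetical helper: no bootstrapping from a critic, no termination masks.
    """
    discounts = discount_matrix(rewards.shape[0], gamma, device=rewards.device)
    # Accumulate in float64 (an assumption of this sketch), then cast back.
    returns = discounts @ rewards.to(torch.float64)
    return returns.to(rewards.dtype)
```

The first row of the matrix is the discounting vector described above, so if only the return from the first step is needed, (rewards * discounts[0].unsqueeze(1)).sum(dim=0) is enough. The full matrix costs O(T^2) memory, which may or may not be acceptable depending on the rollout length.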
