Open
Labels
good first issue (Good for newcomers)
Description
compute_MC_returns currently loops over the time steps in reverse, accumulating the return and applying the discount at each step.
Comparing its output at gamma=1.0 with a plain data["rewards"].sum(dim=0) shows a small discrepancy:
(data["rewards"].sum(dim=0)-compute_MC_returns(data, 1.0, test_critic)[0, :]).abs().max()
out: tensor(1.9073e-06, device='cuda:0')
So it is not very big, but it is still there.
Describe the solution you'd like
Pre-compute the discounting vector, multiply it with the rewards, then call .sum() over the time dimension.
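
A rough sketch of what this could look like, in PyTorch. The function names `discounted_returns_loop` and `discounted_returns_vectorized` are hypothetical and are not the existing compute_MC_returns; the sketch assumes rewards are laid out with time on dim 0, and it deliberately ignores the critic argument and any episode-termination handling the real function may do. Whether it actually shrinks the float32 discrepancy would still need to be measured, since it only changes the order of the floating-point operations.

```python
import torch


def discounted_returns_loop(rewards: torch.Tensor, gamma: float) -> torch.Tensor:
    """Reference version: reverse loop over time (dim 0), as described in the issue."""
    returns = torch.zeros_like(rewards)
    running = torch.zeros_like(rewards[0])
    for t in reversed(range(rewards.shape[0])):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def discounted_returns_vectorized(rewards: torch.Tensor, gamma: float) -> torch.Tensor:
    """Pre-compute the discount factors, multiply, and sum over time.

    Assumes rewards has shape (T, ...) with time on dim 0 (an assumption,
    not taken from the actual code base).
    """
    T = rewards.shape[0]
    steps = torch.arange(T, device=rewards.device)
    # exponent[t, k] = k - t; negative entries (k < t) are clamped and masked out below.
    exponent = (steps.unsqueeze(0) - steps.unsqueeze(1)).clamp(min=0).to(rewards.dtype)
    base = torch.as_tensor(gamma, dtype=rewards.dtype, device=rewards.device)
    # Upper-triangular matrix with discounts[t, k] = gamma ** (k - t) for k >= t, else 0.
    discounts = torch.triu(torch.pow(base, exponent))
    # returns[t] = sum_k discounts[t, k] * rewards[k]
    return torch.tensordot(discounts, rewards, dims=([1], [0]))


if __name__ == "__main__":
    torch.manual_seed(0)
    rewards = torch.rand(100, 8)  # (time steps, environments), made-up data
    diff = (rewards.sum(dim=0) - discounted_returns_vectorized(rewards, 1.0)[0]).abs().max()
    print(diff)
```

The discount matrix costs O(T^2) memory, so this form only makes sense for moderate rollout lengths. If only the return from the first step is needed, a single vector `gamma ** torch.arange(T)` multiplied into the rewards and summed over dim 0 is enough.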