improve MC returns accuracy, avoid for loop

`compute_MC_returns` currently loops over the time steps in reverse, discounting and accumulating the return at each step.
Comparing its output for `gamma=1.0` with simply taking `data["rewards"].sum(dim=0)` shows a discrepancy:

```
(data["rewards"].sum(dim=0)-compute_MC_returns(data, 1.0, test_critic)[0, :]).abs().max()
out: tensor(1.9073e-06, device='cuda:0')
```

So the discrepancy is small (about 1.9e-06), but it is there.

**Describe the solution you'd like**
Pre-compute the discounting vector, multiply it with the rewards, and then call `.sum()` instead of looping.
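
A minimal sketch of what this could look like, not the repository's actual implementation. It assumes `rewards` is a 2-D tensor of shape `(T, N)` with time along `dim=0` (consistent with the `.sum(dim=0)` call above). The name `discounted_returns` is hypothetical, and the critic argument that `compute_MC_returns` takes is ignored here. To get the return at every step rather than only the first, the discount vector is expanded into an upper-triangular `(T, T)` matrix:

```python
import torch


def discounted_returns(rewards: torch.Tensor, gamma: float) -> torch.Tensor:
    """Hypothetical helper: rewards is (T, N), the result is (T, N).

    result[t] = sum over k >= t of gamma ** (k - t) * rewards[k]
    """
    T = rewards.shape[0]
    steps = torch.arange(T, device=rewards.device)
    # exponent[t, k] = k - t; negative entries (k < t) are clamped and masked below.
    exponent = (steps.unsqueeze(0) - steps.unsqueeze(1)).clamp(min=0)
    # Upper-triangular discount matrix: gamma ** (k - t) for k >= t, 0 otherwise.
    discount = torch.triu(gamma ** exponent.to(rewards.dtype))
    # One matrix product replaces the reverse loop over time steps.
    return discount @ rewards


rewards = torch.rand(100, 8)
returns = discounted_returns(rewards, gamma=0.99)
# The first row is the plain "discount vector times rewards, then sum" case.
first_step = (0.99 ** torch.arange(100).to(rewards.dtype)).unsqueeze(1) * rewards
print((returns[0] - first_step.sum(dim=0)).abs().max())
```

The discount matrix needs `O(T^2)` memory, so this only suits moderate episode lengths. If only the first-step return is needed, the single discount vector is enough. Whether the remaining float32 difference from `.sum(dim=0)` shrinks depends on the summation order used internally; accumulating in `torch.float64` is one option to check.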

improve MC returns accuracy, avoid for loop #18