Data sampling of BPTT #6
Open
Description
Hi Sebastian,
I am working on a project that implements BPTT. In your implementation, the states used for policy updates are sampled from the replay buffer. According to the RL objective $J(\theta) = \mathbb{E}_{s \sim \text{initial\_dist}}[V(s)]$, shouldn't the states be sampled from the initial state distribution instead?
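To make the question concrete, here is a minimal sketch of the two sampling options as I understand them. Everything in it is hypothetical and for illustration only: the function names, the Gaussian initial distribution, and the toy buffer are my assumptions and are not taken from your code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_initial_states(batch_size, state_dim=2):
    # Hypothetical initial-state distribution (standard normal, assumed for illustration).
    return rng.normal(size=(batch_size, state_dim))

def sample_from_replay_buffer(buffer, batch_size):
    # Uniformly sample previously visited states, with replacement.
    idx = rng.integers(0, len(buffer), size=batch_size)
    return buffer[idx]

# Toy replay buffer of visited states, deliberately shifted away from the
# initial distribution to show that the two sources can differ.
replay_buffer = rng.normal(loc=3.0, size=(1000, 2))

# Option A: start states drawn from the initial distribution,
# which is what the objective J(theta) averages over.
starts_a = sample_initial_states(32)

# Option B: start states drawn from the replay buffer,
# which is what I believe the implementation does.
starts_b = sample_from_replay_buffer(replay_buffer, 32)
```

In both cases the sampled states would then serve as starting points for the rollouts that the policy gradient is backpropagated through; the only difference is which distribution the expectation is taken over.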
Thanks for your wonderful code!
Shenao