-
Notifications
You must be signed in to change notification settings - Fork 6
Open
Description
TabularMDP constructs a transition_matrix and reward_matrix without considering is_terminal(s). This means that it is up to whoever is using those matrices to ensure that terminal states are being handled correctly. An alternative is for the matrices to be constructed such that terminal states always return a reward of 0 and self-loop. I lean towards keeping the original version, but we should be explicit about this.
see 3513789
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels