Maybe extend the scope to include simple RL algorithms (I think staying away from gradient-based methods might be a good idea for simplicity).

### Possible Algorithms:

- [ ] Value & Policy Iteration
- [ ] MC on-policy
- [ ] n-step TD

...
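To give a feel for how small a non-gradient algorithm from the list can be, here is a rough sketch of tabular value iteration. The MDP format (`P[state][action]` as a list of `(prob, next_state, reward)` tuples), the function name `value_iteration`, and the three-state chain are all assumptions made up for illustration, not an agreed interface for this project.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP format (an assumption for this sketch):
# P[state][action] = list of (probability, next_state, reward).
# A state with no actions is treated as terminal.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(outcomes: List[Tuple[float, int, float]],
            V: Dict[int, float], gamma: float) -> float:
    """Expected one-step return of an action under the current values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)


def value_iteration(P: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    """In-place value iteration; returns (state values, greedy policy)."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:  # terminal state keeps value 0
                continue
            # Bellman optimality backup: best expected return over actions.
            best = max(q_value(o, V, gamma) for o in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], V, gamma))
        for s, actions in P.items() if actions
    }
    return V, policy


# Toy 3-state chain: action 0 moves left (or stays), action 1 moves right.
# Reaching terminal state 2 from state 1 pays reward 1.
P: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {},
}

V, policy = value_iteration(P)
print(V, policy)
```

On this toy chain the greedy policy moves right in both non-terminal states. No gradients or function approximation are involved, only a table of values and repeated backups, which is roughly the level of simplicity suggested above.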