RL?

Maybe extend the scope to include simple RL algorithms. I think staying away from gradient-based methods might be a good idea, for simplicity.

### Possible Algorithms:
- [ ] Value & Policy Iteration
- [ ] On-policy Monte Carlo (MC)
- [ ] n-step TD ...
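
To give a feel for how small a non-gradient method can be, here is a minimal sketch of tabular value iteration in plain Python. The function name `value_iteration`, the nested-list layout of `P` and `R`, and the parameters `gamma`, `tol`, and `max_iters` are all assumptions made for illustration, not an existing API of this project.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration (illustrative sketch, hypothetical interface).

    P[s][a][s2] is the probability of moving from state s to s2 under action a.
    R[s][a] is the expected immediate reward for taking action a in state s.
    Returns the estimated state values and a greedy policy (one action per state).
    """
    n_states = len(P)
    n_actions = len(P[0])
    V = [0.0] * n_states

    def q_value(s, a, values):
        # One-step lookahead: immediate reward plus discounted expected next value.
        return R[s][a] + gamma * sum(
            P[s][a][s2] * values[s2] for s2 in range(n_states)
        )

    for _ in range(max_iters):
        # Bellman optimality backup applied to every state.
        V_new = [
            max(q_value(s, a, V) for a in range(n_actions))
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < tol:
            break

    # Greedy policy with respect to the final value estimates.
    policy = [
        max(range(n_actions), key=lambda a: q_value(s, a, V))
        for s in range(n_states)
    ]
    return V, policy


# Toy 2-state MDP (made up for this example):
# state 0: action 0 stays in state 0 (reward 0), action 1 moves to state 1 (reward 1)
# state 1: absorbing, reward 0 for both actions
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
R = [[0.0, 1.0], [0.0, 0.0]]

V, policy = value_iteration(P, R)
print(V, policy)
```

The sketch needs only the transition and reward tables, which is what keeps it simple; MC and n-step TD would instead need sampled episodes from an environment.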