Developed with help from Andrej Karpathy's "Deep Reinforcement Learning: Pong from Pixels" and mrhatz's implementation of a policy gradient for Pong in TensorFlow.
My first goal was to get it working, which it now does. My next goals are to refactor the code and experiment with tweaks, for example trying other optimizers.