This codebase implements a starter agent that can solve a number of `universe` environments. Original repository: https://github.com/openai/universe-starter-agent
It contains a basic implementation of the A3C algorithm, adapted for real-time environments.
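At the heart of A3C, each worker rolls out its policy, computes discounted n-step returns (bootstrapping from the critic's value estimate of the last state), and uses return-minus-baseline advantages to scale its policy gradient. The following is a minimal illustrative sketch of that return/advantage computation in plain Python, not the repo's actual TensorFlow code; the function name, `gamma`, and the sample numbers are all assumptions for illustration.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step returns: R_t = r_t + gamma * R_{t+1},
    seeded with the critic's value estimate of the final state."""
    returns = []
    R = bootstrap_value
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return returns[::-1]

# A toy 3-step rollout from one worker (values are made up):
rewards = [1.0, 0.0, 1.0]
returns = n_step_returns(rewards, bootstrap_value=0.5, gamma=0.9)

# Advantage = n-step return minus the critic's value baseline;
# this is what scales the policy-gradient term for each step.
values = [0.8, 0.6, 0.9]
advantages = [R - v for R, v in zip(returns, values)]
```

In the real agent the critic is a learned value head of the shared network and gradients are applied asynchronously to the parameter server; this sketch only shows the return/advantage arithmetic.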
- Python 2.7 or 3.5
- Golang
- six (for py2/3 compatibility)
- TensorFlow 0.12
- tmux (the start script opens up a tmux session with multiple windows)
- htop (shown in one of the tmux windows)
- gym
- libjpeg-turbo (`brew install libjpeg-turbo`)
- universe
- opencv-python
- numpy
- scipy
```bash
python train.py --num-workers 2 --env-id flashgames.NeonRace-v0 --log-dir /tmp/neonrace
```
The command above will train an agent on the flashgames.NeonRace-v0 environment.
It will spawn two workers that learn in parallel (the `--num-workers` flag) and will write intermediate results into the given log directory.
The code will launch the following processes:
- worker-0 - a process that runs policy gradient
- worker-1 - a process identical to worker-0, except that it experiences different random noise from the environment
- ps - the parameter server, which synchronizes the parameters among the different workers
- tb - a tensorboard process for convenient display of the statistics of learning
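Distributed TensorFlow setups like the one above are usually described by a cluster spec that maps job names (`ps`, `worker`) to network addresses, which each process then uses to start its server. The helper below is a hypothetical sketch of how such a spec could be built; the function name, host, and port numbers are assumptions, not the repo's exact code.

```python
def cluster_spec(num_workers, base_port=12222, host="127.0.0.1"):
    """Build a cluster layout dict of the shape tf.train.ClusterSpec
    accepts: one parameter-server task plus num_workers worker tasks,
    each on its own port."""
    ps = ["{}:{}".format(host, base_port)]
    workers = ["{}:{}".format(host, base_port + 1 + i)
               for i in range(num_workers)]
    return {"ps": ps, "worker": workers}

spec = cluster_spec(num_workers=2)
# spec == {'ps': ['127.0.0.1:12222'],
#          'worker': ['127.0.0.1:12223', '127.0.0.1:12224']}
```

Each launched process would be handed this spec together with its own job name and task index, so the workers know where to find the parameter server.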
Once you start the training process, it will create a tmux session with a window for each of these processes. You can connect to them by typing `tmux a` in the console.
Once in the tmux session, you can see all your windows with `ctrl-b w`.
To switch to window number 0, type `ctrl-b 0`. Look up the tmux documentation for more commands.
To access TensorBoard to see various monitoring metrics of the agent, open http://localhost:12345/ in a browser.
Add the `--visualise` flag if you want to visualise the worker using `env.render()`, as follows:
```bash
python train.py --num-workers 2 --env-id flashgames.NeonRace-v0 --log-dir /tmp/neonrace --visualise
```