I've managed to get some results on the MDP environment, but they are not as good as the paper's and have not been pushed to the repo yet. I had to take a hiatus recently to move my apartment and start a new semester at McGill.
I'd like to build an automated system for testing network architectures and recording the cumulative regret over the course of 12000 episodes:
- Abstract h-DQN into separate file from `run.py`
- Modify h-DQN to accept network architecture hyperparameters
- Create `search_architectures.py`: randomly select network architecture, test it, and append results to data file
- Create `plot_architectures.Rmd`: knitr document for evaluating the network architectures that have been tested
- Settle on satisfactory network architecture for h-DQN default values
- Use default values to update `README.md` with results and discussion
- Move on to Montezuma's Revenge
Abstract h-DQN into separate file from `run.py`

- This was easy to do. Only took a few minutes.

Modify h-DQN to accept network architecture hyperparameters

- This also went smoothly.
Create `search_architectures.py`: randomly select network architecture, test it, and append results to data file
- Creating the ability to randomly select network architectures was tedious but straightforward
- Didn't account for the fact that I'm not currently recording cumulative regret
  - Needed to implement this feature as part of this task
- Also realized there were a bunch of other parameters worth searching through
  - Pulled these out so that `search_architectures.py` can pass them in as params
- There seems to be some sort of bug that occurs when some architectures are randomly selected
  - Finished debugging/fixing this issue. Needed to add an `np.expand_dims()` call whenever the number of experience samples was set to 1
- Now outputting results to data file
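For reference, the `np.expand_dims()` fix above addresses a common shape issue: sampling a single experience from a replay buffer yields a bare 1-D state vector, while the network's forward pass expects a 2-D batch. A minimal sketch of the idea (the helper name and shapes are illustrative, not the repo's actual code):

```python
import numpy as np

def to_batch(states):
    """Ensure sampled states always carry a leading batch dimension.

    With a sample size of 1, indexing the buffer can return shape
    (state_dim,) instead of (1, state_dim), which breaks any code
    expecting a (batch, state_dim) array.
    """
    states = np.asarray(states)
    if states.ndim == 1:
        states = np.expand_dims(states, axis=0)  # (state_dim,) -> (1, state_dim)
    return states
```

Normalizing shapes at the sampling boundary keeps the rest of the training loop free of special cases for batch size 1.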