Basic Q learning training script #33
base: main
mIXs222
left a comment
Looks good overall. Please post if you get a promising result on actual notebooks; then we can merge this in.
When we train on actual notebooks, it may be worth looking at parallelizing across both episodes and learners to cut training time.
experiments/pod.Dockerfile
Outdated
# Copying over the simple notebook for basic training tests
COPY ./notebooks/simple.ipynb /pod/notebooks/simple.ipynb
No need for this. Docker Compose mounts the pod directory, so notebooks/simple.ipynb will already be there.
pod/model.py
Outdated
        self.history = []

    def plot_rewards(self):
        # Can't plt.show when running on docker apparently, so printing them out to plot on other machine
If you really want to visualize, you can 1) plot and save the figure to a file, or 2) dump the rewards to CSV/JSON and plot them later.
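A rough sketch of both options, in case it helps. It assumes self.history is a flat list of per-episode rewards, which the diff does not confirm. The helper names dump_rewards and save_reward_plot and the output file names are made up for illustration, not part of the PR. The plotting helper assumes matplotlib is installed in the image and uses the non-interactive Agg backend so no display is needed.

```python
import csv
import json
from pathlib import Path
from typing import List


def dump_rewards(history: List[float], out_dir: str) -> Path:
    """Option 2: write rewards to CSV and JSON so they can be plotted elsewhere."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # CSV: one row per episode, with a header row.
    with open(out / "rewards.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["episode", "reward"])
        writer.writerows(enumerate(history))

    # JSON: the same values as a single list.
    with open(out / "rewards.json", "w") as f:
        json.dump({"rewards": list(history)}, f)

    return out


def save_reward_plot(history: List[float], path: str) -> None:
    """Option 1: render off-screen and save to a file instead of plt.show()."""
    # Imported lazily so dump_rewards works even without matplotlib.
    import matplotlib

    matplotlib.use("Agg")  # non-interactive backend, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(history)
    ax.set_xlabel("episode")
    ax.set_ylabel("reward")
    fig.savefig(path)
    plt.close(fig)
```

If the output directory sits under the mounted pod directory, the files should show up on the host without any extra copying.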
pod/train.py
Outdated
from pod.bench import Notebooks, NotebookExecutor, BenchArgs
from pod.pickling import StaticPodPickling
from pod.storage import DictPodStorage
from model import QLearningPoddingModel
from pod.stats import ExpStat
from pod.feature import __FEATURE__
from typing import List
import time
from pod.common import PodId
from loguru import logger
import gc
import random
import numpy as np
make fmt
make lint
Reward function needs more thought.