Changes from all commits

166 commits
360afc6
Added VAE paper to the bibliography.
MillionIntegrals Apr 8, 2019
5a3d2b6
Reordered MNIST file.
MillionIntegrals Apr 8, 2019
c156b69
Add default varargs to be empty.
MillionIntegrals Apr 8, 2019
5d9f112
Easy script model config.
MillionIntegrals Apr 8, 2019
96bb187
train_data has been renamed to data.
MillionIntegrals Apr 8, 2019
7a44a8e
Added matplotlib dependency.
MillionIntegrals Apr 8, 2019
6b5ac6d
Fixed a typo.
MillionIntegrals Apr 8, 2019
82148e0
Unsupervised MNIST dataset.
MillionIntegrals Apr 8, 2019
24a2556
Mnist Autoencoder.
MillionIntegrals Apr 8, 2019
ed88af1
Loading pretrained models.
MillionIntegrals Apr 8, 2019
e52df95
Better optimizer for the task.
MillionIntegrals Apr 8, 2019
d3d1fe7
Turned methods of Source into properties.
MillionIntegrals Apr 8, 2019
17a5531
Fixing model summary.
MillionIntegrals Apr 8, 2019
89c110f
Some more changes to MNIST autoencoder.
MillionIntegrals Apr 8, 2019
95f191c
Better weight reset for autoencoder.
MillionIntegrals Apr 8, 2019
792064a
Small code changes in MNIST autoencoder.
MillionIntegrals Apr 8, 2019
3cd40b3
Fixing a bug in MNIST autoencoder.
MillionIntegrals Apr 8, 2019
123fc5f
Ignoring local notebooks for now.
MillionIntegrals Apr 9, 2019
846fd4d
Reducing number of parameters.
MillionIntegrals Apr 9, 2019
f628300
Implemented Variational autoencoder.
MillionIntegrals Apr 9, 2019
3e5069c
Adding example notebooks.
MillionIntegrals Apr 9, 2019
eebb84c
Small README updates.
MillionIntegrals Apr 9, 2019
9c26921
Small autoencoder changes.
MillionIntegrals Apr 10, 2019
053bc3c
Download data.
MillionIntegrals Apr 15, 2019
457724c
Additional parameter to the IMDB data set.
MillionIntegrals Apr 18, 2019
d9619db
New "script" model config.
MillionIntegrals Apr 18, 2019
7734748
Small formatting change.
MillionIntegrals May 4, 2019
01762d7
Change initial memory size hint for parallel envs.
MillionIntegrals May 4, 2019
ba97525
Small profiling utility.
MillionIntegrals May 4, 2019
ff6f814
Machine translation datasets.
MillionIntegrals May 4, 2019
dee86ab
Bumped up version and implemented better tracking of requirements.
MillionIntegrals May 16, 2019
d815013
Merge branch 'master' into candidate-v0.4
MillionIntegrals May 16, 2019
6f4f748
Fix makefile.
MillionIntegrals May 16, 2019
9f31409
Upgrade cloudpickle.
MillionIntegrals May 16, 2019
fe2d26b
Fixing a name bug in stochastic_policy_rnn_model.py (#50)
MillionIntegrals Jun 8, 2019
38e1231
Redid formatting of image_ops.
MillionIntegrals Jun 10, 2019
e409b48
Rename comment.
MillionIntegrals Jun 13, 2019
2238dce
New dependencies.
MillionIntegrals Jun 13, 2019
950f41c
Large rename of recurrent models.
MillionIntegrals Jun 13, 2019
8c0428e
Large scale rename and move.
MillionIntegrals Jun 13, 2019
31c5b05
Second stage of large renames.
MillionIntegrals Jun 13, 2019
3dcd0f2
Fixed linter issues.
MillionIntegrals Jun 13, 2019
2e9d926
Fixed tests after the refactoring.
MillionIntegrals Jun 13, 2019
9150db1
Renaming models to policies.
MillionIntegrals Jun 16, 2019
1b8251a
Fixing configuration files.
MillionIntegrals Jun 16, 2019
30e654e
Added a few extra input modules.
MillionIntegrals Jun 21, 2019
54352e9
Remove blank line.
MillionIntegrals Jun 21, 2019
99106a6
Fixed replay env rollers a bit.
MillionIntegrals Jun 21, 2019
772f6bc
Fixed integration tests a bit for the time being.
MillionIntegrals Jun 21, 2019
fe2443e
Implemented some useful new backbones.
MillionIntegrals Jun 21, 2019
ecd5c85
Added potential output directory override.
MillionIntegrals Jun 21, 2019
6de2f6c
Updated requirements.
MillionIntegrals Jun 22, 2019
6d6679f
New version of some dependencies.
MillionIntegrals Jun 22, 2019
5c04854
Large refactoring - work in progress.
MillionIntegrals Jun 24, 2019
5120966
Fixed again AE and VAE models.
MillionIntegrals Jun 24, 2019
5e07fd9
Small updates to README.
MillionIntegrals Jun 24, 2019
acdce01
Updating some metrics.
MillionIntegrals Jun 24, 2019
489a7b6
Added scope for some training metrics.
MillionIntegrals Jun 24, 2019
9f59f09
Added some comment docstring.
MillionIntegrals Jun 24, 2019
c146b32
Updated CIFAR10 configs.
MillionIntegrals Jun 24, 2019
0d44452
A bit more work on unifying metrics. Adding samples/sec metric.
MillionIntegrals Jun 24, 2019
ea25504
Fixing augmentations of cats vs dogs training.
MillionIntegrals Jun 25, 2019
5061e91
Fixing lr find command.
MillionIntegrals Jun 25, 2019
2e4ede0
Fixing cats vs dogs transfer learning example.
MillionIntegrals Jun 26, 2019
dd3adca
Working on a loader for text.
MillionIntegrals Jun 26, 2019
055c007
Finished fixing shakespeare text generation.
MillionIntegrals Jun 26, 2019
634fe77
Fixed a few broken imports.
MillionIntegrals Jun 27, 2019
8c7aae1
Updated dependency in makefile.
MillionIntegrals Sep 12, 2019
eb30299
Updated to PyTorch 1.2
MillionIntegrals Sep 12, 2019
6c8cd48
Removing parallel pytest.
MillionIntegrals Sep 12, 2019
57420ce
Worked on fixing unit tests after the refactors.
MillionIntegrals Sep 12, 2019
839b19e
Merge branch 'master' into candidate-v0.4
MillionIntegrals Sep 12, 2019
e072337
Added more publications on optimizers to bibliography.
MillionIntegrals Sep 13, 2019
54bd258
New optimizers: RAdam + Ranger
MillionIntegrals Sep 13, 2019
a075293
API to transform only a single coordinate.
MillionIntegrals Sep 13, 2019
abf7cc7
Iterating on the MNIST VAE.
MillionIntegrals Sep 13, 2019
6b46de8
Cleaned up README a bit.
MillionIntegrals Sep 13, 2019
4fa9ca4
Fixed a warning in VAE code.
MillionIntegrals Sep 13, 2019
b579864
Implement "convert warnings to errors" option.
MillionIntegrals Sep 13, 2019
feb2efb
Expanding bibliography.
MillionIntegrals Sep 15, 2019
cc53ae5
Added a few extra interactive options to dataflow.
MillionIntegrals Sep 15, 2019
67d9d4b
Proper NLL estimation using importance sampling.
MillionIntegrals Sep 15, 2019
b7f6667
Minor dependency update.
MillionIntegrals Sep 19, 2019
53628e4
Implemented fully connected MNIST VAE with results matching the IWAE …
MillionIntegrals Sep 20, 2019
00178dd
Implemented CNN-VAE for MNIST.
MillionIntegrals Sep 20, 2019
b61bbcd
Added omniglot dataset.
MillionIntegrals Sep 21, 2019
b393470
Added omniglot VAE configs.
MillionIntegrals Sep 21, 2019
509e2c7
Renamed CNN autoencoder.
MillionIntegrals Sep 23, 2019
d0c56e8
Reorganized latent variable models.
MillionIntegrals Sep 23, 2019
b988946
Make Reshape more flexible.
MillionIntegrals Sep 23, 2019
e511462
Clean up VAE implementation.
MillionIntegrals Sep 23, 2019
28bc997
IWAE implementation.
MillionIntegrals Sep 23, 2019
6ffbb4c
IWAE implementation.
MillionIntegrals Sep 25, 2019
0b9d731
Added VQ-VAE repo to bibliography.
MillionIntegrals Sep 25, 2019
5a7c298
Run tag support, and cleaned up training info.
MillionIntegrals Sep 25, 2019
118454f
Configurable Evaluator cache (#52)
galatolofederico Sep 25, 2019
09f14f9
Added a global list command for the models.
MillionIntegrals Sep 26, 2019
521912c
Canonical MNIST-VAE notebook.
MillionIntegrals Sep 26, 2019
3c9a2ff
Significant refactoring of optimizers.
MillionIntegrals Sep 26, 2019
1383aab
Merge branch 'candidate-v0.4' of github.com:MillionIntegrals/vel into…
MillionIntegrals Sep 26, 2019
7e70b3d
Bring back RL algos.
MillionIntegrals Sep 27, 2019
b5a068e
PPO and A2C RNN policies.
MillionIntegrals Sep 27, 2019
62e82ff
Updated README.
MillionIntegrals Oct 2, 2019
43369d5
Continuing with major net/rl code refactoring.
MillionIntegrals Oct 2, 2019
bc211ec
Restored TRPO.
MillionIntegrals Oct 2, 2019
8a17a97
Restored `atari_a2c_tf_rmsprop` example.
MillionIntegrals Oct 2, 2019
1ccc0a4
ACER works again.
MillionIntegrals Oct 2, 2019
cbb38f3
Revived the DDQN config.
MillionIntegrals Oct 3, 2019
aa2905e
Revived the Rainbow.
MillionIntegrals Oct 3, 2019
936b2b9
A2C yaml works now.
MillionIntegrals Oct 3, 2019
c3b8c99
Revived MuJoCo A2C
MillionIntegrals Oct 3, 2019
0ce852f
Revived MuJoCo PPO.
MillionIntegrals Oct 3, 2019
3c47758
Sizing network input according to the environment size.
MillionIntegrals Oct 3, 2019
6c8a618
Disable integration tests for now.
MillionIntegrals Oct 3, 2019
c0a0128
Remove evaluator cache.
MillionIntegrals Oct 3, 2019
c6f6e72
Brought back DDPG.
MillionIntegrals Oct 3, 2019
d6d286a
Standardized model naming.
MillionIntegrals Oct 3, 2019
5d05e58
get_layer_groups -> layer_groups
MillionIntegrals Oct 3, 2019
bfba7cc
Brought back the RNN RL training.
MillionIntegrals Oct 3, 2019
8e32284
Clean up purgatory files.
MillionIntegrals Oct 3, 2019
aeec2ac
Commit basic version of VQ-VAE.
MillionIntegrals Oct 3, 2019
6934cb6
Implemented simple MNIST-GAN
MillionIntegrals Oct 4, 2019
9d35c4f
Requirements update.
MillionIntegrals Oct 4, 2019
23e0b21
Updated roadmap slightly.
MillionIntegrals Oct 6, 2019
f9f5942
Refactored RNN language modelling examples.
MillionIntegrals Oct 7, 2019
fa431e7
Update to requirements.txt
MillionIntegrals Oct 7, 2019
0d5e6b1
Deleting abandoned files.
MillionIntegrals Oct 7, 2019
dfaef13
Renaming, Network -> VModule.
MillionIntegrals Oct 7, 2019
0c3bd10
Renaming ModelFactory -> ModuleFactory.
MillionIntegrals Oct 7, 2019
5a7a3f1
Code lint update.
MillionIntegrals Oct 10, 2019
7fb9375
Update to PyTorch 1.3
MillionIntegrals Oct 11, 2019
9f36e73
Another refactoring of remaining examples.
MillionIntegrals Oct 14, 2019
f4e6107
Improving formatting.
MillionIntegrals Oct 14, 2019
a00d125
Formatting.
MillionIntegrals Oct 14, 2019
284260a
EWMA input normalization.
MillionIntegrals Oct 14, 2019
a2bb131
Implemented WANDB streaming.
MillionIntegrals Oct 19, 2019
b2b60d4
Added wandb settings to git ignore.
MillionIntegrals Oct 19, 2019
e3e211f
Added faster VAE NLL command.
MillionIntegrals Oct 19, 2019
8923844
Fixed issue in train command.
MillionIntegrals Oct 19, 2019
3eb8d2f
Remove omniglot VAE examples.
MillionIntegrals Oct 19, 2019
da8a3b7
Initial benchmarks.
MillionIntegrals Oct 20, 2019
a6acbba
Moved configs around.
MillionIntegrals Oct 20, 2019
d826300
Improving WANDB bindings.
MillionIntegrals Oct 20, 2019
7734186
Fixing a bug in metric key initialization.
MillionIntegrals Oct 20, 2019
883d95d
Misc changes to logging and stuff.
MillionIntegrals Oct 25, 2019
81e742a
Example how to run config from file.
MillionIntegrals Oct 31, 2019
73edf97
Reordering default velproject.
MillionIntegrals Nov 3, 2019
c21d608
Updated bibliography
MillionIntegrals Nov 3, 2019
2a8949a
Aggregate metrics from optimizers in multi-optimizer setup.
MillionIntegrals Nov 3, 2019
d09942f
Add image metrics to tensorboard and somehow stabilize the MNIST GAN.
MillionIntegrals Nov 4, 2019
bda0c39
Updated VAE configs.
MillionIntegrals Nov 9, 2019
cb11b38
VAE benchmarks.
MillionIntegrals Nov 9, 2019
b1524bb
Update to the WANDB stream.
MillionIntegrals Nov 9, 2019
4636bba
Fixed evaluation command.
MillionIntegrals Nov 9, 2019
37e1dea
Make ACER work again.
MillionIntegrals Nov 9, 2019
c852057
Fixing rainbow.
MillionIntegrals Nov 10, 2019
8277f60
Vel Research - Rubik's cube project
MillionIntegrals Nov 10, 2019
13f86fe
Dependencies update.
MillionIntegrals Nov 10, 2019
4e94cc5
Implemented skip-connection layer.
MillionIntegrals Nov 11, 2019
965903c
Fixing an error in TRPO config.
MillionIntegrals Nov 11, 2019
e2d7814
Improved summary command
MillionIntegrals Nov 14, 2019
ed4d804
Enhancements to the modular network code.
MillionIntegrals Nov 14, 2019
fc554a0
Fixing O-U noise
MillionIntegrals Nov 15, 2019
3e0dd6a
Tiny updates to EWMA normalization.
MillionIntegrals Nov 15, 2019
174c505
Registering env for the iteration.
MillionIntegrals Nov 15, 2019
4582c4e
Minor fixes to rl.command.record_movie_command and evaluate (#55)
matrig Feb 21, 2020
3 changes: 3 additions & 0 deletions .flake8
@@ -0,0 +1,3 @@
[flake8]
max-line-length = 120
exclude = vel/openai, test, vel/api/__init__.py, vel/rl/api/__init__.py, vel/data/__init__.py, vel/metric/__init__.py, vel/metric/base/__init__.py, vel/train/__init__.py, vel/optimizer/ranger.py, vel/optimizer/radam.py
3 changes: 3 additions & 0 deletions .gitignore
@@ -117,3 +117,6 @@ environment.yaml

# Test cache
/.pytest_cache

# WANDB settings
/wandb
23 changes: 19 additions & 4 deletions .velproject.yaml
@@ -1,14 +1,21 @@
project_name: 'vel'

storage:
name: vel.storage.classic

backend:
name: vel.storage.backend.mongodb
uri: 'mongodb://localhost:27017/'
database: deep_learning
name: vel.storage.backend.dummy

# Other potential setting
# name: vel.storage.backend.mongodb
# uri: 'mongodb://localhost:27017/'
# database: deep_learning

streaming:
- name: vel.storage.streaming.visdom
- name: vel.storage.streaming.tensorboard
- name: vel.storage.streaming.stdout
# - name: vel.storage.streaming.visdom
# - name: vel.storage.streaming.wandb


checkpoint_strategy:
@@ -20,3 +27,11 @@ visdom_settings:
server: 'http://localhost'
port: 8097


# List of commands that are shared among all models in this project
global_commands:
list:
name: vel.command.list_command
summary:
name: vel.command.summary_command

12 changes: 11 additions & 1 deletion Makefile
@@ -1,3 +1,7 @@
.PHONY: default test partest requpgrade lint

default: test;

tag := $(shell git symbolic-ref -q --short HEAD)

docker-build:
@@ -30,4 +34,10 @@ serve-visdom:
python -m visdom.server

test:
pytest .
pytest .

requirements.txt: requirements.in
pip-compile --upgrade requirements.in

lint:
flake8 vel
220 changes: 86 additions & 134 deletions README.md
@@ -1,22 +1,21 @@
# Vel 0.3
# Vel 0.4

[![Build Status](https://travis-ci.org/MillionIntegrals/vel.svg?branch=master)](https://travis-ci.org/MillionIntegrals/vel)
[![PyPI version](https://badge.fury.io/py/vel.svg)](https://badge.fury.io/py/vel)
[![GitHub](https://img.shields.io/github/license/mashape/apistatus.svg)](https://github.com/MillionIntegrals/vel/blob/master/LICENSE)
[![Gitter chat](https://badges.gitter.im/MillionIngegrals/vel.png)](https://gitter.im/deep-learning-vel)


Bring **velocity** to deep-learning research.


This project hosts a collection of **highly modular** deep learning components that are tested to be working well together.
A simple yaml-based system ties these modules together declaratively using configuration files,
but everything that can be defined using config files can be coded directly in the python script as well.
A simple yaml-based system ties these modules declaratively using configuration files.


This is still an early version and a hobby project so documentation is unfortunately nonexistent. I've tried to make the
code as clear as possible, and provide many usage examples, but whenever there was a tradeoff to be made between
simplicity and modularity I've chosen modularity first and simplicity second.
This is still an early version and a hobby project, so documentation is unfortunately nonexistent.
I've made an effort to make the code clear and provide many usage examples,
but whenever there was a tradeoff to be made between simplicity and modularity
I've chosen modularity first and simplicity second. Therefore, a high emphasis is placed on the interfaces between components.


Having conducted a few research projects, I've gathered a small collection of repositories
@@ -40,6 +39,9 @@ If that's not the case, a few bits of custom glue code should do the job.
This repository is still in an early stage of that journey but it will grow
as I keep putting work into it.

For up-to-date benchmarks, look here:
[Benchmarks](docs/Benchmarks.md)


# Blogposts

@@ -55,16 +57,14 @@ pip install -e .
```
from the repository root directory.

This project requires Python at least 3.6 and PyTorch 1.0.
This project requires at least Python 3.6 and PyTorch 1.3.
If you want to run YAML config examples, you'll also need a **project configuration file**
`.velproject.yaml`. An example is included in this repository.

Default project configuration writes
metrics to MongoDB instance open on localhost port 27017 and Visdom instance
on localhost port 8097.
The default project configuration writes logs to the tensorboard directory `output/tensorboard`
under the main directory. Output modules for visdom, mongodb and wandb are also implemented.

If you don't want to run these services, there is included
another example file `.velproject.dummy.yaml`
If you don't want any logging, another example file `.velproject.dummy.yaml` is included
that writes training progress to the standard output only.
To use it, just rename it to `.velproject.yaml`.

@@ -80,6 +80,7 @@ To use it, just rename it to `.velproject.yaml`.
understand what exactly the model is doing for newcomers already comfortable with PyTorch.
- All state-of-the-art models should be implemented in the framework with accuracy
matching published results.
For up-to-date benchmarks, look here: [Benchmarks](docs/Benchmarks.md)
- All common deep learning workflows should be fast to implement, while
uncommon ones should be possible, at least as far as PyTorch allows.

@@ -89,7 +90,7 @@ To use it, just rename it to `.velproject.yaml`.
Several models are already implemented in the framework and have example config files
that are ready to run and easy to modify for other similar use cases:

- State-of-the art results on Cifar10 dataset using residual networks
- Residual networks (resnets) trained on Cifar10 dataset replicating published performance
- Cats vs dogs classification using transfer learning from a resnet34 model pretrained on
ImageNet

@@ -101,14 +102,14 @@ that are ready to run and easy to modify for other similar use cases:

# Implemented models - Reinforcement learning

- Continuous and discrete action spaces
- Basic support for LSTM policies for A2C and PPO
- Following published policy gradient reinforcement learning algorithms:
- Support for continuous and discrete environment action spaces
- Basic support for recurrent policies for A2C and PPO
- Following policy gradient reinforcement learning algorithms:
- Advantage Actor-Critic (A2C)
- Deep Deterministic Policy Gradient (DDPG)
- Proximal Policy Optimization (PPO)
- Trust Region Policy Optimization (TRPO)
- Actor-Critic with Experience Replay (ACER)
- Deep Deterministic Policy Gradient (DDPG)
- Deep Q-Learning (DQN) as described by DeepMind in their Nature publication, with the following
improvements (a minimal double-DQN target sketch is shown after this list):
- Double DQN
@@ -118,6 +119,14 @@ that are ready to run and easy to modify for other similar use cases:
- Distributional Q-Learning
- Noisy Networks for Exploration
- Rainbow (combination of the above)

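For orientation, here is a minimal sketch of the double-DQN target computation referenced above. It is an
illustrative example only: the function name, network handles and tensor shapes are assumptions made for this
sketch, not vel's actual API.

```python
import torch


def double_dqn_target(online_net, target_net, rewards, next_obs, dones, discount=0.99):
    """Sketch: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) for non-terminal steps."""
    with torch.no_grad():
        # The online network selects the greedy next action...
        next_actions = online_net(next_obs).argmax(dim=1, keepdim=True)
        # ...while the target network evaluates it, which reduces overestimation bias
        next_q = target_net(next_obs).gather(1, next_actions).squeeze(1)
        # Terminal transitions (dones == 1) do not bootstrap
        return rewards + discount * (1.0 - dones) * next_q
```
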
# Implemented models - Unsupervised learning

- A simple AutoEncoder (AE) with an example on the MNIST dataset.
- Latent variable models (a minimal VAE sketch is shown after this list):
- Variational AutoEncoders (VAE)
- Importance Weighted AutoEncoder (IWAE)
- Vector-Quantised Variational AutoEncoder (VQ-VAE)

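These latent-variable models share the same encoder/decoder skeleton; below is a minimal, self-contained sketch of
the plain VAE objective (reparameterization trick plus negative ELBO). The class name and layer sizes are
illustrative assumptions for this sketch and do not correspond to the actual vel model classes; IWAE tightens the
bound with importance weights and VQ-VAE replaces the continuous latent with a quantised codebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyVAE(nn.Module):
    """Illustrative fully-connected VAE for flattened 28x28 MNIST digits."""

    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.mu_head = nn.Linear(hidden_dim, latent_dim)
        self.logvar_head = nn.Linear(hidden_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, input_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar


def vae_loss(recon_logits, x, mu, logvar):
    # Negative ELBO = reconstruction term + KL(q(z|x) || N(0, I)), averaged over the batch
    reconstruction = F.binary_cross_entropy_with_logits(recon_logits, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (reconstruction + kl) / x.size(0)
```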

# Examples
@@ -128,14 +137,14 @@ Most of the examples for this framework are defined using config files in the
For example, to run the A2C algorithm on a Breakout atari environment, simply invoke:

```
python -m vel.launcher examples-configs/rl/atari/a2c/breakout_a2c.yaml train
python -m vel.launcher examples-configs/rl/atari/atari_a2c.yaml train
```

If you install the library locally, you'll have a special wrapper created
that will invoke the launcher for you. Then, the above becomes:

```
vel examples-configs/rl/atari/a2c/breakout_a2c.yaml train
vel examples-configs/rl/atari/atari_a2c.yaml train
```

General command line interface of the launcher is:
@@ -152,112 +161,6 @@ If you prefer to use the library from inside your scripts, take a look at the
well. Scripts generally don't require any MongoDB or Visdom setup, so they can be run straight
away in any environment, but their output will be less rich and less informative.

Here is an example script running the same setup as a config file from above:

```python
import torch
import torch.optim as optim

from vel.rl.metrics import EpisodeRewardMetric
from vel.storage.streaming.stdout import StdoutStreaming
from vel.util.random import set_seed

from vel.rl.env.classic_atari import ClassicAtariEnv
from vel.rl.vecenv.subproc import SubprocVecEnvWrapper

from vel.modules.input.image_to_tensor import ImageToTensorFactory
from vel.rl.models.stochastic_policy_model import StochasticPolicyModelFactory
from vel.rl.models.backbone.nature_cnn import NatureCnnFactory


from vel.rl.reinforcers.on_policy_iteration_reinforcer import (
OnPolicyIterationReinforcer, OnPolicyIterationReinforcerSettings
)

from vel.rl.algo.policy_gradient.a2c import A2CPolicyGradient
from vel.rl.env_roller.step_env_roller import StepEnvRoller

from vel.api.info import TrainingInfo, EpochInfo


def breakout_a2c():
    device = torch.device('cuda:0')
    seed = 1001

    # Set random seed in python std lib, numpy and pytorch
    set_seed(seed)

    # Create 16 environments evaluated in parallel in subprocesses with all the usual DeepMind wrappers
    # These are just helper functions for that
    vec_env = SubprocVecEnvWrapper(
        ClassicAtariEnv('BreakoutNoFrameskip-v4'), frame_history=4
    ).instantiate(parallel_envs=16, seed=seed)

    # Again, use a helper to create a model.
    # Because the model is owned by the reinforcer, it should not be accessed through this variable
    # but through the reinforcer.model property
    model = StochasticPolicyModelFactory(
        input_block=ImageToTensorFactory(),
        backbone=NatureCnnFactory(input_width=84, input_height=84, input_channels=4)
    ).instantiate(action_space=vec_env.action_space)

    # Reinforcer - an object managing the learning process
    reinforcer = OnPolicyIterationReinforcer(
        device=device,
        settings=OnPolicyIterationReinforcerSettings(
            batch_size=256,
            number_of_steps=5,
        ),
        model=model,
        algo=A2CPolicyGradient(
            entropy_coefficient=0.01,
            value_coefficient=0.5,
            max_grad_norm=0.5,
            discount_factor=0.99,
        ),
        env_roller=StepEnvRoller(
            environment=vec_env,
            device=device,
        )
    )

    # Model optimizer
    optimizer = optim.RMSprop(reinforcer.model.parameters(), lr=7.0e-4, eps=1e-3)

    # Overall information store for training information
    training_info = TrainingInfo(
        metrics=[
            EpisodeRewardMetric('episode_rewards'),  # Calculate average reward from episode
        ],
        callbacks=[StdoutStreaming()]  # Print live metrics every epoch to standard output
    )

    # A bit of training initialization bookkeeping...
    training_info.initialize()
    reinforcer.initialize_training(training_info)
    training_info.on_train_begin()

    # Let's make 100 batches per epoch to average metrics nicely
    num_epochs = int(1.1e7 / (5 * 16) / 100)

    # Normal handrolled training loop
    for i in range(1, num_epochs+1):
        epoch_info = EpochInfo(
            training_info=training_info,
            global_epoch_idx=i,
            batches_per_epoch=100,
            optimizer=optimizer
        )

        reinforcer.train_epoch(epoch_info)

    training_info.on_train_end()


if __name__ == '__main__':
    breakout_a2c()
```

# Docker

A Dockerized version of this library is available from the Docker Hub as
@@ -307,17 +210,66 @@ Very likely to be included:


Possible to be included:
- Popart reward normalization
- Parameter Space Noise for Exploration
- Hindsight experience replay
- Generative adversarial networks

For version 0.5 I'll again be looking to expand the spectrum of available models in the framework,
and I'll try to support **multi-GPU** training via data parallelism.

Code quality:
- Rename models to policies
- Force dictionary inputs and outputs for policies
- Factor action noise back into the policy
- Use linter as a part of the build process
Work in progress roadmap:

- Popart reward normalization
- PixelCNN
- MADE: Masked Autoencoder for Distribution Estimation
- Variational AutoEncoder with Inverse Autoregressive Flow


# Directories

Below is a brief explanation of the contents of the main top-level directories.

- `docs` - A few markdown documents about the framework
- `examples-configs` - Ready-to-run configs with tried and tested models, usually heavily inspired by existing
literature.
- `examples-notebooks` - A few examples of how to interact with `vel` from the level of IPython notebook
- `vel` - Root for the Python source of the package
- `vel.api` - Interfaces and base classes used all over the codebase. To be used in source code only and not
referenced from config files.
- `vel.callback` - Definitions of callbacks that can be used in the training process. Can be referenced both by code
and by the config files.
- `vel.command` - Commands that can be used in your configuration files, and there isn't much need to refer to
them from code.
- `vel.data` - Various classes for handling data sources and data transformations. Referenced both by source code
and config files.
- `vel.function` - Interfaces for creating various functions/interpolators, to be referenced by config files.
- `vel.internal` - Functions and classes to be used only by `vel` internally, and not by user code or configs.
- `vel.metric` - Code for tracking metrics during training of your models. To be used by both code and configs.
- `vel.model` - Definition of models, which is kind of an end-package that references all other packages. Models
contain most other parts of the pipeline and define a training procedure.
- `vel.module` - Various useful definitions of PyTorch modules, to be used when defining your own `models` and
`layers`.
- `vel.net` - "Network" module that may be referenced by a model to define neural network architecture used.
- `vel.net.layer` - Modular layer system for defining networks declaratively in configuration files.
- `vel.notebook` - Utilities for interfacing with `vel` using IPython notebooks
- `vel.openai` - Imported parts of the codebase of `openai/baselines` that I didn't want to bring as a package
dependency. To be referenced in code mostly.
- `vel.optimizer` - Various implementations of deep learning optimizers. To be referenced mostly by scripts.
- `vel.rl` - Meta package for everything related to Reinforcement Learning
- `vel.rl.api` - Interfaces and base classes to be used for Reinforcement Learning models and other classes.
- `vel.rl.buffer` - All classes relating to experience replay and experience buffers
- `vel.rl.command` - Commands used for RL training
- `vel.rl.env` - Basic reinforcement learning environments, mostly based on OpenAI gym
- `vel.rl.env_roller` - Classes for generating environment rollouts
- `vel.rl.layer` - Layers designed especially for RL
- `vel.rl.module` - PyTorch modules designed for RL
- `vel.rl.policy` - Equivalent of `vel.model` for RL
- `vel.rl.reinforcer` - Reinforcer manages RL training, and corresponds to `Trainer` in Supervised Learning
- `vel.rl.vecenv` - Utilities for vectorizing environments and stepping multiple environments at the same time
- `vel.scheduler` - Classes helping to set up learning rate schedules for the optimizers. To be referenced mostly
by scripts.
- `vel.storage` - Everything about persisting models and metrics. To be referenced mostly by configuration files.
- `vel.train` - Utilities for defining more generic training loops of models. To be referenced in both code and
config.
- `vel.util` - Collection of various utility functions to be used by all other modules.


# Citing