This is a template for deep learning projects.
To instantiate your project using this template, go to the template's GitHub page and click **Use this template**, then **Create a new repository**.
Otherwise, if you already have a local git repository, run

```bash
cd ${local-git-repo}
git remote add ml-template https://github.com/ignasa007/ML-Template.git
git fetch ml-template
git merge --allow-unrelated-histories ml-template/main
git remote remove ml-template
```

The repository is organized as follows:

- `algorithms` - Learning algorithms, including PyTorch optimizers and step size schedulers.
- `assets` - Plots generated by different experiments.
- `config` - Configuration files for different architectures, datasets, optimizers, schedulers, ...
- `datasets` - Python classes to handle different datasets and make them suitable for training.
- `datastore` - Raw datasets store.
- `metrics` - Python classes to handle different training objectives and other tracked metrics.
- `models` - Python classes to assemble models.
- `results` - Results of the different runs.
- `utils` - Utility functions for running the transformer experiments.
- `main.py` - Main file for training the models.
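For illustration, a file under `config` might look like the following. The file name and nesting here are hypothetical, reusing option names that appear in the usage examples; check the actual files in `config` for the real layout.

```yaml
# config/architecture/cnn.yaml -- hypothetical example
conv:
  num_channels: "8 16 32"  # one value per conv layer
  strides: "None 1 ..."    # "..." skips the kwarg for the third layer
```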
To set up the environment, run

```bash
conda create --name ${env_name} --file requirements.txt
conda activate ${env_name}
```

To run the experiments, execute
```bash
python -m main \
    --dataset "${dataset}" \
    --architecture "${architecture}" \
    --optimizer "${optimizer}" \
    --scheduler "${scheduler}" \
    --device_index "${CUDA_VISIBLE_DEVICES}"
```

You can override configurations in `config` from the command line, e.g.
```bash
python -m main \
    --dataset "${dataset}" \
    --architecture "${architecture}" \
    --optimizer "${optimizer}" \
    --scheduler "${scheduler}" \
    --device_index "${CUDA_VISIBLE_DEVICES}" \
    data.batch_size "64" \
    architecture.conv.num_channels "8 16*2 32"
```

- Make sure to omit `--device_index` if you do not wish to use a GPU.
- Set `loader.num_workers` to 0 when `TMPDIR` is mounted on an NFS, since in that case the DataLoader throws an error before exiting: `OSError: [Errno 16] Device or resource busy: '.nfs...'`. See the corresponding PyTorch issue for details.
- When reading configurations, combinations of `.`, `-` and/or `_` (such as `...` or just `-`) can be used to indicate `kwargs` that should be skipped. For example, `strides: "None 1 ..."` means the first and second layers receive `strides=None` and `strides=1`, respectively, while for the third layer `strides` is not passed (the default value is used).
- Make sure to use spaces, NOT tabs (as much as it may hurt you to do so), since YAML parsers require spaces to mark indentation.
- The project supports up to 1 GPU per run, since I don't know how to distribute computing over multiple GPUs :').
- We don't support the `ReduceLROnPlateau` scheduler because its API doesn't tie in well with the rest of the project's organization.
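For illustration, the skip-token and dotted-override conventions described above could be parsed along these lines. The function names and exact rules here are assumptions for the sketch, not the template's actual implementation:

```python
SKIP = object()          # sentinel meaning: do not pass this kwarg to the layer
SKIP_CHARS = set(".-_")  # combinations of these characters mark skipped kwargs


def parse_token(token: str):
    """Parse one space-separated config token, e.g. "None", "1" or "..."."""
    if token and set(token) <= SKIP_CHARS:  # e.g. "...", "-", "._"
        return SKIP
    if token == "None":
        return None
    try:
        return int(token)
    except ValueError:
        return token  # leave non-numeric values as strings


def set_override(config: dict, dotted_key: str, value: str) -> None:
    """Apply a CLI override like ("data.batch_size", "64") to a nested dict."""
    *path, leaf = dotted_key.split(".")
    node = config
    for part in path:
        node = node.setdefault(part, {})
    node[leaf] = parse_token(value)
```

With these definitions, `parse_token` maps the tokens of `"None 1 ..."` to `None`, `1`, and the skip sentinel, and `set_override` turns `data.batch_size "64"` into `{"data": {"batch_size": 64}}`.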
TODO, loosely in order of priority:
- Add support for dropout.
- Add support for residual connections.
- Implement `LazyLayerNorm`.
- Add support for initialization schemes. Note that lazy modules are initialized at the first forward pass, not with the rest of the model (using `.reset_parameters`); need to handle that.
- Figure out how to handle multidimensional regression metrics.
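One possible starting point for the `LazyLayerNorm` item above, following the pattern of PyTorch's built-in lazy modules (`nn.LazyLinear`, `nn.LazyBatchNorm1d`). This is an illustrative sketch that infers `normalized_shape` from the last input dimension only, not code from this repository:

```python
import torch
from torch import nn
from torch.nn.modules.lazy import LazyModuleMixin
from torch.nn.parameter import UninitializedParameter


class LazyLayerNorm(LazyModuleMixin, nn.LayerNorm):
    """A LayerNorm that infers normalized_shape from the last dimension
    of its first input, mirroring torch's built-in lazy modules."""

    cls_to_become = nn.LayerNorm  # swapped to this class once initialized

    def __init__(self, eps: float = 1e-5, elementwise_affine: bool = True):
        # Initialize the base LayerNorm with a dummy shape; the real shape
        # is only known at the first forward pass.
        super().__init__(0, eps=eps, elementwise_affine=elementwise_affine)
        self.normalized_shape = tuple()
        if self.elementwise_affine:
            self.weight = UninitializedParameter()
            self.bias = UninitializedParameter()

    def reset_parameters(self) -> None:
        # Skip resetting while the parameters are still uninitialized.
        if not self.has_uninitialized_params() and self.normalized_shape:
            super().reset_parameters()

    def initialize_parameters(self, input: torch.Tensor) -> None:
        # Called by LazyModuleMixin's pre-forward hook with the first input.
        if self.has_uninitialized_params() or not self.normalized_shape:
            self.normalized_shape = (input.shape[-1],)
            if self.elementwise_affine:
                with torch.no_grad():
                    self.weight.materialize(self.normalized_shape)
                    self.bias.materialize(self.normalized_shape)
            self.reset_parameters()
```

Note the interaction with the initialization-schemes TODO: because the parameters only materialize at the first forward pass, any custom init would have to run inside `initialize_parameters` rather than at construction time.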