This project originates from the 2022 APAC HPC AI Competition, specifically problem 3.
Our work on this task consists of 2 stages:
- Hyperparameter tuning
- GPU cluster setting tuning
First, we search for the optimal modification of the Leopard U-Net. The objective of this stage is to make the model achieve as high a PRAUC as possible. We then move the optimal model to the second stage and compare the efficiency of different GPU cluster settings.
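The stage-1 objective, PRAUC, is the area under the precision-recall curve. As a minimal pure-Python sketch (this is illustrative, not the project's own evaluation code, and the label/score values are made up):

```python
def prauc(labels, scores):
    """Average precision: step-wise area under the precision-recall curve."""
    ranked = sorted(zip(scores, labels), reverse=True)  # rank by score, descending
    total_pos = sum(labels)
    tp, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            ap += (tp / rank) / total_pos  # precision at this recall step
    return ap

labels = [0, 0, 1, 1, 1, 0]               # ground-truth binding sites
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]  # model probabilities
print(round(prauc(labels, scores), 3))    # → 0.917
```

In practice a library estimator such as scikit-learn's `average_precision_score` computes the same quantity.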
```
$ conda create -n "dna2" python=3.9.2
$ conda activate dna2
$ pip install -r requirements.txt
```

To find the optimal configuration of the U-Net model used in this task, we perform automatic hyperparameter tuning with the Hyperopt Python library.
This code base contains a complete training procedure and the needed utility files. We use PyTorch as the deep learning framework at this stage.
The model architecture is the Leopard U-Net from https://github.com/GuanLab/Leopard. We picked the following hyperparameters of the model to tune:
- `dropout`: the dropout rate of all dropout layers
- `initial_filter`: the number of filters in the top U-Net block
- `kernel_size`: convolution kernel size
- `num_blocks`: number of U-Net blocks
- `pos_weight`: weighting of positive samples in the loss function
- `scale_filter`: the factor (number of filters in the $n^{th}$ layer) / (number of filters in the $(n-1)^{th}$ layer)
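To make these knobs concrete, here is a hedged sketch of how such hyperparameters might parameterize one convolutional block; this is an illustration, not the actual Leopard implementation, and the values used below are arbitrary:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Illustrative block showing where the tuned hyperparameters act."""
    def __init__(self, in_ch, out_ch, kernel_size, dropout):
        super().__init__()
        self.net = nn.Sequential(
            # `kernel_size` hyperparameter; padding keeps the length unchanged
            nn.Conv1d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
            nn.ReLU(),
            nn.Dropout(dropout),  # `dropout` hyperparameter
        )

    def forward(self, x):
        return self.net(x)

# `initial_filter` sets the first block's width; `scale_filter` grows it
# across `num_blocks` blocks.
initial_filter, scale_filter, num_blocks = 16, 1.5, 3
widths = [int(initial_filter * scale_filter ** i) for i in range(num_blocks)]
block = ConvBlock(4, widths[0], kernel_size=7, dropout=0.1)
out = block(torch.randn(2, 4, 128))  # one-hot DNA input: 4 channels
print(widths, out.shape)             # [16, 24, 36] torch.Size([2, 16, 128])
```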
The specific range of the search space is defined in the file `deepLearningBasedDNASequenceFastDecoding/optim/optim_hyp.py`.
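A Hyperopt search space over these hyperparameters could resemble the sketch below; the ranges shown are illustrative assumptions, not the actual values in `optim_hyp.py`:

```python
from hyperopt import hp

# Illustrative ranges only; see optim/optim_hyp.py for the real search space.
search_space = {
    "dropout": hp.uniform("dropout", 0.0, 0.5),
    "initial_filter": hp.choice("initial_filter", [8, 16, 32]),
    "kernel_size": hp.choice("kernel_size", [3, 5, 7]),
    "num_blocks": hp.choice("num_blocks", [3, 4, 5]),
    "pos_weight": hp.uniform("pos_weight", 1.0, 10.0),
    "scale_filter": hp.uniform("scale_filter", 1.2, 2.0),
}
```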
- Step 1: Convert the DNA dataset to a format compatible with torch. The output will be placed at `data/`.

  ```
  $ python deepLearningBasedDNASequenceFastDecoding/data/convert.py deepLearningBasedDNASequenceFastDecoding/data/preprocessedCTCFFimoData/
  ```

- Step 2: Submit the `script/optim.sh` job script.

  ```
  $ qsub script/optim.sh
  ```

  The script launches `optim.py`, which performs the hyperparameter tuning. In the script, we use the `hyperopt.fmin` optimizer to search for the best model. `fmin` repeatedly evaluates the `objective` function, which calls `train_dist.py` to train the model and reports the PRAUC score of each trial back to `fmin`.

- Step 3: Inspect the real-time progress with TensorBoard.

  ```
  $ tensorboard --logdir log
  ```

  The command launches a TensorBoard instance watching the log files in `log/`. Access it with a browser, for example at `http://localhost:6006`. It visualizes the real-time training progress; you can see the highest points of the PRAUC score curves increase as more trials are issued.

- Step 4: Retrieve the optimal hyperparameters. After about 100 trials, you can stop the job and look at TensorBoard to find the best configuration. Select the trial that achieved the highest PRAUC and read its hyperparameter values from the trial's name.
For the second stage, submit the multi-node training job script:

```
$ qsub scripts/trainOnMultinode.sh
```