
RibonanzaNet

Training code for RibonanzaNet. Preprint: https://www.biorxiv.org/content/10.1101/2024.02.24.581671v1.

Example notebooks

If you don't want to retrain RibonanzaNet from scratch and would rather use the pretrained checkpoints, we have created example notebooks:
  • Finetuning: https://www.kaggle.com/code/shujun717/ribonanzanet-2d-structure-finetune
  • Secondary structure inference: https://www.kaggle.com/code/shujun717/ribonanzanet-2d-structure-inference
  • Chemical mapping inference: https://www.kaggle.com/code/shujun717/ribonanzanet-inference
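
If you just want to inspect a downloaded checkpoint outside the notebooks, a minimal sketch, assuming the weights are a regular PyTorch state dict (RibonanzaNet.pt is a placeholder file name; see the notebooks for the full model-loading code):

import torch

# Placeholder path; substitute the checkpoint file you downloaded.
state_dict = torch.load("RibonanzaNet.pt", map_location="cpu")

# Print a few parameter names/shapes to confirm the checkpoint
# matches the model config you plan to use.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))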

Data Download

You only need train_data.csv, test_sequences.csv, and sample_submission.csv from https://www.kaggle.com/competitions/stanford-ribonanza-rna-folding/data (place them in the input_dir set in the config; see below).
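
If you prefer the command line, the same files can be fetched with the official Kaggle CLI (this assumes the kaggle package is installed, API credentials are configured, and you have accepted the competition rules on the Kaggle website):

kaggle competitions download -c stanford-ribonanza-rna-folding
unzip stanford-ribonanza-rna-folding.zip -d ../../input/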

Environment

Create the environment from the environment file env.yml

conda env create -f env.yml

Install the Ranger optimizer

conda activate torch

git clone https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
cd Ranger-Deep-Learning-Optimizer
pip install -e .
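
Optionally, sanity-check the environment before training (assumes the env is named torch as above):

conda activate torch
python -c "import torch, accelerate; print(torch.__version__, torch.cuda.is_available())"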

How to run

First activate the environment: conda activate torch

Set up accelerate by running accelerate config in the terminal, or supply a config file directly.

For an example of an accelerate config file, see accelerate_config.yaml
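
For instance, instead of answering the accelerate config prompts interactively, you can point accelerate launch at the example file via its --config_file flag:

accelerate launch --config_file accelerate_config.yaml run.py --config_path configs/pairwise.yaml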

Training

accelerate launch run.py --config_path configs/pairwise.yaml

Inference

accelerate launch inference.py --config_path configs/pairwise.yaml

Process raw predictions into a submission file for Ribonanza

python make_submission.py --config_path configs/pairwise.yaml
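
For reference, the competition expects one row per (sequence, position), with an id column and two reactivity columns as in sample_submission.csv. A minimal sketch of the final assembly step (preds.npy is a hypothetical file standing in for the raw predictions; which prediction channel maps to which column must match the training setup):

import numpy as np
import pandas as pd

# Hypothetical raw predictions aligned row-for-row with sample_submission.csv,
# shape (num_rows, 2): one column per chemical mapping experiment.
predictions = np.load("preds.npy")

submission = pd.read_csv("../../input/sample_submission.csv")
submission["reactivity_DMS_MAP"] = predictions[:, 0]
submission["reactivity_2A3_MAP"] = predictions[:, 1]
submission.to_csv("submission.csv", index=False)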

Configuration File

This section explains the parameters and settings in the RibonanzaNet configuration file.
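
Since it is plain YAML, the config can also be loaded programmatically; a minimal sketch, assuming PyYAML is installed and that the keys below sit at the top level of the file:

import yaml

# Load the training configuration used by run.py / inference.py.
with open("configs/pairwise.yaml") as f:
    config = yaml.safe_load(f)

print(config["learning_rate"], config["batch_size"])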

Model Hyperparameters

  • learning_rate: 0.001
    The learning rate for the optimizer. Determines the step size at each iteration while moving toward a minimum of the loss function.

  • batch_size: 2
    Number of samples processed per GPU per batch.

  • test_batch_size: 8
    Batch size per GPU used at test/inference time.

  • epochs: 40
    Total number of training epochs the model goes through.

  • dropout: 0.05
    The dropout rate for regularization to prevent overfitting. It represents the proportion of neurons that are randomly dropped out of the neural network during training.

  • weight_decay: 0.0001
    Regularization technique to prevent overfitting by penalizing large weights.

  • k: 5
    Kernel size of the 1D convolution.

  • ninp: 256
    Size of the model's hidden (embedding) dimension.

  • nlayers: 9
    Number of RibonanzaNet blocks.

  • nclass: 2
    Number of output channels, one per chemical mapping experiment (2A3 and DMS).

  • ntoken: 5
    Number of tokens (AUGC + padding/N token) used in the model.

  • nhead: 8
    Number of heads in the multi-head attention layers.

  • use_flip_aug: true
    Indicates whether flip augmentation is used during training/inference.

  • gradient_accumulation_steps: 2
    Number of steps to accumulate gradients before performing an optimizer update (see the note on effective batch size after this list).

  • use_triangular_attention: false
    Specifies whether to use triangular attention mechanisms in the model.

  • pairwise_dimension: 64
    Dimension of pairwise interactions in the model.
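
Note that batch_size, gradient_accumulation_steps, and the number of GPUs jointly determine the effective global batch size, as illustrated below (the GPU count of 8 is an arbitrary example; yours comes from the accelerate config):

# Effective global batch size = per-GPU batch size
# x number of GPUs x gradient accumulation steps.
batch_size = 2
gradient_accumulation_steps = 2
num_gpus = 8  # example value
effective_batch_size = batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)  # 32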

Data Scaling

  • use_data_percentage: 1
    Fraction of the dataset used for training (1 = train on the full data).

  • use_dirty_data: true
    Whether to include "dirty" training data, i.e. sequences where only one of the 2A3/DMS profiles has signal-to-noise (SN) > 1.

Other Configurations

  • fold: 0
    The current fold in use if the data is split into folds for cross-validation.

  • nfolds: 6
    Total number of folds for cross-validation (see the sketch after this list).

  • input_dir: "../../input/"
    Directory for input data. Put train_data.csv, test_sequences.csv, and sample_submission.csv here.

  • gpu_id: "0"
    Identifier of the GPU used for training. Useful in a single-GPU setup.
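
For reference, fold and nfolds describe a standard k-fold cross-validation split; a minimal sketch of the idea using scikit-learn (the repo's actual splitting code may differ, e.g. in seeding or grouping):

import pandas as pd
from sklearn.model_selection import KFold

train = pd.read_csv("../../input/train_data.csv")

kf = KFold(n_splits=6, shuffle=True, random_state=0)  # nfolds: 6
train_idx, val_idx = list(kf.split(train))[0]         # fold: 0
print(len(train_idx), len(val_idx))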


File structure

logs contains the CSV log files with train/val loss, models contains model weights and optimizer states, and oofs contains the out-of-fold validation predictions.
