Repository for the article "Deciphering the comprehensive relationship between 5'UTR and 3'UTR sequences with deep learning".

hmdlab/utr_pairpred

UTR_PairPred

Installation

  • Install the required Python libraries by running poetry install.

Basic requirements

python>=3.9.0
CUDA=11.8
torch=2.2.0

Data preprocessing

Processed sequence embeddings and sequence CSV files can be downloaded from here.

  1. Download the GENCODE protein-coding transcript sequences FASTA file from here.

  2. Create a sequence DataFrame from the raw GENCODE FASTA file.

cd scripts
sh create_seq_df.sh
  • This generates gencode_v44(vM33)_utr_gene_unique.csv and gencode_v44(vM33)_utr_gene_unique_5utr(3utr).fa (the parenthesized parts give the mouse GENCODE vM33 and 3'UTR variants of each file name).
  • To remove similar sequences, run scripts/cd_hit.sh on those FASTA files.
  3. Get sequence embeddings (model inputs).
  • With RNA-FM: sh get_emb_rnafm.sh
  • With RiNALMo: sh get_emb_rinalmo.sh
  • For random forest features: sh get_rf_feature.sh
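As an illustration of what step 2 extracts, the following Python sketch pulls 5'UTR and 3'UTR sequences out of a GENCODE protein-coding transcript FASTA. It assumes the pipe-separated header layout of GENCODE's pc_transcripts files (transcript ID first, gene name in the sixth field, then UTR5:start-end, CDS:start-end and UTR3:start-end fields with 1-based inclusive coordinates). The names read_fasta, parse_regions and extract_utrs, and the "first transcript per gene wins" rule, are hypothetical; create_seq_df.sh may pick a representative transcript differently and writes a CSV rather than Python objects.

```python
"""Sketch (not the repository's code): extract 5'UTR/3'UTR sequences from a GENCODE-style FASTA."""
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class UTRRecord:
    transcript_id: str
    gene_name: str
    utr5: str
    utr3: str


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are concatenated."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_regions(header: str) -> Dict[str, Tuple[int, int]]:
    """Read 1-based, inclusive UTR5/CDS/UTR3 coordinates from pipe-separated header fields
    (assumed GENCODE pc_transcripts layout)."""
    regions = {}
    for field in header.split("|"):
        name, sep, span = field.partition(":")
        if sep and name in ("UTR5", "CDS", "UTR3"):
            start, end = span.split("-")
            regions[name] = (int(start), int(end))
    return regions


def extract_utrs(lines: Iterable[str]) -> List[UTRRecord]:
    """Keep transcripts annotated with both UTRs, one per gene (hypothetical rule: first seen wins)."""
    records: Dict[str, UTRRecord] = {}
    for header, seq in read_fasta(lines):
        fields = header.split("|")
        regions = parse_regions(header)
        if "UTR5" not in regions or "UTR3" not in regions:
            continue  # skip transcripts lacking a 5'UTR or a 3'UTR
        gene_name = fields[5]  # assumed position of the gene name in the header
        if gene_name in records:
            continue
        (s5, e5), (s3, e3) = regions["UTR5"], regions["UTR3"]
        records[gene_name] = UTRRecord(fields[0], gene_name, seq[s5 - 1:e5], seq[s3 - 1:e3])
    return list(records.values())


if __name__ == "__main__":
    toy_fasta = [
        ">ENST_TOY1.1|ENSG_TOY1.1|-|-|TOY1-201|TOY1|15|UTR5:1-3|CDS:4-12|UTR3:13-15|",
        "GGGATGAAA",
        "TAGCCC",
        ">ENST_TOY2.1|ENSG_TOY2.1|-|-|TOY2-201|TOY2|9|CDS:1-9|",
        "ATGAAATAG",
    ]
    for rec in extract_utrs(toy_fasta):
        print(rec.gene_name, rec.utr5, rec.utr3)  # prints: TOY1 GGG CCC
```

The toy transcript without UTR annotations is dropped, mirroring the fact that the downstream pairing task needs both UTRs for every gene.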

Training prediction models

  • Train models with src/run_train_XX.py, replacing XX with a learning-method abbreviation from the table below.
  • Config files follow the naming rule config/<SPECIES>_<LEARNING_METHOD>.yaml, where <LEARNING_METHOD> is the same abbreviation (e.g., config/human_cl.yaml).
    Abbreviation   Learning method
    cl             contrastive learning
    sv             supervised learning
    rf             random forest
  • Example run (from the src directory):
poetry run python run_train_cl.py --cfg ../config/human_cl.yaml
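The README does not describe the objective used by the cl method. A common way to learn paired representations by contrastive learning is a symmetric InfoNCE (CLIP-style) loss, sketched below in plain Python under the assumption that row i of each batch holds the 5'UTR and 3'UTR embeddings of the same gene and every other row acts as a negative. The function names, the temperature of 0.1 and the pairing scheme are illustrative assumptions, not the loss implemented in run_train_cl.py.

```python
"""Sketch (assumed, not taken from run_train_cl.py): symmetric InfoNCE over 5'UTR/3'UTR pairs."""
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def symmetric_info_nce(utr5: Sequence[Vector], utr3: Sequence[Vector], temperature: float = 0.1) -> float:
    """Assumes row i of both batches comes from the same gene (positive pair);
    all other rows in the batch serve as negatives."""
    a = [l2_normalize(v) for v in utr5]
    b = [l2_normalize(v) for v in utr3]
    n = len(a)
    # logits[i][j] = cosine(5'UTR_i, 3'UTR_j) / temperature
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)] for i in range(n)]
    # Cross-entropy with the diagonal as the target, in both directions.
    loss_5to3 = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    loss_3to5 = sum(logsumexp([logits[i][j] for i in range(n)]) - logits[j][j] for j in range(n)) / n
    return (loss_5to3 + loss_3to5) / 2


if __name__ == "__main__":
    utr5 = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[2.0, 0.0], [0.0, 3.0]]   # same directions as utr5: low loss
    shuffled = [[0.0, 3.0], [2.0, 0.0]]  # partners swapped: high loss
    print(round(symmetric_info_nce(utr5, matched), 4))   # 0.0
    print(round(symmetric_info_nce(utr5, shuffled), 4))  # 10.0
```

A real training loop would compute the same quantity on GPU tensors in PyTorch; the pure-Python version only makes the arithmetic explicit.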

Downstream analysis

  • crossval_analysis.ipynb: Performs cross-validation analysis to evaluate the consistency of results across experiments, and visualizes the distribution of cosine similarities and the correlations between experiments.

  • sequential_analysis.ipynb: Analyzes basic sequence features (e.g., 5'UTR, 3'UTR, and CDS lengths, and minimum free energy (MFE)).

  • expression_analysis.ipynb: Analyzes translation efficiency (TE) using RNA-seq and Ribo-seq data for each cell line.
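The consistency check described for crossval_analysis.ipynb can be sketched as two small metrics: a per-gene cosine similarity between the 5'UTR and 3'UTR embeddings produced by one run, and the Pearson correlation of those scores between two runs. This is a minimal sketch assuming both runs score the same genes in the same order; the helper names and toy embeddings are hypothetical, and the notebook may compute these quantities differently.

```python
"""Sketch (hypothetical helpers): consistency metrics between two cross-validation runs."""
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def pair_scores(utr5_emb, utr3_emb) -> List[float]:
    """Cosine similarity of each gene's 5'UTR embedding with its own 3'UTR embedding
    (assumes both lists are aligned gene by gene)."""
    return [cosine_similarity(a, b) for a, b in zip(utr5_emb, utr3_emb)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, written by hand because statistics.correlation needs Python 3.10+."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy embeddings standing in for two runs (e.g., two folds or seeds) over the same three genes.
    run1 = pair_scores([[1, 0], [1, 1], [0, 1]], [[1, 0], [1, 0], [1, 0]])
    run2 = pair_scores([[2, 0], [1, 2], [0, 3]], [[1, 0], [1, 0], [1, 0]])
    print([round(s, 3) for s in run1])
    print(round(pearson(run1, run2), 3))
```

A high correlation between runs indicates that the learned 5'UTR/3'UTR pairing scores are stable rather than an artifact of one particular split.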
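The README does not state how expression_analysis.ipynb computes translation efficiency. One common definition is the ratio of Ribo-seq to RNA-seq abundance for each gene, usually on a log2 scale after library-size normalization. The sketch below assumes raw per-gene read counts for a single cell line, normalizes them to counts per million (CPM), and uses a hypothetical pseudocount and minimum RNA-seq CPM filter; none of these parameters come from the repository.

```python
"""Sketch (assumed definition): log2 TE = log2(Ribo-seq CPM / RNA-seq CPM) per gene."""
import math
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million, normalizing each library by its total read count."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_te(ribo_counts: Dict[str, int], rna_counts: Dict[str, int],
            pseudocount: float = 1.0, min_rna_cpm: float = 1.0) -> Dict[str, float]:
    """pseudocount and min_rna_cpm are hypothetical defaults, not values from the notebook."""
    ribo, rna = cpm(ribo_counts), cpm(rna_counts)
    te = {}
    for gene in ribo.keys() & rna.keys():  # genes quantified in both libraries
        if rna[gene] < min_rna_cpm:
            continue  # ratios for barely expressed genes are too noisy
        te[gene] = math.log2((ribo[gene] + pseudocount) / (rna[gene] + pseudocount))
    return te


if __name__ == "__main__":
    ribo = {"A": 200, "B": 100, "C": 700}
    rna = {"A": 100, "B": 100, "C": 800}
    te = log2_te(ribo, rna)
    print({g: round(v, 3) for g, v in sorted(te.items())})  # {'A': 1.0, 'B': 0.0, 'C': -0.193}
```

Working in log2 space makes translationally up- and down-regulated genes symmetric around zero, which is convenient when relating TE to sequence features such as UTR length.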

Citation

Citation information will be added here.
