UTR_PairPred

Installation

Install the required Python libraries with poetry install.

Basic requirements

python>=3.9.0
CUDA=11.8
torch=2.2.0

If you want to run the preprocessing yourself, also install the following tools by following their instructions.
- cd-hit: https://github.com/weizhongli/cdhit
- ViennaRNA: https://github.com/ViennaRNA/ViennaRNA

Data preprocess

Processed sequence embedding & sequence csv files can be downloaded from here

Download the GENCODE protein-coding transcript sequences FASTA file from here.
Create the sequence DataFrame from the raw GENCODE FASTA file:

cd scripts
sh create_seq_df.sh

This generates the files gencode_v44(vM33)_utr_gene_unique.csv and gencode_v44(vM33)_utr_gene_unique_5utr(3utr).fa.
If you want to remove similar sequences, please run scripts/cd_hit.sh with those fasta files.
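
To illustrate what building a sequence table from the GENCODE FASTA involves, here is a minimal sketch that is not taken from create_seq_df.sh. It assumes the usual header layout of the GENCODE protein-coding transcript file, where pipe-separated fields carry the transcript ID, the gene ID, and region ranges such as UTR5:1-60. The function names parse_gencode_header and slice_regions are hypothetical.

```python
def parse_gencode_header(header: str) -> dict:
    """Split a GENCODE pc_transcripts FASTA header into IDs and region ranges.

    Assumes a header of the form
    >TRANSCRIPT|GENE|...|UTR5:1-60|CDS:61-1041|UTR3:1042-2618|
    """
    fields = [f for f in header.lstrip(">").strip().split("|") if f]
    record = {"transcript_id": fields[0], "gene_id": fields[1], "regions": {}}
    for field in fields:
        # Region fields look like "UTR5:1-60" (1-based, inclusive coordinates).
        name, sep, span = field.partition(":")
        if sep and name in {"UTR5", "CDS", "UTR3"}:
            start, end = span.split("-")
            record["regions"][name] = (int(start), int(end))
    return record


def slice_regions(record: dict, sequence: str) -> dict:
    """Cut the transcript sequence into the regions listed in its header."""
    return {
        name: sequence[start - 1:end]  # convert 1-based inclusive to a Python slice
        for name, (start, end) in record["regions"].items()
    }
```

Transcripts whose header lacks a UTR5 or UTR3 field simply produce no entry for that region, so a caller can filter on the presence of both keys before pairing the UTRs.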

Getting sequence embeddings (model inputs).

With RNA-FM: sh get_emb_rnafm.sh
With RiNALMo: sh get_emb_rinalmo.sh
For random forest feature: sh get_rf_feature.sh

Training prediction models

Use src/run_train_XX.py for training, replacing XX with a learning method abbreviation from the table below.
Config files follow the naming rule config/<SPECIES>_<LEARNING_METHOD>.yaml.

Abbreviation	Learning method
cl	contrastive learning
sv	supervised learning
rf	random forest
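
The naming rules above can be sketched as a small helper that maps a species and a method abbreviation to the matching script and config paths. This is only an illustration: training_paths is a hypothetical name, and the only species label confirmed by this README is human.

```python
from pathlib import Path

# Abbreviations from the learning method table.
METHODS = {
    "cl": "contrastive learning",
    "sv": "supervised learning",
    "rf": "random forest",
}


def training_paths(species: str, method: str) -> tuple[Path, Path]:
    """Return (training script, config file) relative to the repository root."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {sorted(METHODS)}")
    script = Path("src") / f"run_train_{method}.py"
    config = Path("config") / f"{species}_{method}.yaml"
    return script, config
```

For example, training_paths("human", "cl") yields src/run_train_cl.py and config/human_cl.yaml, the pair used in the run example.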

Run example

poetry run python run_train_cl.py --cfg ../config/human_cl.yaml

Downstream analysis

crossval_analysis.ipynb: Performs cross-validation analysis to evaluate the consistency of results across experiments. Visualizes the distribution of cosine similarity and correlations between different experiments.
sequential_analysis.ipynb: Analyzes basic sequence features (e.g., lengths of 5'UTR, 3'UTR, CDS, and MFE)
expression_analysis.ipynb: Analyzes translation efficiency (TE) using RNA-seq and Ribo-seq data for each cell line.
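
For readers unfamiliar with the cosine similarity measure mentioned for crossval_analysis.ipynb, a minimal sketch follows. It is not code from the notebook, and it assumes the compared objects are plain numeric vectors of equal length, such as embeddings. The name cosine_similarity is chosen here for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

The value ranges from -1 (opposite directions) to 1 (same direction) and ignores vector length, which is why it is a common choice for comparing embeddings.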

Citation

Citation information will be added here.
