A set of R scripts to identify miRNAs that target known genes, using the TargetScan predictions.
TargetScan is a set of predictive methods that aid in identifying miRNA(s) likely to target a region (gene or transcript) within a known genome.
The detailed methods behind the prediction of miRNA binding sites can be found in the original articles:
The Perl pipeline associated with the improved re-implementation can be found in this GitHub repository.
The predictions generated as part of the TargetScan project are available as a searchable web tool. However, using this tool to identify miRNA(s) that target multiple genes simultaneously remains difficult, mainly because it lacks a batch search option. This repository provides a set of R scripts that help in this regard.
The full set of miRNA predictions targeting human genes is available on the TargetScan website. The scripts in this project download the Predicted Targets context++ scores (default predictions) and perform downstream processing.
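The downstream processing can be pictured as a filter-and-count over the predictions table. The following is a minimal sketch in base R, not the repository's actual code. The toy data frame, its column names (`gene_symbol`, `mirna`), the placeholder miRNA names, and the function `rank_mirnas` are all hypothetical and only stand in for the downloaded context++ score table.

```r
# Toy stand-in for the predictions table.
# Column names and gene/miRNA pairs are made up for illustration,
# they are NOT real TargetScan predictions.
predictions <- data.frame(
  gene_symbol = c("TP53", "TP53", "MYC", "MYC", "KRAS", "EGFR"),
  mirna       = c("miR-A", "miR-B", "miR-A", "miR-C", "miR-A", "miR-B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count how many of the query genes each miRNA targets.
rank_mirnas <- function(predictions, genes) {
  # Keep only the rows for the genes of interest.
  hits <- predictions[predictions$gene_symbol %in% genes, ]
  # Count each gene/miRNA pair once, even if several sites are predicted.
  hits <- unique(hits[, c("gene_symbol", "mirna")])
  counts <- table(hits$mirna)
  out <- data.frame(
    mirna     = names(counts),
    n_targets = as.integer(counts),
    stringsAsFactors = FALSE
  )
  # miRNAs covering the most query genes come first; ties sorted by name.
  out[order(-out$n_targets, out$mirna), ]
}

ranked <- rank_mirnas(predictions, c("TP53", "MYC", "KRAS"))
print(ranked)
```

In this toy table, `miR-A` ranks first because it is paired with all three query genes. The real table is far larger, so a package such as `data.table` (already among the dependencies) would likely be a better fit than base R data frames.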
- A working installation of R and RStudio is required.
- Download the project source code from GitHub in a zip format.
- Unzip the folder and open it in RStudio as a project by opening the `TargetScanR.Rproj` file.
- Install the necessary R packages (dependencies for the scripts) by running the following commands in the console.
# CRAN packages
install.packages(c(
"log4r",
"ggplot2",
"RColorBrewer",
"ggpubr",
"jsonlite",
"data.table",
"BiocManager"
))
# Bioconductor packages
BiocManager::install("org.Hs.eg.db")
- Edit the `GeneTargets.txt` file to include the target gene symbols.
- Open the file `R/search_miRNA.R` and run the entire script.
- Locate your result in the `results` folder. The naming pattern of the output file includes the date it was generated (e.g. `results/top_miRNA_09-02-2024.svg`).
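As a rough illustration of the input and output conventions in the steps above, the sketch below reads a gene list and builds a dated output path. It is not taken from the repository: the helpers `read_gene_symbols` and `dated_output_path` are hypothetical, the one-symbol-per-line layout of the gene file is an assumption, and the day-month-year order is a guess, since the example file name does not settle it.

```r
# Write a small example gene list to a temporary file.
# Assumption: one gene symbol per line.
gene_file <- tempfile(fileext = ".txt")
writeLines(c("TP53", " MYC ", "", "KRAS", "TP53"), gene_file)

# Hypothetical helper: read symbols, drop blanks and duplicates.
read_gene_symbols <- function(path) {
  symbols <- trimws(readLines(path, warn = FALSE))
  unique(symbols[nzchar(symbols)])
}

# Hypothetical helper: build a dated output path.
# Assumption: the date in the file name is day-month-year.
dated_output_path <- function(date = Sys.Date(), dir = "results") {
  file.path(dir, paste0("top_miRNA_", format(date, "%d-%m-%Y"), ".svg"))
}

genes <- read_gene_symbols(gene_file)
out_path <- dated_output_path(as.Date("2024-02-09"))
print(genes)
print(out_path)
```

Cleaning the list before the search keeps stray whitespace or repeated symbols from silently changing the results.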
- The current version only supports miRNA search targeting human genes.
- The script has only been tested with a maximum of 20 target genes at a time. While the core modules can handle an unlimited number of target genes, the modules responsible for generating the plots might break with larger inputs.
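One possible workaround for larger gene lists is to split them into batches that stay within the tested size and run the search once per batch. This is a minimal sketch, not a feature of the scripts: `split_into_batches` is a hypothetical helper and the gene names are placeholders.

```r
# Hypothetical helper: split a gene vector into batches of at most `batch_size`.
split_into_batches <- function(genes, batch_size = 20L) {
  stopifnot(batch_size >= 1L)
  if (length(genes) == 0L) {
    return(list())
  }
  # Assign consecutive genes to batch 1, 2, 3, ...
  batch_id <- ceiling(seq_along(genes) / batch_size)
  unname(split(genes, batch_id))
}

# Placeholder gene names, for illustration only.
genes <- paste0("GENE", 1:45)
batches <- split_into_batches(genes)
print(lengths(batches))
```

Each batch could then be written to `GeneTargets.txt` in turn. Note that miRNA rankings computed per batch are not directly comparable to a ranking over the full list.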