@@ -32,53 +32,3 @@ This will generate
3232
3333
3434
35- # Summary
36-
37- RNA-sequencing has found numerous applications in research, from distinguishing immune cell subtypes
38- to profiling differential gene expression between cancer and normal tissue [@Villani2017] [@Bao2021].
39- Another application of RNA-seq is to identify novel transcripts involved in various biological processes
40- (Gupta, Kleinjans and Caiment 2021). Most relevant are context- and cell-type-specific non-coding RNAs,
41- such as long non-coding RNAs (lncRNAs), which have become a focal point of transcriptomic studies demonstrating
42- their roles in gene expression regulation, post-transcriptional regulation, and epigenetic regulation
43- (Engreitz et al. 2016, Zhu et al. 2019).
44- It is becoming crucial to check the relative expression of lncRNAs in transcriptome-wide studies. Our group
45- has developed an automated lncRNA expression pipeline. The only requirement on the user side is raw data in
46- FASTQ format (paired-end or single-end). The user obtains a list of known and novel lncRNAs, and differential
47- gene expression between conditions. The pipeline comes with a user-friendly GUI, thereby eliminating the need
48- for the user to be versed in complex transcriptome analysis and the UNIX environment. The source code is available
49- under an open-source licence at https://github.com/schosio/CHOLAR.
50-
51- # Statement of need
52-
53- RNA-seq datasets support a wide range of inferences. The software used in an RNA-seq analysis
54- pipeline requires a UNIX-based command-line interface (CLI) environment, with each tool executed in succession.
55- Installing multiple CLI tools, handling various file formats, and plotting graphs require an understanding of Linux and
56- the R programming language. Moreover, to the best of our knowledge, no tool identifies novel lncRNAs directly from raw
57- transcriptomic data. LncRNA identification tools such as `CPAT` (Wang et al. 2013) take either transcript
58- sequences (FASTA) or transcript coordinates (BED, GTF) as input and provide a list of predicted lncRNAs.
59- To address these issues, we developed `CHOLAR`, a tool for the characterization of lncRNAs from raw reads.
60- `CHOLAR` i) identifies novel lncRNAs from raw reads, ii) provides a user-friendly GUI to make changes
61- at every step, iii) identifies differentially expressed genes and lists known and novel lncRNAs, and iv) generates
62- publication-quality plots such as MA plots, volcano plots and heatmaps.
63-
64- # Implementation
65-
66- The `CHOLAR` pipeline is implemented in bash and R. It first reads the input FASTQ file(s) and checks the
67- quality of reads using `FastQC` (Fiancette et al. 2021). Low-quality reads (< 28) and adaptors are removed using
68- `Trimmomatic` (Bolger, Lohse and Usadel 2014). `HISAT2` maps the reads to the human reference genome
69- (hg38) (Zhang et al. 2021). The SAM files generated by `HISAT2` are converted to BAM, and PCR duplicates are removed
70- using the `samtools` toolkit (Danecek et al. 2021).
71- Transcript assembly is done using `Stringtie`, and the resulting GTF files are merged using the merge utility of
72- `Stringtie` (Pertea et al. 2015). The merged GTF file is compared against the reference annotation file from GENCODE
73- (Frankish et al. 2021) to filter novel transcripts using `GFFCOMPARE`. The coding potential of novel transcripts is
74- predicted using `CPAT` (Wang et al. 2013). The gene counts are calculated using `HTSeq` (Putri et al. 2022), and subsequent
75- differential gene expression analysis (DGEA) is done in R.
76- The `DESeq2` package is used for performing DGEA (Love, Huber and Anders 2014), and the `ggplot` and `dplyr` libraries for
77- plotting graphs. The graphical user interface is built using `zenity` in bash. The schematic of the tool is given in Figure 1.
78-
79- # Example
80-
81- We chose the GSE147761 dataset from the GEO database (NCBI) to showcase the `CHOLAR` tool. A sample of the results and plots
82- generated by the tool is given in Figure 2. The GUI of the tool is shown in Figure 3.
83-
84-
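The Implementation text in the hunk above describes filtering novel transcripts with `GFFCOMPARE` and then predicting their coding potential with `CPAT`. The following is a minimal Python sketch of how those two results might be combined into a candidate lncRNA list. It is not taken from the CHOLAR source: the `Transcript` record, the `select_novel_lncrnas` helper, the set of class codes treated as novel, the coding-probability cutoff, and the minimum length are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# All three constants below are illustrative assumptions, not CHOLAR settings.
NOVEL_CLASS_CODES = {"u"}    # assumed: class codes treated as "novel" (here, unannotated intergenic)
CODING_PROB_CUTOFF = 0.364   # assumed: coding-probability cutoff for calling a transcript non-coding
MIN_LENGTH_NT = 200          # assumed: conventional minimum length for a lncRNA


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record joining a class code with a coding-potential score."""
    transcript_id: str
    class_code: str      # class code relative to the reference annotation
    length_nt: int       # transcript length in nucleotides
    coding_prob: float   # predicted coding probability in [0, 1]


def select_novel_lncrnas(
    transcripts: Iterable[Transcript],
    novel_codes: Set[str] = NOVEL_CLASS_CODES,
    cutoff: float = CODING_PROB_CUTOFF,
    min_length: int = MIN_LENGTH_NT,
) -> List[Transcript]:
    """Keep transcripts that are novel, long enough, and predicted non-coding."""
    return [
        t
        for t in transcripts
        if t.class_code in novel_codes
        and t.length_nt >= min_length
        and t.coding_prob < cutoff
    ]


if __name__ == "__main__":
    # Made-up example records, for demonstration only.
    examples = [
        Transcript("t1", "u", 1500, 0.05),  # novel, long, non-coding
        Transcript("t2", "=", 1500, 0.05),  # matches the reference annotation
        Transcript("t3", "u", 150, 0.05),   # too short
        Transcript("t4", "u", 1500, 0.90),  # predicted coding
    ]
    for t in select_novel_lncrnas(examples):
        print(t.transcript_id)
```

Keeping the thresholds as keyword arguments makes it easy to try a different cutoff or a broader set of class codes without touching the filter itself.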