
RNAseq_pipeline

Scripts for the Tn5-TagSeq manuscript

*demultiplex.sh Demultiplexes samples based on their i7 barcodes. If several plates (each with a different i5 barcode) were sequenced together, the reads must first be demultiplexed by i5 using Lance's i5 code (i5_parse_gencomp1_template.sbatch).

This step also generates the "listfiles" file used by the following scripts.
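
As an illustration of how the later array scripts could consume "listfiles", here is a minimal sketch. It assumes the file holds one sample per line and that the array runs under SLURM; neither is confirmed by the scripts themselves, and `pick_sample` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Nth line (1-based) of a list file.
# Assumes one sample (or FASTQ path) per line, which may not match
# the real layout of "listfiles".
pick_sample() {
    local list_file=$1
    local task_id=$2
    sed -n "${task_id}p" "$list_file"
}

# Under SLURM, each array task could select its own sample like this:
#   sample=$(pick_sample listfiles "${SLURM_ARRAY_TASK_ID}")
```

With this layout, the array size would simply be the number of lines in the list file.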

*pipeline_array2.sh Runs all samples as an array job by calling *map_readcoutns2.sh, which performs the trimming, mapping, UMI-based deduplication, and gene counting. It also sets whether deduplication is performed and defines the genome and genome annotation variables.

*parsing_output.sh This script outputs two files:

readcountsallsamples.txt: contains the read counts per gene per sample for all samples processed in the array.
read.parameters: contains, for each sample, the number of raw reads, reads trimmed out, reads mapped, reads assigned to genes, etc.
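
To illustrate what a combined per-gene, per-sample table involves, the sketch below merges several per-sample count files into one table. It is not the code in parsing_output.sh: it assumes each sample has a tab-separated two-column file (gene, count), and `merge_counts` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge two-column "gene<TAB>count" files into a
# single table with one row per gene and one column per input file.
# Genes absent from a file are reported as 0 for that file.
merge_counts() {
    awk -F'\t' -v OFS='\t' '
        # First line of each input file: register a new sample column.
        FNR == 1 { nfiles++; names[nfiles] = FILENAME }
        {
            # Remember genes in order of first appearance.
            if (!($1 in seen)) { seen[$1] = 1; order[++ngenes] = $1 }
            count[$1, nfiles] = $2
        }
        END {
            header = "gene"
            for (f = 1; f <= nfiles; f++) header = header OFS names[f]
            print header
            for (g = 1; g <= ngenes; g++) {
                row = order[g]
                for (f = 1; f <= nfiles; f++) {
                    value = ((order[g], f) in count) ? count[order[g], f] : 0
                    row = row OFS value
                }
                print row
            }
        }
    ' "$@"
}

# Example call (file names are placeholders):
#   merge_counts sample1.counts sample2.counts > merged_counts.txt
```

Column headers are taken from the input file names, so running the function from the directory that holds the count files keeps them short.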