This folder contains the dataset to run the ExplorATE vignette.
Human data set
The inputs_hs directory contains simulated data from a human ovary sample. The reads subdirectory contains a subset of .fastq files from the original data set. Additionally, it includes a de novo transcriptome (trme_hs.fa) and a transcriptome-derived RepeatMasker file (RM_trme_hs.out).
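As a quick sanity check after decompression, the number of transcripts in trme_hs.fa and the number of reads in each .fastq file can be counted from the shell. The following is a minimal sketch. The function names are hypothetical and are not part of ExplorATE. It assumes uncompressed files and the standard four-lines-per-record FASTQ layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions (not part of ExplorATE) for a quick sanity
# check of the inputs_hs files after decompression.

# Count sequences in a FASTA file by counting header lines that start with '>'.
count_fasta_seqs() {
  grep -c '^>' "$1"
}

# Count reads in an uncompressed FASTQ file, assuming the standard
# four-lines-per-record layout (no line-wrapped sequences).
count_fastq_reads() {
  local lines
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Example usage (paths assume the layout described above):
#   count_fasta_seqs inputs_hs/trme_hs.fa
#   for fq in inputs_hs/reads/*.fastq; do echo "$fq: $(count_fastq_reads "$fq")"; done
```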
Drosophila melanogaster data set
The inputs_dm directory contains a data set for Drosophila melanogaster based on Ohtani et al. (2013). This directory contains the files:
geneModel_dm.gtf, a gene model for the D. melanogaster genome (v3)
genome_dm.fa, a reference genome (v3)
RM_gen_dm.out, a genome-derived RepeatMasker file
RM_trme_dm.out, a RepeatMasker file derived from the transcriptome
trme_dm.fa, a de novo transcriptome for the data set
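The genome-derived and transcriptome-derived RepeatMasker files can be compared by tallying hits per repeat class/family. The following is a minimal sketch. The function name is hypothetical and is not part of ExplorATE. It assumes the standard RepeatMasker .out layout: three header lines (including a blank line), then one whitespace-separated record per hit, with the class/family in column 11.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ExplorATE): tally hits per repeat
# class/family in a RepeatMasker .out file.
# Assumes the standard .out layout: three header lines, then one
# whitespace-separated record per hit, with the class/family in column 11.
summarize_rm_classes() {
  # Skip the header, count each class/family, then sort by count
  # (highest first) and by name for ties.
  awk 'NR > 3 && NF >= 11 { n[$11]++ } END { for (c in n) print n[c], c }' "$1" |
    sort -k1,1nr -k2,2
}

# Example usage:
#   summarize_rm_classes inputs_dm/RM_gen_dm.out
#   summarize_rm_classes inputs_dm/RM_trme_dm.out
```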
Users can download reads in .fastq format from Gene Expression Omnibus (accession no. GSE47006). See the ExplorATE vignette.
Liolaemus parthenos data set
The inputs_lp directory contains a subset of Liolaemus parthenos data. The complete data set is available in Gene Expression Omnibus (accession no. GSE173261). The directory contains:
reads, a folder with a subset of the reads in .fastq format
blastAnot_lp.outfmt6, a BLAST output, in output format 6, for the de novo transcriptome
geneModel_lp.gff3, a TransDecoder output file
RM_lp, a RepeatMasker file for the de novo transcriptome (subset)
trme_lp, the de novo transcriptome (subset)
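The BLAST output can be reduced to one hit per transcript before inspection. The following is a minimal sketch that keeps the hit with the highest bitscore for each query. The function name is hypothetical and is not part of ExplorATE. It assumes blastAnot_lp.outfmt6 uses the default 12 tabular columns, so the bitscore is column 12. If the file was produced with custom columns, adjust the key.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ExplorATE): keep the single best hit per
# query from a BLAST tabular (-outfmt 6) file.
# Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore (bitscore is column 12).
best_blast_hits() {
  # Sort by query, then by bitscore (highest first), and print only the
  # first line seen for each query.
  sort -t $'\t' -k1,1 -k12,12gr "$1" | awk -F '\t' '!seen[$1]++'
}

# Example usage:
#   best_blast_hits inputs_lp/blastAnot_lp.outfmt6 > best_hits_lp.tsv
```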
Clone the repository to the desired directory with the following command:
git clone https://github.com/FemeniasM/ExplorATE_data_test
Change to the desired data set directory and decompress the files with the following command:
gzip -d *.gz
Check the vignette and the user guide for more information.
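Running `gzip -d *.gz` only decompresses files in the current directory. If some compressed files sit in subfolders such as reads, the following sketch may help. It decompresses every .gz file under a data set directory recursively. The function name is hypothetical. The assumption that .gz files may appear in subfolders has not been verified against the repository layout.

```shell
#!/usr/bin/env bash
# Sketch (assumption: some .gz files may sit in subfolders such as reads/,
# which a plain `gzip -d *.gz` in the top-level directory would not reach).
# Hypothetical helper: decompress every .gz file under a directory, recursively.
decompress_dataset() {
  local dir="$1"
  find "$dir" -type f -name '*.gz' -exec gzip -d {} +
}

# Example usage after cloning:
#   cd ExplorATE_data_test
#   decompress_dataset inputs_lp
```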