MiniMonsterPlex is an automatic variant calling pipeline for POSIX systems (Linux, macOS, Windows Subsystem for Linux). Using Conda for setup is recommended.
- Requirements
- Data Input
- Command Line Functions
- R Shiny App
- R Wrapper
- Metadata Format
- Tree building with MLtree
Install via Conda:
- Python 3.10 or higher
- R 3.2.1 or higher
- R package: ape
- R package: ggrepel
- R Bioconductor package: ggtree
- R package: ggtext
- R package: glue
- radian
- Bowtie2
- Tabix
- Samtools
- Bcftools
- BedTools
- RAxML
conda env create -f environment.yml
conda activate monsterPlex
Fastq files with either a .fq or .fastq extension should be gzip compressed (extension .gz) and dropped into the fastq/ folder before running. If your files are all uncompressed, try using this command in the fastq/ folder to bulk compress them:
bgzip *.fastq
or
bgzip *.fq
Use whichever command matches the extension of your files.
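If bgzip is not at hand, the same bulk compression can be sketched in Python. This is only an illustration, not part of MiniMonsterPlex: the function name `compress_fastq_folder` and the `remove_original` parameter are hypothetical, and it writes plain gzip output rather than the BGZF blocks that bgzip produces.

```python
import gzip
import shutil
from pathlib import Path


def compress_fastq_folder(folder, remove_original=False):
    """Gzip every uncompressed .fq/.fastq file in `folder` and return the new paths."""
    created = []
    for path in sorted(Path(folder).iterdir()):
        # Skip directories, already-compressed files, and anything that is not fastq.
        if not path.is_file() or path.suffix not in (".fq", ".fastq"):
            continue
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        if remove_original:
            # bgzip deletes its input by default; this sketch only does so on request.
            path.unlink()
        created.append(target)
    return created
```

Calling `compress_fastq_folder("fastq")` leaves the uncompressed originals in place unless `remove_original=True` is passed.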
python3 MiniMonsterPlex.py -o [output folder name/] -m [.csv metadata file name] -f [folder name/] --complete -i [isolate_1] [isolate_2] -il [example.txt] -hf [host_1] [host_2] -hfl [example.txt] -h
- -h = Help command: including this flag brings up the help screen.
- -o = Output folder: user-given name for the created output folder. Defaults to output when the option is omitted. Note that you must give the name of a nonexistent folder.
- -m = Metadata file: name of the .csv metadata file, formatted as shown below.
- -f = Input folder: name of the folder containing your fastq.gz input files. Defaults to the fastq folder included with this repository.
- --complete = Builds a tree comparing your current run to a database of all your past runs.
Filtering options:
RAxML requires a minimum of 4 isolates in a multi-FASTA file to generate a tree. If you do not provide 4 isolates, or your chosen host does not have 4 isolates, the program will stop and ask whether you want to continue without filtering or quit entirely.
NOTE: An isolate name is the name of the file you are uploading minus the extensions, so SRR1571.fq.gz becomes SRR1571. Host names must be exactly the same as those entered in your metadata file.
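The naming rule in the note can be illustrated with a small helper. This is a sketch only; `isolate_name` and `FASTQ_SUFFIXES` are hypothetical names and not part of MiniMonsterPlex.

```python
from pathlib import Path

# Compressed fastq extensions described in this README.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def isolate_name(filename):
    """Return the file name with its fastq extensions removed."""
    name = Path(filename).name  # drop any leading folder such as fastq/
    for suffix in FASTQ_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    raise ValueError(f"{filename!r} does not end in .fq.gz or .fastq.gz")
```

For example, `isolate_name("SRR1571.fq.gz")` returns `"SRR1571"`, which is the value to pass to -i.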
- -i = Isolate list [Optional]: a space-separated list of all isolates you want included in the tree building.
- -il = Isolate file [Optional]: a newline-separated txt file of all isolates you want included in the tree building. This can be combined with -i.
- -hf = Host list [Optional]: a space-separated list of hosts; all isolates from the listed hosts are included in the tree building.
- -hfl = Host file [Optional]: a newline-separated txt file of all hosts you want included in the tree building. This can be combined with -hf.
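One way to picture how a command-line list and a list file could be combined is the sketch below. It is an assumption about the behaviour, not the pipeline's actual code: `merge_names` is a hypothetical helper, and dropping duplicates is a choice made here for illustration.

```python
from pathlib import Path


def merge_names(cli_names=(), list_file=None):
    """Combine names given on the command line with names from a newline-separated file."""
    names = list(cli_names)
    if list_file is not None:
        for line in Path(list_file).read_text().splitlines():
            line = line.strip()
            if line:  # ignore blank lines
                names.append(line)
    # Drop duplicates while keeping first-seen order (a choice made for this sketch).
    return list(dict.fromkeys(names))
```

The same helper would serve for both the isolate pair (-i, -il) and the host pair (-hf, -hfl).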
The host and isolate filtering can be combined. In that case the program will first filter by host and then filter by isolate.
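The order of filtering described above, together with the 4-isolate minimum, can be sketched as follows. The function names and the dictionary input are hypothetical; the real pipeline may represent its metadata differently.

```python
MIN_ISOLATES = 4  # minimum number of isolates RAxML needs, as stated above


def filter_isolates(hosts_by_isolate, hosts=None, isolates=None):
    """Filter by host first, then by isolate name.

    `hosts_by_isolate` maps each isolate name to its host from the metadata file.
    """
    selected = list(hosts_by_isolate)
    if hosts:
        wanted_hosts = set(hosts)
        selected = [name for name in selected if hosts_by_isolate[name] in wanted_hosts]
    if isolates:
        wanted_isolates = set(isolates)
        selected = [name for name in selected if name in wanted_isolates]
    return selected


def enough_for_tree(selected):
    """Return True when the selection is large enough to build a tree."""
    return len(selected) >= MIN_ISOLATES
```

When `enough_for_tree` returns False, the program described here would stop and ask whether to continue without filtering or quit.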
MiniMonsterPlex now offers an R shiny app version.
- Make sure you are in the conda environment created above:
conda activate monsterPlex
- Type r; this will bring you to the interactive environment:
r
- Type shiny::runApp(); this will pull up the app in your browser:
shiny::runApp()
- Fill out all of the fields. The defaults are filled out for you if you do not want to make any changes.
MiniMonsterPlex now offers an R wrapper in the form of MiniMonsterPlexWrapper.r, which offers all of the same functionality in an interactive R environment.
- Make sure you are in the conda environment created above:
conda activate monsterPlex
- Type r; this will bring you to the interactive environment:
r
- Source the R script:
source("MiniMonsterPlexWrapper.r")
- Answer the interactive questions:
This is equivalent to the -o option above and is required:
Enter the output folder. The folder must not exist:
This is equivalent to the -m option and is required. A sample metadata file is provided below:
Enter the metadata.csv file:
This is equivalent to the -f option and is required. The folder provided is fastq:
Enter the input folder:
This is equivalent to the -i option and is optional. Press enter to skip it:
Enter the space-separated isolate list. Optional push enter if you want to skip:
This is equivalent to the -il option and is optional. Press enter to skip it:
Enter the isolate list file. Optional push enter if you want to skip:
This is equivalent to the -hf option and is optional. Press enter to skip it:
Enter the space-separated host list. Optional push enter if you want to skip:
This is equivalent to the -hfl option and is optional. Press enter to skip it:
Enter the host list file. Optional push enter if you want to skip:
MiniMonsterPlex requires a custom .csv format for metadata:
sampleID,species,country,lineage,host
104,Po,China,1,Oryza
105,.,.,.,.
- The sampleID is exactly the same as the name of the fastq file given to MiniMonsterPlex, so in this example it would be 104.fastq.
- The species is the species name where the sequencing was done.
- The country is the country of origin.
- The lineage is the lineage of the pathogen.
- The host is the host of the pathogen.
Nonexistent fields should be filled in with a period. NOTE: fields cannot contain , or _ characters, because these are used as separator characters. If you input a seqid containing _ characters, they will all be replaced with - characters.
A sample CSV file is provided as metadata.csv.
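Before a run, the format rules above can be checked with a short script. This is a sketch under stated assumptions: `clean_metadata` is a hypothetical helper, "seqid" is taken to mean the sampleID column, and rejecting other fields that contain _ is one reading of the rule rather than confirmed pipeline behaviour.

```python
import csv
import io

COLUMNS = ["sampleID", "species", "country", "lineage", "host"]


def clean_metadata(text):
    """Parse metadata CSV text and apply the rules described in this README."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"expected header {','.join(COLUMNS)}")
    rows = []
    for row in reader:
        cleaned = {}
        for col in COLUMNS:
            # Empty or missing fields are filled in with a period.
            value = (row[col] or "").strip() or "."
            if col == "sampleID":
                # Assumption: underscores in the sample ID become hyphens.
                value = value.replace("_", "-")
            if "," in value or "_" in value:
                raise ValueError(f"{col} value {value!r} contains , or _")
            cleaned[col] = value
        rows.append(cleaned)
    return rows
```

Passing the contents of metadata.csv as `text` returns one dictionary per sample, or raises a ValueError that names the offending column.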
