MiniMonsterplex is an automatic variant calling pipeline for fungal pathogens.

TrStans606/MiniMonsterPlex

MiniMonsterPlex

MiniMonsterPlex is an automatic variant-calling pipeline for POSIX systems (Linux, macOS, Windows Subsystem for Linux). Using Conda for setup is recommended.

Table of Contents

  1. Requirements
  2. Data Input
  3. Command Line Functions
  4. R Shiny App
  5. R Wrapper
  6. Metadata Format
  7. Tree building with MLtree

Requirements

Install the dependencies with Conda. Sample commands for setup:

conda env create -f environment.yml
conda activate monsterPlex

Data Input

Fastq files (extension .fq or .fastq) must be gzip-compressed (extension .gz) and placed in the fastq/ folder before running. If your files are uncompressed, run one of the following commands in the fastq/ folder to compress them in bulk:

bgzip *.fastq

or

bgzip *.fq

Use whichever command matches your files' extension.
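If bgzip is not available, the same bulk compression can be sketched in Python. This is only an illustration: the function name compress_fastq_files is hypothetical and not part of MiniMonsterPlex, and Python's gzip module writes ordinary gzip files rather than the blocked format bgzip produces. The text above only asks for gzip-compressed input, so this is assumed to be sufficient.

```python
import gzip
import shutil
from pathlib import Path


def compress_fastq_files(folder):
    """Gzip every uncompressed .fastq/.fq file in `folder`; return the new paths."""
    compressed = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix not in (".fastq", ".fq"):
            continue  # skip files that are already .gz or are not fastq
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Assumption: only the compressed copy is wanted, so drop the original.
        path.unlink()
        compressed.append(target)
    return compressed
```

Calling compress_fastq_files("fastq") would compress everything in the fastq/ folder in place.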

Command Line Functions

python3 MiniMonsterPlex.py -o [output folder name/] -m [.csv metadata file name] -f [folder name/] --complete -i [isolate_1] [isolate_2] -il [example.txt] -hf [host_1] [host_2] -hfl [example.txt] -h
  • -h= Help command: including this flag brings up the help screen.
  • -o= Output folder: user-supplied name for the output folder to create. Defaults to output when omitted. Note that you must give the name of a folder that does not already exist.
  • -m= Metadata file: name of the .csv metadata file, formatted as shown below.
  • -f= Input folder: the name of the folder containing your fastq.gz input files. Defaults to the fastq folder included with this repository.
  • --complete= builds a tree comparing your current run to a database of all your past ones.
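For scripted runs, the options above can be assembled programmatically. The following is a minimal sketch, assuming the script is started with python3 from the repository root; build_command is a hypothetical helper, and only the flags documented above are used.

```python
import subprocess


def build_command(output=None, metadata=None, input_folder=None, complete=False):
    """Assemble a MiniMonsterPlex.py argument list from the documented flags."""
    cmd = ["python3", "MiniMonsterPlex.py"]
    if output:
        cmd += ["-o", output]        # must name a folder that does not exist yet
    if metadata:
        cmd += ["-m", metadata]      # .csv metadata file
    if input_folder:
        cmd += ["-f", input_folder]  # folder holding the fastq.gz files
    if complete:
        cmd.append("--complete")     # also compare against past runs
    return cmd


def run_pipeline(**options):
    """Run the pipeline and raise if it exits with a non-zero status."""
    return subprocess.run(build_command(**options), check=True)
```

For example, run_pipeline(output="run1/", metadata="metadata.csv") would launch a run with the default input folder.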

Filtering options:

RAxML requires a minimum of 4 isolates in a multi-FASTA file to generate a tree. If you provide fewer than 4 isolates, or your chosen host has fewer than 4 isolates, the program will stop and ask whether you want to continue without filtering or quit entirely.

NOTE: An isolate name is the name of the file you are uploading minus its extensions, so SRR1571.fq.gz becomes SRR1571. Host names must exactly match those entered in your metadata file.
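The naming rule in the note can be sketched as a small helper. This is an illustration only: isolate_name is a hypothetical function, and it assumes the only extensions to strip are .gz followed by .fq or .fastq.

```python
from pathlib import Path


def isolate_name(filename):
    """Return the file name without its .gz and .fq/.fastq extensions."""
    name = Path(filename).name  # ignore any leading folder such as fastq/
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    for ext in (".fastq", ".fq"):
        if name.endswith(ext):
            return name[: -len(ext)]
    return name
```

With this rule, fastq/SRR1571.fq.gz and SRR1571.fastq.gz both map to the isolate name SRR1571.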

  • -i= Isolate list [Optional]: a space-separated list of all isolates you want included in the tree building.
  • -il= Isolate file [Optional]: a newline-separated .txt file of all isolates you want included in the tree building. This can be combined with -i.
  • -hf= Host list [Optional]: a space-separated list of hosts; only isolates from the listed hosts are included in the tree building.
  • -hfl= Host file [Optional]: a newline-separated .txt file of all hosts you want included in the tree building. This can be combined with -hf.

The host and isolate filtering can be combined. In that case the program will first filter by host and then filter by isolate.
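The filtering order described above can be sketched as follows. This is not the pipeline's actual code: the function names are hypothetical, and it assumes that combining the two filters keeps only isolates that pass both, and that names given on the command line and in a list file are simply concatenated.

```python
MIN_ISOLATES = 4  # minimum number of isolates RAxML needs, as stated above


def merge_names(cli_names=(), file_text=""):
    """Combine a space-separated list with the lines of a newline-separated file."""
    names = list(cli_names)
    names += [line.strip() for line in file_text.splitlines() if line.strip()]
    return names


def filter_isolates(host_by_isolate, hosts=None, isolates=None):
    """Filter by host first, then by isolate; return the isolates that remain."""
    kept = list(host_by_isolate)
    if hosts:
        wanted_hosts = set(hosts)
        kept = [iso for iso in kept if host_by_isolate[iso] in wanted_hosts]
    if isolates:
        wanted_isolates = set(isolates)
        kept = [iso for iso in kept if iso in wanted_isolates]
    return kept


def enough_for_tree(kept):
    """True when enough isolates remain for tree building."""
    return len(kept) >= MIN_ISOLATES
```

Here host_by_isolate is a dictionary mapping each isolate name to its host, as read from the metadata file. When enough_for_tree returns False, the pipeline's documented behaviour is to stop and ask how to proceed.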

R Shiny App

MiniMonsterPlex now offers an R Shiny app version.

  1. Make sure you are in the conda environment created above:
conda activate monsterPlex
  2. Type r; this will bring you to the interactive environment:
r
  3. Type shiny::runApp(); this will pull up the app in your browser.
shiny::runApp()
  4. Fill out all of the fields. The defaults are pre-filled, so leave a field unchanged to keep its default.

R Wrapper

MiniMonsterPlex now offers an R wrapper in the form of MiniMonsterPlexWrapper.r, which offers all of the same functionality in an interactive R environment.

  1. Make sure you are in the conda environment created above:
conda activate monsterPlex
  2. Type r; this will bring you to the interactive environment:
r
  3. Source the R script:
source("MiniMonsterPlexWrapper.r")
  4. Answer the interactive questions:

This is equivalent to the -o option above and is required:

Enter the output folder. The folder must not exist:

This is equivalent to the -m option and is required. A sample metadata file is provided below:

Enter the metadata.csv file:

This is equivalent to the -f option and is required. The folder provided is fastq:

Enter the input folder:

This is equivalent to the -i option and is optional. Press enter to skip it:

Enter the space-separated isolate list. Optional push enter if you want to skip:

This is equivalent to the -il option and is optional. Press enter to skip it:

Enter the isolate list file. Optional push enter if you want to skip:

This is equivalent to the -hf option and is optional. Press enter to skip it:

Enter the space-separated host list. Optional push enter if you want to skip:

This is equivalent to the -hfl option and is optional. Press enter to skip it:

Enter the host list file. Optional push enter if you want to skip:

Metadata Format

MiniMonsterPlex requires a custom .csv format for metadata:

sampleID,species,country,lineage,host
104,Po,China,1,Oryza
105,.,.,.,.
  • The sampleID must exactly match the name of the fastq file given to MiniMonsterPlex, minus the extensions; in this example the file would be 104.fastq.
  • The species is the name of the species that was sequenced.
  • The country is the country of origin.
  • The lineage is the lineage of the pathogen.
  • The host is the host of the pathogen.

Nonexistent fields should be filled in with a period.

NOTE: fields cannot contain , or _ characters, because these are used as separator characters. If you input a seqid with _ characters, they will all be replaced with - characters.

A sample .csv file is provided as metadata.csv.
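Before a run, the format rules above can be checked with a short script. This is a sketch under stated assumptions, not part of MiniMonsterPlex: check_metadata and normalize_sample_id are hypothetical names, the header is assumed to match the sample exactly, and "seqid" in the note is assumed to refer to the sampleID column, which is why _ is tolerated there.

```python
import csv
import io

EXPECTED_HEADER = ["sampleID", "species", "country", "lineage", "host"]


def normalize_sample_id(sample_id):
    """Mirror the documented replacement of _ with - in sample identifiers."""
    return sample_id.replace("_", "-")


def check_metadata(text):
    """Return a list of problems found in metadata .csv text (empty if none)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be: " + ",".join(EXPECTED_HEADER)]
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"line {line_no}: expected 5 fields, found {len(row)}")
            continue
        for column, value in zip(EXPECTED_HEADER, row):
            if value == "":
                problems.append(f"line {line_no}: {column} is empty; use '.'")
            # Assumption: _ in sampleID is replaced by the pipeline, so allow it.
            banned = "," if column == "sampleID" else ",_"
            if any(ch in value for ch in banned):
                problems.append(f"line {line_no}: {column} contains , or _")
    return problems
```

For example, check_metadata(open("metadata.csv").read()) returns an empty list when the file follows the format shown above.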

Tree building with MLtree

[Image: mlTree_sample]
