Quickstart
This quickstart uses functionality in the dev branch of BAD_Mutations.
First, use the setup subcommand to write a config file. The config file stores paths to executables and reference data.
python BAD_Mutations.py setup \
-b /path/to/CDS/database/directory \
-t 'target_species_name' \
-e e_value_threshold \
-c /path/to/config.txt
After writing a config file, use the fetch subcommand to pull the CDS files from public repositories and convert them to BLAST databases. You may omit the -u and -p options if you do not want to type your username and password as plain text into a terminal (this is probably a good thing).
python BAD_Mutations.py fetch \
-c /path/to/config.txt \
-u 'user@domain.com' \
-p 'MyAwesomePassword123'
Note: the username and password are for the JGI Genome Portal.
Use the VeP_to_Subs.py supporting script to generate the "long" substitutions file and the per-transcript substitutions files.
mkdir -p /path/to/per-transcript/substitutions/directory
python Supporting/VeP_to_Subs.py \
/path/to/VeP_report.txt.gz \
/path/to/long_substitutions.txt \
/path/to/per-transcript/substitutions/directory
TBD
Generate multiple sequence alignments of putative homologues using the CDS query sequences. This step runs on a transcript-by-transcript basis, so you can use a tool like GNU Parallel to run many transcripts concurrently. To parallelize, split the full CDS FASTA file for your species of interest into either a) one sequence per file or b) one gene per file. Each of these "split" files can then be passed to the -f option.
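Splitting into one sequence per file (option a) can be done with many tools. The following is a minimal Python sketch, not part of BAD_Mutations. The function name `split_fasta`, the `.fa` extension, and the choice to name each file after the first whitespace-delimited token of the FASTA header are all assumptions made for illustration.

```python
import os
import re


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-sequence FASTA file to its own file.

    Hypothetical helper: files are named after the first token of each
    header line. Returns the list of paths written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A new record starts: close the previous output file.
                if out is not None:
                    out.close()
                tokens = line[1:].split()
                seq_id = tokens[0] if tokens else "unnamed_%d" % len(written)
                # Replace characters that are unsafe in file names.
                safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)
                path = os.path.join(out_dir, safe_id + ".fa")
                out = open(path, "w")
                written.append(path)
            # Lines before the first header (if any) are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return written
```

Calling `split_fasta("/path/to/all_CDS.fa", "/path/to/split/directory")` would write one file per sequence. Records that share the same header token overwrite one another in this sketch, so check that sequence IDs are unique first.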
python BAD_Mutations.py align \
-c /path/to/config.txt \
-f /path/to/transcript.fa \
-o /path/to/MSA/output/directory
Run the HyPhy model on the specified codons in the multiple sequence alignment, conditioning on the phylogenetic tree. This is also run on a transcript-by-transcript basis.
python BAD_Mutations.py predict \
-c /path/to/config.txt \
-f /path/to/transcript.fa \
-a /path/to/MSA/output/directory/transcript_MSA.fasta \
-r /path/to/MSA/output/directory/transcript_tree.tree \
-s /path/to/per-transcript/substitutions/directory/transcript.subs \
-o /path/to/predictions/output/directory
Combine the per-transcript predictions files into a single file for easy downstream analysis.
python BAD_Mutations.py compile \
-P /path/to/predictions/output/directory \
-S /path/to/long_substitutions.txt
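
Because the predict step runs once per transcript, it can help to generate one command line per split FASTA file and hand the resulting list to a job runner such as GNU Parallel. The sketch below is a hypothetical helper, not part of BAD_Mutations. It uses only the predict options shown above, and it assumes that the alignment, tree, and substitutions files are named `<transcript>_MSA.fasta`, `<transcript>_tree.tree`, and `<transcript>.subs`, as the placeholder paths suggest. Check the actual file names in your output directories before relying on it.

```python
import os
import shlex


def predict_command(transcript_fa, config, msa_dir, subs_dir, out_dir):
    """Build the argument list for one `predict` run.

    Assumption: per-transcript files share the base name of the
    transcript FASTA file, following the placeholder paths above.
    """
    name = os.path.splitext(os.path.basename(transcript_fa))[0]
    return [
        "python", "BAD_Mutations.py", "predict",
        "-c", config,
        "-f", transcript_fa,
        "-a", os.path.join(msa_dir, name + "_MSA.fasta"),
        "-r", os.path.join(msa_dir, name + "_tree.tree"),
        "-s", os.path.join(subs_dir, name + ".subs"),
        "-o", out_dir,
    ]


def to_shell_line(args):
    """Quote each argument so the command is safe to paste into a shell."""
    return " ".join(shlex.quote(arg) for arg in args)
```

Writing `to_shell_line(predict_command(...))` for every split file to a text file, one command per line, gives a job list that a tool like GNU Parallel can read.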