This repository was archived by the owner on Sep 26, 2020. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 1
Multiple sequence alignment of aa/codon sequences with tandem repeats
License
butzist/ProGraphMSA
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
____ ____ _ __ __ ____ _ _____ ____
| _ \ _ __ ___ / ___|_ __ __ _ _ __ | |__ | \/ / ___| / \ _|_ _| _ \
| |_) | '__/ _ \| | _| '__/ _` | '_ \| '_ \| |\/| \___ \ / _ \ _| |_| | | |_) |
| __/| | | (_) | |_| | | | (_| | |_) | | | | | | |___) / ___ \_ _| | | _ <
|_| |_| \___/ \____|_| \__,_| .__/|_| |_|_| |_|____/_/ \_\|_| |_| |_| \_\
|_|
Fast and Robust Phylogeny-Aware Multiple Sequence Alignment
+ tandem repeat unit insertions and deletions
PLEASE NOTE: this repository is ARCHIVED and UNMAINTAINED. All further development is done at:
https://github.com/acg-team/ProGraphMSA
___ _
| _ \_ _ _ _ _ _ (_)_ _ __ _
| / || | ' \| ' \| | ' \/ _` |
|_|_\\_,_|_||_|_||_|_|_||_\__, |
|___/
___ ___ _ __ __ ___ _
| _ \_ _ ___ / __|_ _ __ _ _ __| |_ | \/ / __| /_\
| _/ '_/ _ \ (_ | '_/ _` | '_ \ ' \| |\/| \__ \/ _ \
|_| |_| \___/\___|_| \__,_| .__/_||_|_| |_|___/_/ \_\
|_|
The easiest way to run ProGraphMSA with the recommended command line parameters
is to use the wrapper script "ProGraphMSA+TR.sh" included in this package. Just run
./ProGraphMSA+TR.sh <file>
or
./ProGraphMSA+TR.sh -o <output_file> <file>
ProGraphMSA+TR will run using ML distances, the WAG model, and will output an
alignment in FASTA format. Further it will use T-REKS from the file T-Reks.jar
to detect tandem repeats. To use TRUST adjust the installation path in
trust2treks.py and run
./ProGraphMSA+TR.sh --custom_tr_cmd trust2treks.py <file>
___ _ _ _
/ __|___ _ __ _ __ __ _ _ _ __| | | (_)_ _ ___
| (__/ _ \ ' \| ' \/ _` | ' \/ _` | | | | ' \/ -_)
\___\___/_|_|_|_|_|_\__,_|_||_\__,_| |_|_|_||_\___|
_
_ __ __ _ _ _ __ _ _ __ ___| |_ ___ _ _ ___
| '_ \/ _` | '_/ _` | ' \/ -_) _/ -_) '_(_-<
| .__/\__,_|_| \__,_|_|_|_\___|\__\___|_| /__/
|_|
USAGE:
./ProGraphMSA [--ancestral_seqs] [--all_trees] [-i <iterations>] [-T] [-I]
[-M] [-m] [-a] [-C <count>] [-F] [--custom_model <file>] [-w]
[-c <file>] [-r] [--custom_tr_cmd <command>] [--trd_output
<filename>] [--read_repeats <T-Reks format output>] [-R] ...
[--repalign] [--repeat_indel_ext <probability>]
[--repeat_indel_rate <rate>] [-A] [-P <distance>] [-p
<distance>] [-D <distance>] [-d <distance>] [-x <distance>]
[-s <probability>] [-l <distance>] [-E <probability>] [-e
<probability>] [-g <rate>] [-f] [--dna] [--codon] [--topology
<newick file>] [-t <newick file>] [-o <filename>] [--]
[--version] [-h] <fasta file>
Tandem-repeat related parameters:
=================================
-R, --repeats
use T-REKS to identify tandem repeats
--custom_tr_cmd <command>
custom command for detecting tandem-repeats
--trd_output <filename>
write TR detector output to file
--read_repeats <T-REKS format output>
read TR detector output from file
--repalign
re-align detected tandem repeat units
--repeat_indel_ext <probability>
repeat indel extension probability
--repeat_indel_rate <rate>
insertion/deletion rate for repeat units (per site)
Guide tree, distances, and substitution model:
==============================================
-i <iterations>, --iterations <iterations>
number of iterations re-estimating guide tree [default: 2]
-m, --mldist
use distances estimated by a Maximum-Likelihood method
-a, --nwdist
estimate initial distance tree from Needleman-Wunsch alignments
-D <distance>, --max_dist <distance>
maximum distance for alignment
-F, --estimate_aafreqs
estimate equilibrium amino acid frequencies from input data
-w, --darwin
use model of evolution from Darwin (GONNET matrix and different indel model parameters, otherwise WAG will be used)
--custom_model <file>
custom substitution model in qmat format
-c <file>, --cs_profile <file>
path to library of context-sensitive profiles (we distribute a copy in the 3rd_party folder)
-A, --no_force_align_m
do not force alignment of initial Methionine
Parameters for adjusting the indel model:
=========================================
-l <distance>, --edge_halflife <distance>
edge half-life (evolutionary distance at which the probability of re-using an unsused graph is halved)
-E <probability>, --end_indel_prob <probability>
probability of mismatching sequence ends (set to -1 to disable this feature)
-e <probability>, --gap_ext <probability>
gap extension probability
-g <rate>, --indel_rate <rate>
insertion/deletion rate
Input/Output:
=============
-f, --fasta
output fasta format (instead of stockholm)
-t <newick file>, --tree <newick file>
initial guide tree
-o <filename>, --output <filename>
Output file name
-I, --input_order
output sequences in input order (default: tree order)
--dna
align DNA sequence
--codon
align DNA sequence based on a codon model
--ancestral_seqs
output all ancestral sequences
<fasta file>
(required) input sequences
___ _ _ _ _ __
| _ )_ _(_) |__| (_)_ _ __ _ / _|_ _ ___ _ __ ___ ___ _ _ _ _ __ ___
| _ \ || | | / _` | | ' \/ _` | | _| '_/ _ \ ' \ (_-</ _ \ || | '_/ _/ -_)
|___/\_,_|_|_\__,_|_|_||_\__, | |_| |_| \___/_|_|_| /__/\___/\_,_|_| \__\___|
|___/
For building ProGraphMSA from source you need:
CMake >=2.8 (http://www.cmake.org)
tclap >=1.1.0 (http://tclap.sourceforge.net)
Eigen 2.0.x or 3.0.x (http://eigen.tuxfamily.org)
on Debian/Ubuntu you can install these programs/libraries with:
sudo apt-get install cmake libtclap-dev libeigen2-dev
Then perform the following command to configure/build/install ProGraphMSA:
cd BUILD
ccmake .. (press "c" to configure and "g" to generate the Makefile, see below
for additional configuration options)
make ProGraphMSA
make install
Additional CMake configuration options (in "ccmake .."):
EIGEN2_INCLUDE_DIR: set this to the path, where Eigen is installed, if you use
Eigen 3.0.x or if Eigen has been installed at a non-default
location (default location: /usr/include/eigen2)
WITH_EIGEN3: set this to ON, if you want to compile ProGraphMSA with
Eigen 3.0.x
CMAKE_CXX_FLAGS: add options for the C++ compiler, like optimization flags,
or additional search paths for include files (-I <path>)
WITH_SSE: disable this option, if you build ProGraphMSA for a machine
that does not support SSE2
About
Multiple sequence alignment of aa/codon sequences with tandem repeats
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published