Algorithms to compute DNA complexity
caballero/SeqComplex
== SeqComplex ==
This is a collection of methods to compute the composition and complexity of one or more DNA sequences read from a FASTA file.
SeqComplex.pm is a Perl module containing an implementation of each complexity
measure. Several tools that use this module are also provided:
(1) compSeq.pl: computes the methods in windowed mode.
(2) profileComplexSeq.pl: computes the methods using the whole sequence.
(3) gatherStats.pl: Example script that runs all methods in windowed mode and saves the raw data
for later processing.
(4) displayStats.pl: Example script that reads the raw data from gatherStats.pl and
displays it as either a table or a Google Charts HTML file.
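The difference between windowed mode and whole-sequence mode can be illustrated with a short sketch. It is written in Python for illustration only and does not call SeqComplex.pm or the scripts above. The names `sliding_windows` and `gc_content`, the window size, and the step are hypothetical, and the module's own windowing rules (for example, how it treats a trailing partial window) may differ.

```python
from typing import Iterator, Tuple


def sliding_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window.

    Assumption: a trailing window shorter than `window` is dropped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


seq = "ATGCGCGCATATATAT"

# Whole-sequence mode: one value per sequence.
whole = gc_content(seq)

# Windowed mode: one value per window, keyed by window start.
windowed = [(start, gc_content(sub)) for start, sub in sliding_windows(seq, 8, 4)]

print(whole)
print(windowed)
```

The whole-sequence value summarizes the entire record, while the windowed values show how the same measure varies along the sequence.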
Computed methods
*gc: C+G content
*gcs: C+G skew
*cpg: CpG skew
*cwf: Complexity by Wootton & Federhen
*ce: Entropy
*cz: Complexity as compression ratio (using Gzip)
*cmN: Complexity as Markov model size of N
*ctN: Trifonov's complexity with order N
*clN: Linguistic complexity with order N
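As a rough illustration of three of the measures listed above (entropy, compression ratio, and linguistic complexity), the following Python sketch uses common textbook definitions. These are assumptions: the exact formulas, normalizations, and k-mer handling in SeqComplex.pm may differ, and all function names here are hypothetical. The Wootton & Federhen, Markov model, and Trifonov measures are not covered.

```python
import gzip
import math
from collections import Counter


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compression_ratio(seq: str) -> float:
    """Compressed size divided by raw size, using gzip.

    Gzip adds a fixed header, so very short sequences can exceed 1.0.
    """
    raw = seq.upper().encode("ascii")
    if not raw:
        return 0.0
    return len(gzip.compress(raw)) / len(raw)


def linguistic_complexity(seq: str, k: int) -> float:
    """Distinct k-mers observed divided by the maximum possible number.

    Assumption: the maximum is min(4**k, L - k + 1) for a 4-letter alphabet.
    """
    seq = seq.upper()
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    observed = len({seq[i:i + k] for i in range(n)})
    return observed / min(4 ** k, n)
```

Under these definitions, a homopolymer run such as `"A" * 1000` has zero entropy and a small compression ratio, while a sequence with evenly mixed bases scores higher on all three measures.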
Additional methods
*ats: A+T skew
*ket: Keto skew
*pur: Purine skew
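The skew measures (gcs, ats, ket, pur) all follow the same pattern: the difference between two base groups divided by their sum. The Python sketch below assumes the conventional groupings and signs, namely G versus C, A versus T, keto (G, T) versus amino (A, C), and purine (A, G) versus pyrimidine (C, T). The sign conventions and function names are assumptions for illustration and may not match SeqComplex.pm.

```python
def skew(seq: str, plus: str, minus: str) -> float:
    """Return (P - M) / (P + M), where P and M count bases in each group.

    Bases outside both groups (for example N) are ignored.
    Returns 0.0 when neither group occurs in the sequence.
    """
    seq = seq.upper()
    p = sum(seq.count(base) for base in plus)
    m = sum(seq.count(base) for base in minus)
    return (p - m) / (p + m) if (p + m) else 0.0


def gc_skew(seq: str) -> float:
    return skew(seq, "G", "C")


def at_skew(seq: str) -> float:
    return skew(seq, "A", "T")


def keto_skew(seq: str) -> float:
    # Keto bases (G, T) against amino bases (A, C).
    return skew(seq, "GT", "AC")


def purine_skew(seq: str) -> float:
    # Purines (A, G) against pyrimidines (C, T).
    return skew(seq, "AG", "CT")
```

Each value lies in the range -1 to 1, with 0 indicating that the two groups are balanced.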
Citation
* Caballero J, Smit AFA, Hood L, Glusman G, Realistic artificial DNA sequences as negative controls for computational genomics, Nucleic Acids Research, 2014. https://doi.org/10.1093/nar/gku356
Copyright (C) 2009-2015 by Juan Caballero [jcaballero@systemsbiology.org]
All code is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.