kmers.py

A pure Python 3 k-mer analysis library. The library is agnostic to the type of sequence used and treats all sequences internally as strings.

Installation

Python 2 is not supported. For Python 3, the simplest route is to install from PyPI:

pip3 install kmers

Usage

Generating k-mer distributions

First, create a Composition object, supplying the value of k. Initial data can be supplied either through seq (see the process method) or through fh, which accepts a file-like object and reads the composition data from it. If both are omitted, an empty Composition object is created.

import kmers.kmers as kmers
composition = kmers.Composition(k=3, seq=None, fh=None)

Data can be added to an existing Composition later. You can add either a single SeqRecord object or an iterable of SeqRecords. The only limit on how much data a single Composition can hold is your hardware. In practice, the 12-mer distribution of the complete H. sapiens proteome, including isoforms, takes more than 10 GB of RAM using Cython.

composition.process(seq, update=False)
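To make the idea of a k-mer distribution concrete, here is a minimal stand-alone sketch of overlapping k-mer counting. It does not use this library and is not its implementation; count_kmers and the example sequences are hypothetical names and data chosen for illustration.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq (illustrative helper)."""
    seq = str(seq)  # sequences are treated as plain strings
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Accumulate counts over several sequences, in the spirit of repeated process() calls.
counts = Counter()
for s in ["MKVLA", "KVLAA"]:
    counts.update(count_kmers(s, 3))
```

A sequence shorter than k contributes no k-mers, so it leaves the counts unchanged.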

Relative and logarithmic distributions are computed lazily, so the first access after adding a sequence may take some time. They are available as Composition attributes that support the dict API:

composition.relative_distribution['CMLD']
composition.log_distribution['CMLD']
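The lazy behaviour can be pictured with a small stand-alone sketch: the derived distribution is built on first access and thrown away whenever new data arrives. LazyDistribution is a hypothetical class, not part of this library, and the use of natural logarithms is an assumption; the library may use a different base or caching scheme.

```python
import math
from collections import Counter


class LazyDistribution:
    """Illustrative only: caches relative frequencies until the counts change."""

    def __init__(self):
        self.counts = Counter()
        self._relative = None  # cache, filled on first access

    def add(self, kmers):
        self.counts.update(kmers)
        self._relative = None  # new data invalidates the cache

    @property
    def relative(self):
        if self._relative is None:
            total = sum(self.counts.values())
            self._relative = {kmer: n / total for kmer, n in self.counts.items()}
        return self._relative

    @property
    def log(self):
        # Natural log of each relative frequency (the base is an assumption).
        return {kmer: math.log(p) for kmer, p in self.relative.items()}


dist = LazyDistribution()
dist.add(["CML", "CML", "MLD", "LDK"])
half = dist.relative["CML"]  # 2 of 4 k-mers
```

The first read of relative after each add pays the full recomputation cost; later reads reuse the cached dict.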

Analyzing distributions

Given a sequence a_seq, you can find the probability that it was generated by this distribution. Comparing such probabilities across a series of distributions gives a primitive but functional sequence classifier.

p = composition.prob(a_seq)
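The classifier idea can be sketched without the library. How composition.prob computes its value is not described here, so the scoring below is an assumption: it sums the log frequencies of the query's k-mers as if they were independent, and gives unseen k-mers a small floor value. log_prob, floor, and both profiles are hypothetical.

```python
import math


def log_prob(seq, k, relative, floor=1e-9):
    """Sum of log frequencies of the k-mers in seq; unseen k-mers score `floor`."""
    return sum(
        math.log(relative.get(seq[i:i + k], floor))
        for i in range(len(seq) - k + 1)
    )


# Two made-up 3-mer frequency profiles standing in for two Composition objects.
profile_a = {"AAA": 0.7, "AAC": 0.2, "ACG": 0.1}
profile_b = {"AAA": 0.1, "AAC": 0.2, "ACG": 0.7}

query = "AAAAC"  # contains the 3-mers AAA, AAA, AAC
scores = {
    "a": log_prob(query, 3, profile_a),
    "b": log_prob(query, 3, profile_b),
}
best = max(scores, key=scores.get)  # label of the best-scoring profile
```

Working in log space avoids numeric underflow on long sequences, where a product of many small frequencies would round to zero.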

Given two Composition objects, you can find the distance between them. Currently only the n-dimensional Euclidean and feature frequency profile (Sims et al. 2008) distance metrics are supported.

e = kmers.euclidean(comp_a, comp_b)
f = kmers.ffp_distance(comp_a, comp_b)
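For intuition, both metrics can be sketched on plain dicts of relative frequencies. This is not the library's code. The feature frequency profile method is, as far as I understand it, based on the Jensen-Shannon divergence between profiles, and the sketch assumes that; whether kmers.ffp_distance computes exactly this is not confirmed here. The function names are hypothetical.

```python
import math


def euclidean_sketch(p, q):
    """Euclidean distance between two frequency dicts; missing k-mers count as 0."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def js_divergence_sketch(p, q):
    """Jensen-Shannon divergence in bits (assumed basis of the FFP distance)."""
    total = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        mk = (pk + qk) / 2  # mixture frequency, positive whenever pk or qk is
        if pk > 0:
            total += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            total += 0.5 * qk * math.log2(qk / mk)
    return total


same = {"AAA": 0.5, "AAC": 0.5}
d_same = js_divergence_sketch(same, same)
d_disjoint = js_divergence_sketch({"AAA": 1.0}, {"CCC": 1.0})
```

With base-2 logarithms the divergence is bounded between 0 for identical profiles and 1 for profiles that share no k-mers, which makes values comparable across different k.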

License

This code is distributed under the terms of the MIT license. Unrestricted use or modification of the library is allowed provided that the original author (A. A. Morozov) is properly cited. If you use it in a scientific publication, please also cite my abstract from BGRS/SB-2016.
