kmers.py

A pure Python 3 k-mer analysis library. It is agnostic to the type of sequence used and treats all sequences internally as strings.

Installation

Python 2 is not supported. For Python 3, installing from PyPI is the simplest option:

pip3 install kmers

Usage

Generating k-mer distributions

First, create a Composition object, supplying the value of k. Initial data can be supplied either through seq (see process below) or through fh, which accepts a file-like object and reads the composition data from it. If both are omitted, an empty Composition object is created.

import kmers.kmers as kmers
composition = kmers.Composition(k=3, seq=None, fh=None)
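To make the idea of a k-mer composition concrete, here is a minimal sketch of counting overlapping k-mers in a plain string. It is not the library's implementation; the function name count_kmers is hypothetical and used only for illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count overlapping k-mers in a plain string (illustrative helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty range and thus no k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# "MKVLMKV" contains five overlapping 3-mers; "MKV" occurs twice.
counts = count_kmers("MKVLMKV", 3)
```

A Counter behaves like a dict from k-mer to count, which is roughly the kind of data a composition object has to maintain.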

Data can be added to an existing model later. You can add either a single SeqRecord object or an iterable of SeqRecords. There is no limit on how much data can be loaded into a single Composition, other than hardware constraints. In practice, the 12-mer distribution of the complete H. sapiens proteome, including isoforms, takes more than 10 GB of RAM using Cython.

composition.process(seq, update=False)

Relative and logarithmic distributions are computed lazily, so the first access after adding a sequence may take some time. They are available as Composition attributes that support the dict API:

composition.relative_distribution['CMLD']
composition.log_distribution['CMLD']
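The lazy behaviour described above can be sketched as a cache that is discarded whenever new data arrive. The class LazyCounts and its attribute names are assumptions made for this example, not part of the kmers API, and the natural logarithm is an arbitrary choice here.

```python
import math
from collections import Counter


class LazyCounts:
    """Hypothetical helper: derived distributions are rebuilt only on demand."""

    def __init__(self, k):
        self.k = k
        self.counts = Counter()
        self._relative = None  # cached relative frequencies, or None if stale

    def add(self, sequence):
        for i in range(len(sequence) - self.k + 1):
            self.counts[sequence[i:i + self.k]] += 1
        self._relative = None  # new data invalidate the cache

    @property
    def relative(self):
        if self._relative is None:
            total = sum(self.counts.values())
            self._relative = {kmer: n / total for kmer, n in self.counts.items()}
        return self._relative

    @property
    def log(self):
        # Derived from the cached relative frequencies on each access.
        return {kmer: math.log(p) for kmer, p in self.relative.items()}


demo = LazyCounts(k=2)
demo.add("AAAB")                 # 2-mers: AA, AA, AB
first = demo.relative["AA"]      # computed on first access
demo.add("AB")                   # cache is dropped and rebuilt on next access
second = demo.relative["AA"]
```

The cost of recomputation is paid once per batch of additions rather than once per added sequence, which matters when the k-mer table is large.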

Analyzing distributions

Given a sequence a_seq, you can find the probability that it was generated by this distribution. Comparing such probabilities across a series of distributions gives a primitive but functional sequence classifier.

p = composition.prob(a_seq)
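One way such a classifier could work is sketched below. How Composition.prob is actually defined is not stated here, so this example makes its own assumptions: the score is the sum of log relative frequencies of the sequence's k-mers (treating overlapping k-mers as independent, which is a simplification), and unseen k-mers receive an arbitrary floor value. All function names are hypothetical.

```python
import math
from collections import Counter


def kmer_log_frequencies(training_sequences, k):
    """Build a k-mer -> log relative frequency table from plain strings."""
    counts = Counter()
    for seq in training_sequences:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: math.log(n / total) for kmer, n in counts.items()}


def log_score(sequence, log_freqs, k, floor=math.log(1e-9)):
    """Sum log frequencies over the sequence's k-mers; unseen k-mers get `floor`."""
    return sum(
        log_freqs.get(sequence[i:i + k], floor)
        for i in range(len(sequence) - k + 1)
    )


def classify(sequence, models, k):
    """Return the name of the model with the highest log score."""
    return max(models, key=lambda name: log_score(sequence, models[name], k))


models = {
    "acidic": kmer_log_frequencies(["DEDEDEDE"], 2),
    "basic": kmer_log_frequencies(["KRKRKRKR"], 2),
}
label = classify("DEDE", models, 2)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.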

Given two Composition objects, you can find the distance between them. Currently, only n-dimensional Euclidean and feature frequency profile (Sims et al. 2008) distance metrics are supported.

e = kmers.euclidean(comp_a, comp_b)
f = kmers.ffp_distance(comp_a, comp_b)
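For intuition, both kinds of distance can be sketched on plain dicts of relative frequencies. The feature frequency profile method is commonly described as comparing profiles with the Jensen-Shannon divergence. Whether kmers.ffp_distance computes exactly this quantity is an assumption, and the function names below are illustrative only.

```python
import math


def euclidean_distance(freq_a, freq_b):
    """Euclidean distance over the union of k-mers; missing k-mers count as 0."""
    kmers = set(freq_a) | set(freq_b)
    return math.sqrt(
        sum((freq_a.get(x, 0.0) - freq_b.get(x, 0.0)) ** 2 for x in kmers)
    )


def jensen_shannon_divergence(freq_a, freq_b):
    """Jensen-Shannon divergence (base 2) between two frequency profiles."""
    total = 0.0
    for x in set(freq_a) | set(freq_b):
        p = freq_a.get(x, 0.0)
        q = freq_b.get(x, 0.0)
        m = (p + q) / 2
        # Terms with zero probability contribute nothing to the sum.
        if p > 0:
            total += 0.5 * p * math.log2(p / m)
        if q > 0:
            total += 0.5 * q * math.log2(q / m)
    return total


profile_a = {"AA": 0.5, "AB": 0.5}
profile_b = {"AA": 0.5, "BB": 0.5}
d_euclid = euclidean_distance(profile_a, profile_b)
d_jsd = jensen_shannon_divergence(profile_a, profile_b)
```

With base-2 logarithms the divergence lies between 0 (identical profiles) and 1 (profiles with no shared k-mers), which makes values comparable across different k.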

License

This code is distributed under the terms of the MIT license. Unrestricted use or modification of the library is allowed provided that the original author (A. A. Morozov) is properly cited. If you use it in a scientific publication, please also cite my abstract from BGRS/SB-2016.
