Impaqt

IMPAQT currently performs transcript identification, gene assignment, and naive quantification. A more sophisticated quantification method remains on the to-do list.

Introduction

IMPAQT (Identifies Multiple Peaks and Quantifies Transcripts) is a transcript identification and gene expression quantification method for TAGseq and 3' mRNAseq experiments. It operates on assumptions about the distribution of reads along the 3' UTR of expressed genes. Clustering these reads enables pseudo-transcript identification and quantification of expression at the gene and transcript level for isoforms utilizing distinct 3' ends.

It generates a GTF file defining the boundaries and expression level of each transcript and, if a reference annotation is provided, a gene expression counts table.

This method is particularly useful in non-model organisms where 3' UTRs for most genes are poorly annotated (which results in massive data loss). Increased gene density tends to hurt the assignment of transcripts by this algorithm because it increases assignment ambiguity. Reads for which a reasonable transcript of origin cannot be identified are handled individually.
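To make the clustering idea concrete, here is a minimal Python sketch of density-based (DBSCAN-style) clustering on one-dimensional read positions. It is not IMPAQT's implementation, which is written against bamtools and SeqAn. The function name `cluster_read_ends` and the `min_pts` parameter are hypothetical, `eps` simply mirrors the default of the `--epsilon` option, and border reads (non-core reads near a cluster) are ignored for brevity.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple


def cluster_read_ends(
    positions: List[int], eps: int = 50, min_pts: int = 5
) -> List[Tuple[int, int, int]]:
    """Cluster 1D read positions into (start, end, n_core_reads) tuples.

    Hypothetical helper for illustration only: eps mirrors --epsilon, and
    min_pts is an assumed core-read threshold, not IMPAQT's actual rule.
    """
    pts = sorted(positions)
    clusters: List[List[int]] = []
    last_core: Optional[int] = None  # most recent core read seen so far
    for p in pts:
        # Count reads within eps of p (p itself included).
        neighbors = bisect_right(pts, p + eps) - bisect_left(pts, p - eps)
        if neighbors < min_pts:
            continue  # not a core read; skipped in this simplified sketch
        if last_core is not None and p - last_core <= eps:
            clusters[-1].append(p)  # density-connected to the current cluster
        else:
            clusters.append([p])  # gap larger than eps starts a new cluster
        last_core = p
    return [(c[0], c[-1], len(c)) for c in clusters]


# Two peaks of five reads each, plus one isolated read at 5000.
reads = [100, 105, 110, 120, 130, 1000, 1010, 1020, 1030, 1040, 5000]
print(cluster_read_ends(reads))
```

With these toy positions the sketch reports two clusters, spanning 100 to 130 and 1000 to 1040, and leaves the isolated read at 5000 unclustered. Lowering `eps` splits nearby peaks into separate clusters, which is the sensitivity trade-off the `--epsilon` option describes.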

Installation

Make sure cmake and make are installed on your machine.

# Linux
sudo apt install cmake zlib1g-dev

# Mac
brew install cmake zlib

Clone this repository and change into it.

git clone https://github.com/bnjenner/impaqt.git
cd impaqt

Create a build directory and change into it.

mkdir build
cd build

Compile

cmake ../
make

Install

sudo make install

Give it a go!

impaqt input.sorted.bam

Usage

SYNOPSIS
    impaqt input.sorted.bam [options]

DESCRIPTION
    Identifies Multiple Peaks and Quantifies Transcripts. Identifies and quantifies isoforms utilizing distinct 3'
    ends. Generates a GTF file of identified transcripts and optionally a counts file written to stdout if a reference
    annotation is provided.

REQUIRED ARGUMENTS
    BAM INPUT_FILE

OPTIONS
    -h, --help
          Display the help message.
    -t, --threads INTEGER
          Number of processors for multithreading. Default: 1.
    -a, --annotation INPUT_FILE
          Annotation File (GTF or GFF). If specified, a counts table will be output through standard out. NOTICE: File
          type identified by file extension. Default: .
    -s, --strandedness STRING
          Strandedness of library. One of forward or reverse. Default: forward.
    -n, --nonunique-alignments
          Count primary and secondary read alignments.
    -q, --mapq-min INTEGER
          Minimum mapping quality score to consider. Default: 1.
    -w, --window-size INTEGER
          Window size to use to partition genome for read collection. Default: 1000.
    -m, --min-count INTEGER
          Minimum read count to initiate DBSCAN transcript identification algorithm. (Hard minimum of 10) Default: 25.
    -p, --count-percentage INTEGER
          Minimum read count percentage for identifying core reads in DBSCAN algorithm. This will be the threshold
          unless number of reads is less than 10. Default: 5.
    -e, --epsilon INTEGER
          Distance (in base pairs) for neighboring reads in DBSCAN algorithm. This should generally be 0.5-1.5x the
          read length, depending on desired isoform sensitivity (lower = more sensitive). Default: 50.
    -d, --density-threshold DOUBLE
          Read density threshold (# reads / # bps) to skip transcript identification. Assignment in very dense
          regions (usually the mitochondria) doesn't really benefit from transcript identification. Default: 0
          (unset).
    -f, --feature-tag STRING
          Name of feature in GTF for assignment. Default: exon.
    -u, --utr-tag STRING
          Name of UTR feature in GTF for assignment. Default: UTR.
    -i, --feature-id STRING
          ID of feature to use for feature assignment. Default: gene_id.
    -o, --output-gtf STRING
          Specify name of cluster GTF file. Default is BAM name + ".gtf".
    --version
          Display version information.

VERSION
    Last update: August 2025
    impaqt version: beta
    SeqAn version: 2.4.0
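The `--window-size` and `--density-threshold` options can be pictured with a short Python sketch. It bins read positions into fixed-size windows, computes the read density (# reads / # bps) for each window, and flags windows dense enough to skip transcript identification. This is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, density is assumed to be measured per window, and a threshold of 0 is assumed to mean that nothing is skipped.

```python
from collections import Counter
from typing import Dict, List


def window_densities(positions: List[int], window_size: int = 1000) -> Dict[int, float]:
    """Return {window_start: reads_per_bp} for every non-empty window.

    Hypothetical helper: window_size mirrors the --window-size default.
    """
    counts = Counter((p // window_size) * window_size for p in positions)
    return {start: n / window_size for start, n in sorted(counts.items())}


def windows_to_skip(
    positions: List[int], window_size: int = 1000, density_threshold: float = 0.0
) -> List[int]:
    """List window starts whose density exceeds the threshold.

    Assumption: a threshold of 0 (the documented default) disables skipping.
    """
    if density_threshold <= 0:
        return []
    densities = window_densities(positions, window_size)
    return [start for start, d in densities.items() if d > density_threshold]


# 500 reads packed into the first window, 2 reads in the second.
positions = list(range(0, 1000, 2)) + [1500, 1600]
print(window_densities(positions))
print(windows_to_skip(positions, density_threshold=0.1))
```

In this toy input only the first window exceeds a threshold of 0.1 reads per base pair, so it alone would bypass clustering under the assumptions above.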

Dependencies

IMPAQT uses libraries such as bamtools and SeqAn. The DBSCAN algorithm is inspired by EmbeddedArtistry and GitHub user Eleobert.

Contact

For questions or comments, please contact Bradley Jenner at bnjenner@bu.edu
