Skip to content

trentzz/multiseqex

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

multiseqex

MULTI SEQuence EXtractor — a fast, parallel CLI tool for extracting multiple sequences from FASTA files using .fai indexing.

Similar to samtools faidx but optimised for bulk extraction by leveraging multiple CPU cores and flexible batch input formats (CSV/TSV tables with named columns, list files, and inline regions).

Usage

Usage: multiseqex [OPTIONS] <FASTA>

Arguments:
  <FASTA>  Reference FASTA file (bgzipped ok if a matching .fai exists)

Options:
      --regions <REGIONS>        Comma-separated regions: chr:start-end, ...
      --list <LIST>              File with one region per line (chr:start-end)
      --table <TABLE>            CSV/TSV table with named columns (see below)
      --sv-table <SV_TABLE>      CSV/TSV SV table with named columns (see below)
      --flank <FLANK>            Flank size for position-mode tables
  -o, --output <OUTPUT>          Output FASTA file (default: stdout)
      --output-dir <OUTPUT_DIR>  Output directory (one file per region/SV pair)
      --threads <THREADS>        Number of worker threads (default: all CPUs)
      --no-build-fai             Error if .fai is missing instead of building it
  -h, --help                     Print help
  -V, --version                  Print version

Examples

# Single region to stdout
multiseqex ref.fa --regions chr1:1000-2000

# Multiple regions to a file
multiseqex ref.fa --regions chr1:1000-2000,chr2:3000-4000 -o out.fa

# From a CSV table (range mode)
multiseqex ref.fa --table regions.csv -o out.fa

# From a CSV table (position mode with flanking)
multiseqex ref.fa --table positions.csv --flank 500 -o out.fa

# SV breakpoints to per-pair files
multiseqex ref.fa --sv-table variants.tsv --output-dir sv_seqs/

# One file per region
multiseqex ref.fa --table regions.csv --output-dir per_region/

Table formats

Tables must have a header row with named columns. Column names are case-insensitive and can appear in any order. Extra columns (e.g. GENE, STRAND) are silently ignored.

--table

Column Required? Description
CHROM Yes Chromosome / contig name
START Yes (range mode) 1-based inclusive start position
END Yes (range mode) 1-based inclusive end position
POS Yes (position mode, needs --flank) Single coordinate position
NAME No Region label for output naming
  • Range mode: provide CHROM, START, END.
  • Position mode: provide CHROM, POS and pass --flank.

--sv-table

Each row produces two regions (left and right breakpoints).

Column Required? Description
CHROM_LEFT Yes Left breakpoint chromosome
START_LEFT Yes (range mode) Left breakpoint start
END_LEFT Yes (range mode) Left breakpoint end
POS_LEFT Yes (position mode, needs --flank) Left breakpoint position
CHROM_RIGHT Yes Right breakpoint chromosome
START_RIGHT Yes (range mode) Right breakpoint start
END_RIGHT Yes (range mode) Right breakpoint end
POS_RIGHT Yes (position mode, needs --flank) Right breakpoint position
NAME No SV identifier for naming

Installation

From crates.io

cargo install multiseqex

From GitHub

cargo install --git https://github.com/trentzz/multiseqex

Build from source

git clone https://github.com/trentzz/multiseqex.git
cd multiseqex
cargo build --release
cp target/release/multiseqex ~/.local/bin/

Prerequisites

  • Rust 1.85+ and Cargo (edition 2024)
  • samtools (optional — for pre-building .fai indexes)

If the FASTA file lacks a .fai index, multiseqex builds one automatically (unless --no-build-fai is set).

Documentation

See the docs/ folder for detailed guides:

License

MIT

About

MULTI SEQuence EXtractor

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages