MULTI SEQuence EXtractor — a fast, parallel CLI tool for extracting multiple
sequences from FASTA files using .fai indexing.
Similar to samtools faidx but optimised for bulk extraction by leveraging
multiple CPU cores and flexible batch input formats (CSV/TSV tables with named
columns, list files, and inline regions).
Usage: multiseqex [OPTIONS] <FASTA>

Arguments:
  <FASTA>  Reference FASTA file (bgzipped ok if a matching .fai exists)

Options:
  --regions <REGIONS>        Comma-separated regions: chr:start-end, ...
  --list <LIST>              File with one region per line (chr:start-end)
  --table <TABLE>            CSV/TSV table with named columns (see below)
  --sv-table <SV_TABLE>      CSV/TSV SV table with named columns (see below)
  --flank <FLANK>            Flank size for position-mode tables
  -o, --output <OUTPUT>      Output FASTA file (default: stdout)
  --output-dir <OUTPUT_DIR>  Output directory (one file per region/SV pair)
  --threads <THREADS>        Number of worker threads (default: all CPUs)
  --no-build-fai             Error if .fai is missing instead of building it
  -h, --help                 Print help
  -V, --version              Print version
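
The `chr:start-end` syntax accepted by `--regions` and `--list` can be illustrated with a small parser. This is a minimal Python sketch, not multiseqex's own code: `Region`, `parse_region` and `parse_regions` are hypothetical names, and the sketch assumes region strings use the same 1-based inclusive coordinates as the table columns.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def parse_region(text: str) -> Region:
    """Parse one 'chr:start-end' string (hypothetical helper)."""
    # Split on the last ':' so contig names that contain ':' stay intact.
    chrom, sep, span = text.strip().rpartition(":")
    if not sep or not chrom:
        raise ValueError(f"expected chr:start-end, got {text!r}")
    start_text, dash, end_text = span.partition("-")
    if not dash:
        raise ValueError(f"expected start-end after ':', got {span!r}")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {text!r}")
    return Region(chrom, start, end)


def parse_regions(arg: str) -> list[Region]:
    """Parse a comma-separated list of regions, skipping empty items."""
    return [parse_region(part) for part in arg.split(",") if part.strip()]


regions = parse_regions("chr1:1000-2000,chr2:3000-4000")
```

Splitting on the last colon is a design choice for this sketch: it keeps contig names such as `HLA-A*01:01` usable as the chromosome part.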
# Single region to stdout
multiseqex ref.fa --regions chr1:1000-2000
# Multiple regions to a file
multiseqex ref.fa --regions chr1:1000-2000,chr2:3000-4000 -o out.fa
# From a CSV table (range mode)
multiseqex ref.fa --table regions.csv -o out.fa
# From a CSV table (position mode with flanking)
multiseqex ref.fa --table positions.csv --flank 500 -o out.fa
# SV breakpoints to per-pair files
multiseqex ref.fa --sv-table variants.tsv --output-dir sv_seqs/
# One file per region
multiseqex ref.fa --table regions.csv --output-dir per_region/

Tables must have a header row with named columns. Column names are
case-insensitive and can appear in any order. Extra columns (e.g. GENE,
STRAND) are silently ignored.
| Column | Required? | Description |
|---|---|---|
| `CHROM` | Yes | Chromosome / contig name |
| `START` | Yes (range mode) | 1-based inclusive start position |
| `END` | Yes (range mode) | 1-based inclusive end position |
| `POS` | Yes (position mode, needs `--flank`) | Single coordinate position |
| `NAME` | No | Region label for output naming |

- Range mode: provide `CHROM`, `START`, `END`.
- Position mode: provide `CHROM`, `POS` and pass `--flank`.
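
The two table modes can be sketched in Python as follows. This is an illustration rather than the tool's implementation: `row_to_region` is a hypothetical helper, and the sketch assumes that position mode extracts a symmetric window of `flank` bases on each side of `POS`, clamped at position 1. The actual windowing used by multiseqex may differ.

```python
import csv
import io


def row_to_region(row, flank=None):
    """Turn one table row into (name, chrom, start, end); hypothetical helper."""
    # Column names are case-insensitive, so normalise the header keys once.
    fields = {key.strip().upper(): value.strip() for key, value in row.items()}
    chrom = fields["CHROM"]
    if "START" in fields and "END" in fields:
        # Range mode: coordinates are used as given (1-based, inclusive).
        start, end = int(fields["START"]), int(fields["END"])
    elif "POS" in fields:
        if flank is None:
            raise ValueError("position mode needs a flank size")
        pos = int(fields["POS"])
        # Assumption: symmetric window around POS, clamped at position 1.
        start, end = max(1, pos - flank), pos + flank
    else:
        raise ValueError("need START and END, or POS")
    # Extra columns such as GENE are simply never looked at.
    return fields.get("NAME") or None, chrom, start, end


position_table = "name,chrom,pos,gene\nsnp1,chr1,1500,BRCA2\nsnp2,chr2,200,TP53\n"
position_regions = [
    row_to_region(row, flank=500)
    for row in csv.DictReader(io.StringIO(position_table))
]

range_table = "Chrom,Start,End\nchr3,10,20\n"
range_regions = [row_to_region(row) for row in csv.DictReader(io.StringIO(range_table))]
```

For a TSV file the same sketch works with `csv.DictReader(handle, delimiter="\t")`.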

With `--sv-table`, each row produces two regions (left and right breakpoints).

| Column | Required? | Description |
|---|---|---|
| `CHROM_LEFT` | Yes | Left breakpoint chromosome |
| `START_LEFT` | Yes (range mode) | Left breakpoint start |
| `END_LEFT` | Yes (range mode) | Left breakpoint end |
| `POS_LEFT` | Yes (position mode, needs `--flank`) | Left breakpoint position |
| `CHROM_RIGHT` | Yes | Right breakpoint chromosome |
| `START_RIGHT` | Yes (range mode) | Right breakpoint start |
| `END_RIGHT` | Yes (range mode) | Right breakpoint end |
| `POS_RIGHT` | Yes (position mode, needs `--flank`) | Right breakpoint position |
| `NAME` | No | SV identifier for naming |
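
How one SV row expands into a left and a right region can be sketched as below. `sv_row_to_regions` is a hypothetical helper written for illustration, and the symmetric, clamped flank window in position mode is an assumption rather than documented behaviour.

```python
import csv
import io


def sv_row_to_regions(row, flank=None):
    """Expand one SV row into (name, [left_region, right_region])."""
    fields = {key.strip().upper(): value.strip() for key, value in row.items()}
    regions = []
    for side in ("LEFT", "RIGHT"):
        chrom = fields[f"CHROM_{side}"]
        if f"START_{side}" in fields and f"END_{side}" in fields:
            # Range mode: explicit start and end for this breakpoint.
            start, end = int(fields[f"START_{side}"]), int(fields[f"END_{side}"])
        else:
            if flank is None:
                raise ValueError("position mode needs a flank size")
            pos = int(fields[f"POS_{side}"])
            # Assumption: symmetric window around the breakpoint, clamped at 1.
            start, end = max(1, pos - flank), pos + flank
        regions.append(f"{chrom}:{start}-{end}")
    return fields.get("NAME", ""), regions


sv_table = (
    "NAME\tCHROM_LEFT\tPOS_LEFT\tCHROM_RIGHT\tPOS_RIGHT\n"
    "DEL1\tchr1\t10000\tchr1\t25000\n"
)
sv_pairs = [
    sv_row_to_regions(row, flank=200)
    for row in csv.DictReader(io.StringIO(sv_table), delimiter="\t")
]
```

Each row yields exactly two region strings in the `chr:start-end` form, which matches the one-pair-per-row behaviour described for `--sv-table`.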

# Install from crates.io
cargo install multiseqex

# Or install directly from the Git repository
cargo install --git https://github.com/trentzz/multiseqex

# Or build from source
git clone https://github.com/trentzz/multiseqex.git
cd multiseqex
cargo build --release
cp target/release/multiseqex ~/.local/bin/

- Rust 1.85+ and Cargo (edition 2024)
- samtools (optional, for pre-building `.fai` indexes)

If the FASTA file lacks a .fai index, multiseqex builds one automatically
(unless --no-build-fai is set).
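
To show why a `.fai` index makes random access cheap, here is a minimal Python sketch based on the standard faidx column layout (name, length, offset, bases per line, bytes per line). `FaiEntry`, `parse_fai_line` and `byte_offset` are hypothetical names, the sketch covers uncompressed FASTA only, and it does not describe how multiseqex itself reads the index.

```python
from typing import NamedTuple


class FaiEntry(NamedTuple):
    name: str
    length: int      # total bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases per full line
    line_width: int  # bytes per full line, including the newline


def parse_fai_line(line: str) -> FaiEntry:
    """Parse the first five tab-separated columns of one .fai line."""
    name, length, offset, line_bases, line_width = line.rstrip("\n").split("\t")[:5]
    return FaiEntry(name, int(length), int(offset), int(line_bases), int(line_width))


def byte_offset(entry: FaiEntry, pos: int) -> int:
    """Byte offset in the FASTA file of the 1-based position `pos`."""
    if not 1 <= pos <= entry.length:
        raise ValueError(f"position {pos} is outside {entry.name}")
    zero_based = pos - 1
    full_lines = zero_based // entry.line_bases
    return entry.offset + full_lines * entry.line_width + zero_based % entry.line_bases


# A tiny FASTA with 10 bases per line; the header '>chr1\n' occupies 6 bytes.
fasta = ">chr1\nACGTACGTAC\nGTACGTACGT\nACGT\n"
entry = parse_fai_line("chr1\t24\t6\t10\t11\n")
first_base_of_line_two = fasta[byte_offset(entry, 11)]
```

Because the offset is plain arithmetic, a reader can seek straight to the first base of a region instead of scanning the file from the top.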
See the docs/ folder for detailed guides:
- Usage guide — full walkthrough of all input and output modes
- Testing and benchmarking — how to run tests and measure performance
MIT