Sync paired-end FASTA/FASTQ files and keep singleton reads. 🚀
Pairfq is a tool for synchronizing paired-end sequencing data. It is written in Rust for speed and memory safety, and it is designed to handle very large datasets efficiently.
- ⚡️ Blazing Fast: Uses the Rust library needletail for rapid FASTX parsing.
- 🗄️ Low Memory Footprint: Optional on-disk indexing with sled allows processing of huge datasets (tens of millions of reads) with constant low memory usage.
- 📦 Zero Dependencies: The main binary is self-contained (no external DB drivers needed).
- 🔧 Versatile: Handles FASTA and FASTQ formats, gzip/bzip2 compression, and interleaved or separate files.
You can download the release binaries from GitHub releases.
To build from source, you need Rust installed, either via rustup or with a package manager such as brew on macOS or apt on Linux.
# Clone the repository
git clone https://github.com/sestaton/Pairfq.git
cd Pairfq
# Build for release
cargo build --release
# Install (optional)
cp target/release/pairfq /usr/local/bin/

pairfq provides a suite of subcommands to deal with paired-end FASTA/FASTQ files.
Sync paired-end reads. Matches forward and reverse reads, keeping them in sync and separating singletons.
pairfq makepairs \
-f forward.fastq -r reverse.fastq \
-fp forward_paired.fastq -rp reverse_paired.fastq \
-fs forward_unpaired.fastq -rs reverse_unpaired.fastq

Key Options:
- --index: Recommended for large files! Uses sled (embedded DB) to index reads on disk, keeping memory usage low. 📉
- --stats: Print detailed statistics after processing. 📊
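To make the idea of paired reads versus singletons concrete, here is a minimal sketch using standard POSIX tools. It is not how pairfq is implemented; it assumes strict 4-line FASTQ records and read IDs ending in /1 and /2, and the file names and the `ids` helper are made up for illustration.

```shell
# Illustrative only: classify read IDs as paired or singleton.
# Assumes 4-line FASTQ records and IDs that end in /1 or /2.
tmp=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n' > "$tmp/fwd.fq"
printf '@r2/2\nTTAA\n+\nIIII\n@r3/2\nCCGG\n+\nIIII\n' > "$tmp/rev.fq"

# Print the sorted read IDs of a file with the /1 or /2 suffix removed.
ids() {
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' "$1" | sort
}

ids "$tmp/fwd.fq" > "$tmp/fwd.ids"
ids "$tmp/rev.fq" > "$tmp/rev.ids"

paired=$(comm -12 "$tmp/fwd.ids" "$tmp/rev.ids")      # IDs present in both files
fwd_single=$(comm -23 "$tmp/fwd.ids" "$tmp/rev.ids")  # IDs only in the forward file
rev_single=$(comm -13 "$tmp/fwd.ids" "$tmp/rev.ids")  # IDs only in the reverse file

echo "paired: $paired"
echo "forward singletons: $fwd_single"
echo "reverse singletons: $rev_single"
rm -rf "$tmp"
```

In this toy input only r2 appears in both files, so r1 and r3 are the singletons that would go to the unpaired outputs.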
Interleave paired files. Combines separate forward and reverse files into a single interleaved file.
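As a rough picture of what an interleaved file looks like, the following sketch interleaves two tiny FASTQ files with `paste` and `tr`. It is only an illustration under the assumption of strict 4-line records that are already in sync and contain no tab characters; it is not pairfq's implementation, and the file names are hypothetical.

```shell
# Illustrative only: interleave two already-synced 4-line FASTQ files.
tmp=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n' > "$tmp/fwd.fq"
printf '@r1/2\nTGCA\n+\nJJJJ\n@r2/2\nCCGG\n+\nJJJJ\n' > "$tmp/rev.fq"

# Collapse each 4-line record onto a single tab-separated line.
paste - - - - < "$tmp/fwd.fq" > "$tmp/fwd.tab"
paste - - - - < "$tmp/rev.fq" > "$tmp/rev.tab"

# Place forward and reverse records side by side, then restore the line breaks.
paste "$tmp/fwd.tab" "$tmp/rev.tab" | tr '\t' '\n' > "$tmp/interleaved.fq"

# Collect the header lines in output order as a comma-separated string.
headers=$(awk 'NR % 4 == 1 { printf "%s%s", sep, $0; sep = "," } END { print "" }' "$tmp/interleaved.fq")
echo "$headers"
rm -rf "$tmp"
```

Each forward record is immediately followed by its mate, which is the layout the interleaved output is expected to have.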
pairfq joinpairs -f forward.fastq -r reverse.fastq -o interleaved.fastq

De-interleave files. Splits a single interleaved file back into separate forward and reverse files.

pairfq splitpairs -i interleaved.fastq -f forward.fastq -r reverse.fastq

Check the integrity and pairing of forward and reverse files.

pairfq checkpairs -f <forward_reads> -r <reverse_reads>

Output: A tab-delimited table showing the status of each file:
- integrity: Checks if the file can be parsed (validates gzip/bzip2 compression if applicable).
- paired: Checks if the file has the same number of records as its pair.
- paired_reads: Count of reads that are paired.
- unpaired_reads: Count of reads that are unpaired (difference in counts).
Example:
file integrity paired paired_reads unpaired_reads
file1.fq ✅ ✅ 100 0
file2.fq ✅ ✅ 100 0
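The record-count comparison behind the paired and unpaired_reads columns can be approximated by hand. The sketch below is a stand-in, not pairfq's own logic: it assumes uncompressed, strict 4-line FASTQ files, and the `count_records` helper and file names are hypothetical.

```shell
# Illustrative only: compare record counts of two 4-line FASTQ files.
tmp=$(mktemp -d)
printf '@a/1\nACGT\n+\nIIII\n@b/1\nGGCC\n+\nIIII\n@c/1\nTTAA\n+\nIIII\n' > "$tmp/R1.fq"
printf '@a/2\nTGCA\n+\nIIII\n@b/2\nCCGG\n+\nIIII\n' > "$tmp/R2.fq"

# Number of records = number of lines / 4 (assumes no wrapped sequences).
count_records() {
    awk 'END { print NR / 4 }' "$1"
}

n1=$(count_records "$tmp/R1.fq")
n2=$(count_records "$tmp/R2.fq")

# The files count as "paired" here only if the record counts match.
if [ "$n1" -eq "$n2" ]; then same_count=yes; else same_count=no; fi

# Unpaired reads: the absolute difference in record counts.
if [ "$n1" -ge "$n2" ]; then diff=$((n1 - n2)); else diff=$((n2 - n1)); fi

echo "R1=$n1 R2=$n2 same_count=$same_count difference=$diff"
rm -rf "$tmp"
```

A nonzero difference like this one is a hint that the files should be synced before downstream use.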
Fix headers.
Adds standard pairing information (e.g., /1, /2) to read headers.
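For a sense of what the header change looks like, this sketch appends /1 to the ID of every header line with awk. Where exactly pairfq places the suffix is an assumption here (this version attaches it to the first whitespace-delimited token), and the file names are hypothetical.

```shell
# Illustrative only: append /1 to the read ID on each FASTQ header line.
# Assumes strict 4-line records; other lines are passed through unchanged.
tmp=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$tmp/input.fq"

awk 'NR % 4 == 1 { $1 = $1 "/1" } { print }' "$tmp/input.fq" > "$tmp/output.fq"

first_header=$(sed -n '1p' "$tmp/output.fq")
second_header=$(sed -n '5p' "$tmp/output.fq")
first_seq=$(sed -n '2p' "$tmp/output.fq")
echo "$first_header $second_header"
rm -rf "$tmp"
```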
pairfq addinfo -i input.fastq -o output.fastq -p 1

Want to contribute? Great!
# Run the test suite (includes ported Perl tests)
cargo test

For environments where you cannot install the Rust binary, we preserve the legacy Perl script. It has no dependencies and works with Perl 5.6+.
⚠️ Note: This version lacks the high-performance indexing of the Rust version.
Quick Install:
curl -sL git.io/pairfq_lite > pairfq_lite
chmod +x pairfq_lite
./pairfq_lite -h

Alternatively, you can use this version without storing it locally.
curl -sL git.io/pairfq_lite | perl -

The above command will show the options. To see a specific subcommand menu, for example the pairfq makepairs command, just type that subcommand with no options.
curl -sL git.io/pairfq_lite | perl - makepairs

However, repeatedly running the above command with curl is not efficient. You can save it to a file and make it executable as shown above for the quick install.
Pairfq is designed to be efficient and portable. Here are some benchmark results with hyperfine comparing Pairfq to the legacy Perl implementation with a test set of 1M reads in the R1 file and 900k reads in the R2 file. For transparency, I have included the scripts to create the test data and run the benchmarks in the scripts/ directory.
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| Rust (pairfq) | 2.702 ± 0.084 | 2.610 | 2.891 | 1.00 |
| Perl (pairfq_lite.pl) | 9.806 ± 0.326 | 9.461 | 10.446 | 3.63 ± 0.17 |
This project is licensed under the MIT License.
Copyright (C) 2013-2025 S. Evan Staton