A systematic benchmark of high-accuracy PacBio long-read RNA sequencing for transcript-level quantification
Background: The assembly of fragmented RNA-sequencing reads into complete transcripts is error-prone, particularly for genes with complex splicing, resulting in ambiguity in transcript discovery and quantification. PacBio long-read RNA sequencing resolves transcripts with greater clarity than short-read technologies. PacBio Kinnex employs a cDNA concatenation approach that increases read yield on average by 8-fold relative to previous protocols. However, its quantitative performance remains under-evaluated at scale.
Results: Here, we benchmark the high-throughput PacBio Kinnex platform against Illumina short-read RNA-seq using matched, deeply sequenced datasets across a time course of endothelial cell differentiation. Compared to Illumina, Kinnex achieves comparable gene-level quantification and more accurate transcript discovery and transcript quantification. While Illumina detects more transcripts overall, many reflect potentially unstable or ambiguous estimates in complex genes. Kinnex largely avoids these issues, producing more reliable differential transcript expression calls, despite a mild bias against short transcripts (shorter than 1.25 kb). When correcting Illumina for inferential variability, Kinnex and Illumina quantifications are highly concordant, demonstrating equivalent performance. We also benchmark long-read tools, nominating Oarfish as the most efficient for our Kinnex data.
Conclusions: Together, our results establish Kinnex as a reliable platform for full-length transcript quantification.
To reproduce our results, you need snakemake>=7.30.1, singularity-ce or apptainer, and a recent version of conda (or an alternative frontend, such as mamba) installed. Starting from the raw data (which is controlled access; see Data availability), you can reproduce all of our analyses, including the tables and figures in the paper, by running Snakemake:
snakemake --use-conda --use-apptainer --cores 12
Place the raw sequencing data in the following directory structure, with Kinnex and Illumina at the same level under data/seq/, relative to the main project directory.
Place the FLNC BAM file for each sample at:
data/seq/kinnex/raw/{sample}/flnc.bam
Place the paired-end FASTQ files for each sample at:
data/seq/illumina/raw/FASTQ/{sample}-r1.fastq.gz
data/seq/illumina/raw/FASTQ/{sample}-r2.fastq.gz
data/
└── seq/
    ├── kinnex/
    │   └── raw/
    │       ├── day0-rep1/
    │       │   └── flnc.bam
    │       ├── day0-rep2/
    │       │   └── flnc.bam
    │       └── ...
    └── illumina/
        └── raw/
            └── FASTQ/
                ├── day0-rep1-r1.fastq.gz
                ├── day0-rep1-r2.fastq.gz
                ├── day0-rep2-r1.fastq.gz
                ├── day0-rep2-r2.fastq.gz
                └── ...
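Before launching the workflow, it can help to confirm that the inputs are where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper names `expected_inputs` and `missing_inputs` are hypothetical, and the sample list only contains the two sample names shown in the tree, so it needs to be extended to match your data.

```python
from pathlib import Path


def expected_inputs(project_dir, samples):
    """List the input files the directory layout implies for each sample."""
    seq = Path(project_dir) / "data" / "seq"
    paths = []
    for sample in samples:
        # One FLNC BAM per sample for Kinnex
        paths.append(seq / "kinnex" / "raw" / sample / "flnc.bam")
        # Paired-end FASTQ files (r1 and r2) per sample for Illumina
        for read in ("r1", "r2"):
            paths.append(
                seq / "illumina" / "raw" / "FASTQ" / f"{sample}-{read}.fastq.gz"
            )
    return paths


def missing_inputs(project_dir, samples):
    """Return the expected input files that do not exist on disk."""
    return [p for p in expected_inputs(project_dir, samples) if not p.is_file()]


if __name__ == "__main__":
    # Assumption: only the sample names visible in the tree above;
    # add the remaining samples of your dataset here.
    samples = ["day0-rep1", "day0-rep2"]
    missing = missing_inputs(".", samples)
    for path in missing:
        print(f"missing: {path}")
    if missing:
        print(f"{len(missing)} file(s) missing")
    else:
        print("all inputs found")
```

Run it from the main project directory. It only checks that the files exist, not that their contents are valid.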
Raw data is controlled access. For access to the raw sequencing files for both Illumina and Kinnex, please refer to iTHRIV.
Intermediate data are available from Zenodo.
Figures and tables are available from Zenodo.
In addition to GitHub, a copy of this repo is available from Zenodo.
Our manuscript is still under review. For now, you may find a BibTeX entry for our preprint below.
@article{wissel2025systematic,
title={A Systematic Benchmark of High-Accuracy PacBio Long-Read RNA Sequencing for Transcript-Level Quantification},
author={Wissel, David and Mehlferber, Madison M and Nguyen, Khue M and Pavelko, Vasilii and Tseng, Elizabeth and Robinson, Mark D and Sheynkman, Gloria M},
journal={bioRxiv},
pages={2025--12},
year={2025},
publisher={Cold Spring Harbor Laboratory}
}
In case of any questions, please reach out to Madison and David or open an issue in this repo.