A systematic benchmark of high-accuracy PacBio long-read RNA sequencing for transcript-level quantification

Abstract

Background: Assembling fragmented RNA-sequencing reads into complete transcripts is error-prone, particularly for genes with complex splicing, which creates ambiguity in transcript discovery and quantification. PacBio long-read RNA sequencing resolves transcripts more clearly than short-read technologies. PacBio Kinnex uses a cDNA concatenation approach that increases read yield roughly 8-fold on average relative to previous protocols. However, its quantitative performance has not yet been thoroughly evaluated at scale.

Results: Here, we benchmark the high-throughput PacBio Kinnex platform against Illumina short-read RNA-seq using matched, deeply sequenced datasets across a time course of endothelial cell differentiation. Compared to Illumina, Kinnex achieves comparable gene-level quantification and more accurate transcript discovery and transcript quantification. While Illumina detects more transcripts overall, many of these reflect potentially unstable or ambiguous estimates in complex genes. Kinnex largely avoids these issues and produces more reliable differential transcript expression calls, despite a mild bias against short transcripts (shorter than 1.25 kb). After correcting the Illumina estimates for inferential variability, Kinnex and Illumina quantifications are highly concordant, demonstrating equivalent performance. We also benchmark long-read tools and nominate Oarfish as the most efficient for our Kinnex data.

Conclusions: Together, our results establish Kinnex as a reliable platform for full-length transcript quantification.

Reproducibility

To reproduce our results, you need snakemake>=7.30.1, singularity-ce or apptainer, and a recent version of conda (or an alternative frontend, such as mamba). Starting from the raw data (which is controlled access; see Data availability), you can reproduce all of our analyses, including the tables and figures in the paper, by running Snakemake:

snakemake --use-conda --use-apptainer --cores 12
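Before launching the full run, it can help to confirm that the prerequisites listed above are on your PATH. The script below is a minimal, hypothetical pre-flight sketch and is not part of this repository's workflow. The function names (parse_version, version_at_least, check_requirements) are illustrative assumptions. It only checks that the snakemake version is at least 7.30.1, that apptainer or singularity is present, and that conda or mamba is present. It does not judge whether your conda is "recent enough".

```python
"""Hypothetical pre-flight check for the reproduction prerequisites (a sketch, not part of the workflow)."""
import shutil
import subprocess


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version string such as '7.30.1' into (7, 30, 1); trailing non-digits in a part are ignored."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def version_at_least(found: str, required: str) -> bool:
    """Compare dotted versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(found), parse_version(required)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_requirements(min_snakemake: str = "7.30.1") -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if shutil.which("snakemake") is None:
        problems.append("snakemake not found on PATH")
    else:
        # `snakemake --version` prints the bare version string, e.g. "7.32.4".
        out = subprocess.run(["snakemake", "--version"], capture_output=True, text=True).stdout
        if not version_at_least(out, min_snakemake):
            problems.append(f"snakemake {out.strip()} is older than {min_snakemake}")
    if not (shutil.which("apptainer") or shutil.which("singularity")):
        problems.append("neither apptainer nor singularity found on PATH")
    if not (shutil.which("conda") or shutil.which("mamba")):
        problems.append("no conda frontend (conda or mamba) found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All prerequisites found.")
```

If the check passes, a dry run (snakemake -n) shows the planned jobs without executing them. This can be a cheap way to confirm that the raw data are picked up before you start the full run.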

Setting up raw data

Place the raw sequencing data in the following directory structure, with Kinnex and Illumina at the same level under data/seq/, relative to the main project directory.

Kinnex (PacBio) data

Place the FLNC BAM file for each sample at:

data/seq/kinnex/raw/{sample}/flnc.bam

Illumina data

Place the paired-end FASTQ files for each sample at:

data/seq/illumina/raw/FASTQ/{sample}-r1.fastq.gz
data/seq/illumina/raw/FASTQ/{sample}-r2.fastq.gz
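The path patterns above can be expressed as a small helper that lists which raw inputs are still missing for a set of samples. This is a hypothetical sketch. expected_inputs and missing_inputs are illustrative names, not functions from this repository. The sample names come from you, and the sketch assumes paths are resolved relative to the main project directory.

```python
from pathlib import Path


def expected_inputs(sample: str, root: Path = Path(".")) -> dict[str, Path]:
    """Map one sample name to the raw-input paths described in this README (hypothetical helper)."""
    seq = root / "data" / "seq"
    return {
        "kinnex_flnc": seq / "kinnex" / "raw" / sample / "flnc.bam",
        "illumina_r1": seq / "illumina" / "raw" / "FASTQ" / f"{sample}-r1.fastq.gz",
        "illumina_r2": seq / "illumina" / "raw" / "FASTQ" / f"{sample}-r2.fastq.gz",
    }


def missing_inputs(samples: list[str], root: Path = Path(".")) -> list[Path]:
    """Return every expected raw-input file that does not exist yet."""
    return [
        path
        for sample in samples
        for path in expected_inputs(sample, root).values()
        if not path.is_file()
    ]
```

For example, running missing_inputs(["day0-rep1", "day0-rep2"]) from the project directory returns an empty list once both samples are in place.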

Example directory layout

data/
└── seq/
    ├── kinnex/
    │   └── raw/
    │       ├── day0-rep1/
    │       │   └── flnc.bam
    │       ├── day0-rep2/
    │       │   └── flnc.bam
    │       └── ...
    └── illumina/
        └── raw/
            └── FASTQ/
                ├── day0-rep1-r1.fastq.gz
                ├── day0-rep1-r2.fastq.gz
                ├── day0-rep2-r1.fastq.gz
                ├── day0-rep2-r2.fastq.gz
                └── ...
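To catch layout mistakes early, the sketch below scans a directory tree like the one above and reports which samples are present for only one platform. It assumes that, as in the example layout, matched samples share the same name across Kinnex and Illumina. It also counts an Illumina sample only when both its -r1 and -r2 files exist. The function names are hypothetical, and this is not how the Snakemake workflow itself resolves samples.

```python
import re
from pathlib import Path

# Assumed Illumina naming scheme from this README: {sample}-r1.fastq.gz / {sample}-r2.fastq.gz
FASTQ_NAME = re.compile(r"^(?P<sample>.+)-r(?P<mate>[12])\.fastq\.gz$")


def discover_samples(root: Path = Path(".")) -> dict[str, set[str]]:
    """Collect sample names per platform from the data/seq/ layout (hypothetical helper)."""
    seq = root / "data" / "seq"
    kinnex_raw = seq / "kinnex" / "raw"
    kinnex = set()
    if kinnex_raw.is_dir():
        kinnex = {d.name for d in kinnex_raw.iterdir() if (d / "flnc.bam").is_file()}

    fastq_dir = seq / "illumina" / "raw" / "FASTQ"
    mates: dict[str, set[str]] = {}
    if fastq_dir.is_dir():
        for f in fastq_dir.iterdir():
            m = FASTQ_NAME.match(f.name)
            if m:
                mates.setdefault(m["sample"], set()).add(m["mate"])
    # Only samples with both mates count as complete paired-end Illumina data.
    illumina = {s for s, found in mates.items() if found == {"1", "2"}}
    return {"kinnex": kinnex, "illumina": illumina}


def layout_report(root: Path = Path(".")) -> dict[str, list[str]]:
    """List matched samples and samples present for only one platform."""
    found = discover_samples(root)
    return {
        "matched": sorted(found["kinnex"] & found["illumina"]),
        "kinnex_only": sorted(found["kinnex"] - found["illumina"]),
        "illumina_only": sorted(found["illumina"] - found["kinnex"]),
    }
```

An Illumina sample with only one mate present is treated as absent. As a result, it shows up under kinnex_only (or is dropped) rather than being silently counted as matched.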

Data availability

Raw data

Raw data are controlled access. For access to the raw Illumina and Kinnex sequencing files, please refer to iTHRIV.

Intermediate data (e.g., count tables, discovered transcriptome, ...)

Intermediate data are available from Zenodo.

Figures and tables

Figures and tables are available from Zenodo.

Code

In addition to GitHub (dnwissel/kinnex-data), a copy of this repository is also available from Zenodo.

Citation

Our manuscript is still under review. In the meantime, please cite our preprint using the BibTeX entry below.

@article{wissel2025systematic,
  title={A Systematic Benchmark of High-Accuracy PacBio Long-Read RNA Sequencing for Transcript-Level Quantification},
  author={Wissel, David and Mehlferber, Madison M and Nguyen, Khue M and Pavelko, Vasilii and Tseng, Elizabeth and Robinson, Mark D and Sheynkman, Gloria M},
  journal={bioRxiv},
  pages={2025--12},
  year={2025},
  publisher={Cold Spring Harbor Laboratory}
}

Contact

In case of any questions, please reach out to Madison and David or open an issue in this repo.
