scAmbi — Mapping-ambiguity overdispersion correction for scRNA-seq

scAmbi provides tools to estimate and correct overdispersion (OD) arising from read-to-transcript mapping ambiguity (estimated from, e.g., Salmon Alevin cell-level bootstraps), and then to evaluate the effect on variability using within- and between-sample biological coefficient of variation (BCV) analyses. It also includes helpers for building corrected Seurat objects and plots for quick diagnostics.

Why scAmbi?

Multi-mapping fragments and assignment ambiguity inflate inferential variance. Alevin can emit per-cell bootstrap replicates that capture this. scAmbi:

computes per-gene OD from bootstraps (sparse-aware, block-wise),
integrates bootstrap-, moments-, and prior-based estimates for exploration,
constructs a corrected assay (counts scaled by 1/OD),
quantifies improvement via edgeR BCV (within/between sample),
provides plots and summaries for rapid QC.
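To make the bootstrap-based idea concrete, here is a minimal sketch in base R. It is not scAmbi's implementation: it assumes OD can be approximated by the bootstrap variance-to-mean ratio averaged over cells (similar in spirit to how inferential replicates are often summarized), and the simulated `boot` array, the floor at 1, and all variable names are illustrative assumptions.

```r
# Toy illustration only: simulated bootstraps, not scAmbi's actual estimator.
set.seed(1)
n_genes <- 5; n_cells <- 4; n_boot <- 20

# Hypothetical bootstrap array (genes x cells x replicates), Poisson by default
boot <- array(
  rpois(n_genes * n_cells * n_boot, lambda = c(2, 5, 10, 20, 50)),
  dim = c(n_genes, n_cells, n_boot)
)
# Make gene 3 "ambiguous": its replicates vary far more than Poisson
boot[3, , ] <- rnbinom(n_cells * n_boot, mu = 10, size = 2)

# Per gene and cell: bootstrap mean and variance across replicates
boot_mean <- apply(boot, c(1, 2), mean)
boot_var  <- apply(boot, c(1, 2), var)

# Variance-to-mean ratio, averaged over cells with a positive mean
ratio <- boot_var / boot_mean
ratio[boot_mean == 0] <- NA
od <- rowMeans(ratio, na.rm = TRUE)
od[!is.finite(od)] <- 1   # genes with no usable cells get no correction
od <- pmax(od, 1)         # assumption: never scale counts upward

# Corrected counts: each gene's row is divided by its OD
counts <- boot_mean       # stand-in for the point-estimate count matrix
corrected <- counts / od
round(od, 2)
```

Dividing a genes x cells matrix by a per-gene vector relies on R's column-major recycling, so row `i` is scaled by `od[i]`. For large sparse matrices a block-wise approach would be needed, which this sketch does not attempt.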

Key features

Bootstrap OD (sparse-aware): compute_overdisp_sparse_aware()
Integrated OD estimator: estimate_overdispersion_integrated()
Seurat integration: process_and_create_seurat_corrected_improved()
Within-sample BCV: calculate_within_sample_bcv(), analyze_within_sample_bcv()
Between-sample BCV: extract_and_pseudobulk(), calculate_bcv_direct()
Visualization: plot_within_sample_bcv(), plot_within_sample_summary(), plot_bcv(), plot_bcv_comparison()
Utilities: read_eds_gc(), read_sample_data_improved(), set_feature_metadata(), extract_feature_vector()
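The between-sample helpers above are built around pseudobulking and BCV. The sketch below shows the general idea in base R under stated assumptions: cells are summed per sample with `rowsum()`, and BCV is approximated by a crude method-of-moments estimate from the negative binomial relation var = mu + phi * mu^2. This is a stand-in for illustration, not edgeR's estimator and not the internals of `extract_and_pseudobulk()` or `calculate_bcv_direct()`; the toy data and variable names are hypothetical.

```r
# Toy illustration only: simulated counts and a simplified BCV estimate.
set.seed(2)
n_genes <- 6; n_cells <- 30
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 3), nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)
sample_id <- rep(c("S1", "S2", "S3"), each = 10)  # hypothetical sample labels

# rowsum() sums rows by group, so transpose to group cells, then transpose back
pseudobulk <- t(rowsum(t(counts), group = sample_id))

# Scale each sample to the mean library size
lib  <- colSums(pseudobulk)
norm <- sweep(pseudobulk, 2, lib / mean(lib), "/")

# Method of moments: phi = (var - mu) / mu^2, BCV = sqrt(phi), floored at 0
mu  <- rowMeans(norm)
v   <- apply(norm, 1, var)
phi <- pmax((v - mu) / mu^2, 0)
bcv <- sqrt(phi)
round(bcv, 3)
```

With only three pseudobulk samples the per-gene estimates are very noisy, which is one reason empirical Bayes moderation, as edgeR provides, is normally preferred in practice.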

Dependencies

Required R packages

This package requires R version 4.2 or higher.

Core dependencies

Seurat (>= 4.0.0): Single-cell data structures and workflows
Matrix (>= 1.3-0): Sparse matrix operations
edgeR (>= 3.34.0): Dispersion estimation and BCV calculations
Rcpp (>= 1.0.7): C++ integration for performance-critical operations

Data I/O and processing

eds (>= 1.0.0): Reading Alevin EDS sparse matrix format
tximport (>= 1.20.0): Transcript-level quantification import
rtracklayer (>= 1.52.0): GTF file parsing for gene complexity calculations
jsonlite (>= 1.7.0): JSON data handling

Visualization

ggplot2 (>= 3.3.0): Core plotting framework
patchwork (>= 1.1.0): Combining multiple plots
dplyr (>= 1.0.0): Data manipulation for summaries
tidyr (>= 1.1.0): Data reshaping for visualization

Parallel processing

parallel: Built-in R package for multi-core processing
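A quick way to see which of the packages listed above are already present is sketched below. It only checks that each package is installed and does not verify the minimum versions; the variable names are illustrative.

```r
# Packages named in the dependency lists above
required <- c("Seurat", "Matrix", "edgeR", "Rcpp",
              "eds", "tximport", "rtracklayer", "jsonlite",
              "ggplot2", "patchwork", "dplyr", "tidyr", "parallel")

# TRUE/FALSE per package, without loading any of them
installed <- required %in% rownames(installed.packages())
names(installed) <- required

missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}

# The README states that R 4.2 or higher is required
r_ok <- getRversion() >= "4.2.0"
if (!r_ok) message("R ", getRversion(), " is older than 4.2")
```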

Development

These packages are necessary for building the vignettes and running tests. You can install them by running the following command:

install.packages(c("knitr", "rmarkdown", "testthat"))

System requirements

C++ compiler: Required for compiling Rcpp functions (e.g., g++ >= 7.0 or clang >= 4.0)
OpenMP (optional): For additional parallelization in C++ code
Memory: At least 64 GB RAM recommended for typical datasets (4-8 samples, ~1M cells each)
Storage: Sufficient space for Alevin bootstrap files (can be several GB per sample)

Installation

From GitHub (recommended)

# install.packages("remotes")
remotes::install_github("sbresnahan/scAmbi")

From a local source tarball/zip

# If you have a source tarball (file name is a placeholder):
install.packages("scAmbi.tar.gz", repos = NULL, type = "source")
# If you have scAmbi.zip, unzip it first, then install from the directory:
# devtools::install("path/to/scAmbi/")

Getting started

library(scAmbi)

# 1) Estimate integrated overdispersion from one Alevin sample
#    (requires bootstraps in <sample>/alevin/quants_boot_mat.gz)
alevin_dir <- "path/to/<sample>/alevin"
# `counts` must already hold this sample's genes x cells dgCMatrix
feats <- rownames(counts)
cells <- colnames(counts)
od <- estimate_overdispersion_integrated(
  counts     = counts,
  alevin_dir = alevin_dir,
  n_boot     = 20,
  n_cores    = 4
)

# 2) Build a Seurat object with corrected assay (RNA_corr = counts / OD)
seu <- process_and_create_seurat_corrected_improved(
  sample_id = "S1", counts = counts, od = od, feats = feats, cells = cells
)

# 3) Within-sample BCV comparison (raw vs corrected)
wres <- analyze_within_sample_bcv(list(S1 = seu), assay_names = c("RNA", "RNA_corr"), n_groups = 10)
p <- plot_within_sample_bcv(wres, sample_name = "S1")
print(p)

See the vignette for a full, reproducible walkthrough.
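As a rough picture of what a within-sample comparison with several cell groups can look like, the sketch below randomly splits the cells of one simulated sample into `n_groups` groups, pseudobulks each group, and compares a method-of-moments BCV before and after dividing counts by OD. This is an assumption about the general approach, not a description of how `analyze_within_sample_bcv()` works internally; the simulated data, the `od` values, and the helper names `pseudobulk_by()` and `moments_bcv()` are hypothetical.

```r
# Toy illustration only: one simulated sample with known per-gene OD.
set.seed(3)
n_cells <- 200; n_groups <- 10
od <- c(1, 1, 3, 6)          # hypothetical per-gene overdispersion
n_genes <- length(od)
mu_cell <- 5                 # mean count per cell

# Simulate counts with variance = od * mean (Poisson when od == 1)
counts <- t(sapply(od, function(d) {
  if (d == 1) rpois(n_cells, mu_cell)
  else rnbinom(n_cells, mu = mu_cell, size = mu_cell / (d - 1))
}))

# Randomly assign cells to equally sized groups
group <- sample(rep(seq_len(n_groups), length.out = n_cells))

# Sum counts over the cells in each group (genes x groups)
pseudobulk_by <- function(m, g) t(rowsum(t(m), group = g))

# Crude BCV from var = mu + phi * mu^2, floored at zero
moments_bcv <- function(pb) {
  mu <- rowMeans(pb)
  v  <- apply(pb, 1, var)
  sqrt(pmax((v - mu) / mu^2, 0))
}

raw_pb   <- pseudobulk_by(counts, group)
corr_pb  <- pseudobulk_by(counts / od, group)   # rows scaled by 1/OD
bcv_raw  <- moments_bcv(raw_pb)
bcv_corr <- moments_bcv(corr_pb)

data.frame(od = od, bcv_raw = round(bcv_raw, 3), bcv_corr = round(bcv_corr, 3))
```

Under this simplified estimator the corrected BCV can never exceed the raw BCV for a gene with OD of at least 1, so the toy output shows the direction of the effect rather than evidence that a correction is well calibrated.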

Alevin settings (inputs expected)

To use bootstrap-based OD, run Alevin with cell-level bootstraps enabled so that alevin/quants_boot_mat.gz is present:

salmon alevin \
  -l ISR \
  -1 example_1.fastq.gz \
  -2 example_2.fastq.gz \
  --chromiumV3 \
  -i index/transcripts \
  -p 10 \
  --whitelist index/3M-february-2018.txt \
  --numCellBootstraps 20 \
  --dumpFeatures \
  -o quants/example \
  --tgMap index/tx2g.tsv  # tx-to-tx identity table

Notes:

scAmbi reads the boot matrix and associated index files via eds::readEDS().
For transcript-centric work, provide a suitable index/mapping to Alevin.
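Before running the bootstrap-based steps, it can help to confirm that the Alevin output directory contains what is expected. The helper below is a hypothetical convenience function, not part of scAmbi. It checks for `quants_boot_mat.gz` (named above) plus the standard Alevin count matrix and its row/column index files; whether scAmbi needs exactly this set is an assumption.

```r
# Hypothetical helper: report which expected Alevin outputs are present
check_alevin_dir <- function(alevin_dir) {
  expected <- c("quants_mat.gz",        # point-estimate count matrix
                "quants_mat_rows.txt",  # cell barcodes
                "quants_mat_cols.txt",  # feature names
                "quants_boot_mat.gz")   # cell-level bootstraps
  present <- file.exists(file.path(alevin_dir, expected))
  names(present) <- expected
  present
}

# Demonstration on a temporary directory that lacks the bootstrap matrix
demo_dir <- file.path(tempdir(), "alevin_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir, c("quants_mat.gz", "quants_mat_rows.txt", "quants_mat_cols.txt")
)))

status <- check_alevin_dir(demo_dir)
if (!status[["quants_boot_mat.gz"]]) {
  message("No bootstrap matrix found; rerun Alevin with --numCellBootstraps.")
}
```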

Vignette

# After install:
browseVignettes("scAmbi")
# Or, from the package source directory, install with vignettes built:
devtools::install(build_vignettes = TRUE); browseVignettes("scAmbi")

The vignette demonstrates OD estimation, Seurat correction, and BCV diagnostics end-to-end.

License

GPL-3.
Maintainer: Sean T. Bresnahan <stbresnahan@mdanderson.org>.

Citation

If you use scAmbi, please cite this repository and the tools it builds upon (e.g., Salmon/Alevin, edgeR). A formal citation will be added once a preprint is available.
