scAmbi provides tools to estimate and correct overdispersion arising from read-to-transcript mapping ambiguity (e.g., Salmon Alevin cell-level bootstraps), then evaluate effects on variability using within- and between-sample BCV analyses. It includes helpers to build corrected Seurat objects and plots for quick diagnostics.
Multi-mapping fragments and assignment ambiguity inflate inferential variance. Alevin can emit per-cell bootstrap replicates that capture this. scAmbi:
- computes per-gene overdispersion (OD) from bootstraps (sparse-aware, block-wise),
- integrates bootstrap-, moments-, and prior-based estimates for exploration,
- constructs a corrected assay (counts scaled by 1/OD),
- quantifies improvement via edgeR BCV (within/between sample),
- offers plotting + summaries for rapid QC.
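
To make the bootstrap-based estimate and the 1/OD scaling above concrete, here is a minimal sketch on simulated data. It assumes OD is a pooled variance-to-mean ratio across bootstrap replicates, floored at 1. That definition, the toy array boots, and all variable names are illustrative assumptions, not scAmbi's implementation; use compute_overdisp_sparse_aware() on real data.

```r
# Illustrative only: simulated bootstrap replicates, not Alevin output.
set.seed(1)
n_genes <- 3; n_cells <- 4; n_boot <- 5

# Simulate one gene: cells x replicates; a finite `size` adds extra-Poisson noise.
sim_gene <- function(mu, size = Inf) {
  n <- n_cells * n_boot
  x <- if (is.finite(size)) rnbinom(n, mu = mu, size = size) else rpois(n, mu)
  matrix(x, nrow = n_cells, ncol = n_boot)
}

# Array of bootstrap counts: genes x cells x replicates.
boots <- array(NA_real_, dim = c(n_genes, n_cells, n_boot))
boots[1, , ] <- sim_gene(5)
boots[2, , ] <- sim_gene(20)
boots[3, , ] <- sim_gene(50, size = 2)   # ambiguous gene: noisy replicates

# Per-gene, per-cell mean and variance across replicates.
boot_mean <- apply(boots, c(1, 2), mean)
boot_var  <- apply(boots, c(1, 2), var)

# Assumed OD definition: pooled variance / pooled mean, never below 1.
od <- pmax(rowSums(boot_var) / rowSums(boot_mean), 1)

# Corrected counts: each gene (row) is scaled by 1 / OD.
counts <- boots[, , 1]
counts_corr <- counts / od
```

Because od has one entry per gene and R recycles it down the rows, counts / od divides every cell of gene i by od[i]. Genes with unambiguous assignment (OD near 1) are left essentially unchanged.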
- Bootstrap OD (sparse-aware): compute_overdisp_sparse_aware()
- Integrated OD estimator: estimate_overdispersion_integrated()
- Seurat integration: process_and_create_seurat_corrected_improved()
- Within-sample BCV: calculate_within_sample_bcv(), analyze_within_sample_bcv()
- Between-sample BCV: extract_and_pseudobulk(), calculate_bcv_direct()
- Visualization: plot_within_sample_bcv(), plot_within_sample_summary(), plot_bcv(), plot_bcv_comparison()
- Utilities: read_eds_gc(), read_sample_data_improved(), set_feature_metadata(), extract_feature_vector()
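
As a rough picture of what the between-sample BCV helpers above measure, the sketch below pseudobulks simulated counts by sample and derives a per-gene BCV. It is a base-R approximation under stated assumptions: the data are simulated, and the dispersion uses a simple method-of-moments estimate (variance = mu + phi * mu^2) rather than edgeR's likelihood-based estimators, so it is not how extract_and_pseudobulk() or calculate_bcv_direct() are implemented.

```r
# Illustrative only: simulated genes x cells counts with a sample label per cell.
set.seed(42)
n_genes <- 50; n_cells <- 60
sample_id <- rep(c("S1", "S2", "S3"), each = 20)
counts <- matrix(
  rnbinom(n_genes * n_cells, mu = 10, size = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Pseudobulk: sum cells within each sample, giving a genes x samples matrix.
pb <- t(rowsum(t(counts), group = sample_id))

# Scale each sample to the mean library size so samples are comparable.
lib <- colSums(pb)
norm <- sweep(pb, 2, lib / mean(lib), "/")

# Method-of-moments dispersion under variance = mu + phi * mu^2.
mu  <- rowMeans(norm)
v   <- apply(norm, 1, var)
phi <- pmax((v - mu) / mu^2, 0)   # negative estimates are floored at 0

# BCV is the square root of the dispersion.
bcv <- sqrt(phi)
summary(bcv)
```

Running the same calculation on the raw and the corrected assay gives two BCV vectors that can be compared gene by gene.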
This package requires R version 4.2 or higher.
- Seurat (>= 4.0.0): Single-cell data structures and workflows
- Matrix (>= 1.3-0): Sparse matrix operations
- edgeR (>= 3.34.0): Dispersion estimation and BCV calculations
- Rcpp (>= 1.0.7): C++ integration for performance-critical operations
- eds (>= 1.0.0): Reading Alevin EDS sparse matrix format
- tximport (>= 1.20.0): Transcript-level quantification import
- rtracklayer (>= 1.52.0): GTF file parsing for gene complexity calculations
- jsonlite (>= 1.7.0): JSON data handling
- ggplot2 (>= 3.3.0): Core plotting framework
- patchwork (>= 1.1.0): Combining multiple plots
- dplyr (>= 1.0.0): Data manipulation for summaries
- tidyr (>= 1.1.0): Data reshaping for visualization
- parallel: Built-in R package for multi-core processing
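
The version requirements above can be checked before installing. The following is a minimal sketch; the helper name check_dep and the variables min_versions, dep_ok, and r_ok are hypothetical and not part of scAmbi.

```r
# Minimum versions copied from the dependency list above.
min_versions <- c(
  Seurat = "4.0.0", Matrix = "1.3-0", edgeR = "3.34.0", Rcpp = "1.0.7",
  eds = "1.0.0", tximport = "1.20.0", rtracklayer = "1.52.0",
  jsonlite = "1.7.0", ggplot2 = "3.3.0", patchwork = "1.1.0",
  dplyr = "1.0.0", tidyr = "1.1.0"
)

# Hypothetical helper: TRUE only if the package is installed and new enough.
# system.file() returns "" for packages that are not installed.
check_dep <- function(pkg, min_version) {
  installed <- nzchar(system.file(package = pkg))
  installed && utils::packageVersion(pkg) >= min_version
}

dep_ok <- mapply(check_dep, names(min_versions), min_versions)
r_ok <- getRversion() >= "4.2.0"

# Report anything that still needs installing or upgrading.
missing_or_old <- names(dep_ok)[!dep_ok]
if (!r_ok) message("R >= 4.2 is required; found ", getRversion())
if (length(missing_or_old) > 0) {
  message("Install or upgrade: ", paste(missing_or_old, collapse = ", "))
}
```

The check does not load any package namespaces, so it is cheap to run in a fresh session.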
The following packages are needed only to build the vignettes and run the tests. Install them with:
install.packages(c("knitr", "rmarkdown", "testthat"))

- C++ compiler: Required for compiling Rcpp functions (e.g., g++ >= 7.0 or clang >= 4.0)
- OpenMP (optional): For additional parallelization in C++ code
- Memory: At least 64 GB RAM recommended for typical datasets (4-8 samples, ~1M cells each)
- Storage: Sufficient space for Alevin bootstrap files (can be several GB per sample)
# install.packages("remotes")
remotes::install_github("sbresnahan/scAmbi")

# If you have scAmbi.zip or a source tar.gz:
install.packages("scAmbi.zip", repos = NULL, type = "source")
# or use devtools:
# devtools::install("path/to/scAmbi/")

library(scAmbi)
# 1) Estimate integrated overdispersion from one Alevin sample
# (requires bootstraps in <sample>/alevin/quants_boot_mat.gz)
alevin_dir <- "path/to/<sample>/alevin"
# counts <- ... (genes x cells dgCMatrix), feats <- rownames(counts), cells <- colnames(counts)
od <- estimate_overdispersion_integrated(
  counts = counts,
  alevin_dir = alevin_dir,
  n_boot = 20,
  n_cores = 4
)
# 2) Build a Seurat object with corrected assay (RNA_corr = counts / OD)
seu <- process_and_create_seurat_corrected_improved(
  sample_id = "S1", counts = counts, od = od, feats = feats, cells = cells
)
# 3) Within-sample BCV comparison (raw vs corrected)
wres <- analyze_within_sample_bcv(list(S1 = seu), assay_names = c("RNA", "RNA_corr"), n_groups = 10)
p <- plot_within_sample_bcv(wres, sample_name = "S1")
print(p)

See the vignette for a full, reproducible walkthrough.
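
The quick start assumes that counts, feats, and cells already exist. For orientation, here is a minimal sketch of inputs with the expected shape (a genes x cells dgCMatrix plus its row and column names), built from toy data with the Matrix package. The gene and cell names are made up; on real data these objects would come from the Alevin output instead.

```r
library(Matrix)

# Toy dense counts (genes x cells); real data would come from Alevin.
set.seed(7)
n_genes <- 6; n_cells <- 4
dense <- matrix(rpois(n_genes * n_cells, lambda = 0.8), nrow = n_genes)

# Keep only the non-zero entries and store them in compressed sparse column form.
nz <- which(dense > 0, arr.ind = TRUE)
counts <- sparseMatrix(
  i = nz[, "row"], j = nz[, "col"], x = as.numeric(dense[nz]),
  dims = dim(dense),
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Feature and cell identifiers, as referenced in the quick start.
feats <- rownames(counts)
cells <- colnames(counts)
class(counts)
```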
To use bootstrap-based OD, run Alevin with cell-level bootstraps enabled so that alevin/quants_boot_mat.gz is present:
salmon alevin \
  -l ISR \
  -1 example_1.fastq.gz \
  -2 example_2.fastq.gz \
  --chromiumV3 \
  -i index/transcripts \
  -p 10 \
  --whitelist index/3M-february-2018.txt \
  --numCellBootstraps 20 \
  --dumpFeatures \
  -o quants/example \
  --tgMap index/tx2g.tsv # tx-to-tx identity table

Notes:
- scAmbi reads the boot matrix and associated index files via eds::readEDS().
- For transcript-centric work, provide a suitable index/mapping to Alevin.
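
Before running the bootstrap-based estimator, it can help to confirm that the Alevin run actually produced the bootstrap matrix. The sketch below is a hypothetical helper (check_alevin_boot is not a scAmbi function). It assumes the standard Alevin output names quants_mat.gz, quants_mat_rows.txt, and quants_mat_cols.txt alongside the quants_boot_mat.gz file mentioned above.

```r
# Hypothetical helper: report which expected Alevin files exist in a directory.
check_alevin_boot <- function(alevin_dir) {
  expected <- c(
    "quants_mat.gz",        # count matrix
    "quants_mat_rows.txt",  # cell barcodes
    "quants_mat_cols.txt",  # feature names
    "quants_boot_mat.gz"    # bootstrap matrix (needs --numCellBootstraps)
  )
  present <- file.exists(file.path(alevin_dir, expected))
  names(present) <- expected
  present
}

# Example path matching the -o quants/example output directory above.
status <- check_alevin_boot("quants/example/alevin")
if (!status[["quants_boot_mat.gz"]]) {
  message("No bootstrap matrix found; rerun salmon alevin with --numCellBootstraps.")
}
```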
# After install:
browseVignettes("scAmbi")
# Or build from source:
devtools::build_vignettes(); browseVignettes("scAmbi")

The vignette demonstrates OD estimation, Seurat correction, and BCV diagnostics end-to-end.
GPL-3.
Maintainer: Sean T. Bresnahan (stbresnahan@mdanderson.org).
If you use scAmbi, please cite this repository and the tools it builds upon (e.g., Salmon/Alevin, edgeR). A formal citation will be added once a preprint is available.