From 74129d5063653f8b2b36f5f82bb1a494e2656c9b Mon Sep 17 00:00:00 2001
From: Lokesh9106 <2400040120@kluniversity.in>
Date: Sun, 23 Nov 2025 20:04:34 +0530
Subject: [PATCH] docs: Add comprehensive Rarefaction section to transformation chapter

Added new section 12.3 Rarefaction to address issue #823.

Changes include:
- Introduction to rarefaction with rarefyAssay() and niter parameter
- Subsection on using rarefaction with alpha diversity (addAlpha)
- Subsection on using rarefaction with beta diversity (addMDS)
- Function comparison explaining differences between:
  * addAlpha() vs getAlpha()
  * runMDS() vs addMDS()

Includes practical code examples demonstrating iterative rarefaction with niter=100.
---
 inst/pages/transformation.qmd | 63 +++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/inst/pages/transformation.qmd b/inst/pages/transformation.qmd
index c39fc546..b8330e6b 100644
--- a/inst/pages/transformation.qmd
+++ b/inst/pages/transformation.qmd
@@ -141,6 +141,69 @@ than the minimum abundance value before transformation. Some tools, like
 values. See [@sec-differential-abundance].
 :::
 
+## Rarefaction {#sec-rarefaction}
+
+Another approach to controlling uneven sampling depths is rarefaction with `rarefyAssay()`, which subsamples every sample to an equal number of reads. This remains controversial, however, and strategies to mitigate the information loss in rarefaction have been proposed [@Schloss_2024a; @Schloss_2024b]. Moreover, this practice has been discouraged for the analysis of differentially abundant microorganisms [@McMurdie_and_Holmes_2014].
+
+Rarefaction can be performed iteratively by using the `niter` parameter in `rarefyAssay()`. This creates multiple rarefied versions of the data, which can help account for the stochasticity introduced by random subsampling. The resulting rarefied assays can then be used for downstream analyses such as alpha and beta diversity calculations.
+
+### Using rarefaction with alpha diversity
+
+When calculating alpha diversity indices, you can apply rarefaction iteratively and then compute diversity metrics across the rarefied replicates. The `addAlpha()` function can work with rarefied data:
+
+```{r}
+#| label: rarefaction-alpha
+#| eval: false
+
+# Load example data
+library(mia)
+data("Tengeler2020")
+tse <- Tengeler2020
+
+# Get minimum read depth for rarefaction
+min_reads <- min(colSums(assay(tse, "counts")))
+
+# Perform iterative rarefaction
+tse <- rarefyAssay(
+    tse,
+    method = "subsample",
+    sample = min_reads,
+    niter = 100
+)
+
+# Calculate alpha diversity on rarefied data
+tse <- addAlpha(
+    tse,
+    assay_name = "counts_rarefied",
+    sample = min_reads,
+    niter = 100
+)
+```
+
+### Using rarefaction with beta diversity
+
+Similarly, rarefaction can be applied before calculating beta diversity and performing ordination. The `addMDS()` function can use rarefied data for more robust distance calculations:
+
+```{r}
+#| label: rarefaction-beta
+#| eval: false
+
+# Perform MDS ordination on rarefied data
+tse <- addMDS(
+    tse,
+    assay_name = "counts_rarefied",
+    method = "bray",
+    niter = 100
+)
+```
+
+### Function comparison
+
+**`addAlpha()` vs `getAlpha()`**: Both functions calculate alpha diversity indices, but `addAlpha()` stores the results directly in the `colData` of the TreeSummarizedExperiment object, while `getAlpha()` returns the diversity values as a separate vector or matrix. Use `addAlpha()` when you want to keep all data together in one object, and `getAlpha()` when you need the diversity values for immediate use in other calculations.
+
+**`runMDS()` vs `addMDS()`**: Both functions calculate multidimensional scaling coordinates and store them in the `reducedDim` slot of the TreeSummarizedExperiment object. `runMDS()` is provided by the scater package, whereas `addMDS()` is the corresponding mia function; the variants that return the coordinates as a separate matrix are `calculateMDS()` and `getMDS()`, respectively.
+Using `addMDS()` is generally preferred in mia workflows, as it follows the same `add*`/`get*` convention as `addAlpha()` and keeps downstream analyses and visualization straightforward.
+
+
 ## Transformations in practice
 
 Below, we apply relative transformation to counts table.