-
Notifications
You must be signed in to change notification settings - Fork 53
docs: Add comprehensive Rarefaction section to transformation chapter #825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -141,6 +141,69 @@ than the minimum abundance value before transformation. Some tools, like | |
| values. See [@sec-differential-abundance]. | ||
| ::: | ||
|
|
||
| ## Rarefaction {#sec-rarefaction} | ||
|
|
||
| Another approach to control uneven sampling depths is to apply rarefaction with `rarefyAssay()`, which resamples the samples to an equal number of reads. This remains controversial, however, and strategies to mitigate the information loss in rarefaction have been proposed [@Schloss_2024a; @Schloss_2024b]. Moreover, this practice has been discouraged for the analysis of differentially abundant microorganisms [@McMurdie_and_Holmes_2014]. | ||
|
|
||
| Rarefaction can be performed iteratively by using the `niter` parameter in `rarefyAssay()`. This creates multiple rarefied versions of the data, which can help account for the stochasticity introduced by random subsampling. The resulting rarefied assays can then be used for downstream analyses such as alpha and beta diversity calculations. | ||
|
|
||
| ### Using rarefaction with alpha diversity | ||
|
|
||
| When calculating alpha diversity indices, you can apply rarefaction iteratively and then compute diversity metrics across the rarefied replicates. The `addAlpha()` function can work with rarefied data: | ||
|
|
||
| ```{r} | ||
| #| label: rarefaction-alpha | ||
| #| eval: false | ||
|
|
||
| # Load example data | ||
| library(mia) | ||
| data("Tengeler2020") | ||
| tse <- Tengeler2020 | ||
|
|
||
| # Get minimum read depth for rarefaction | ||
| min_reads <- min(colSums(assay(tse, "counts"))) | ||
|
|
||
| # Perform iterative rarefaction | ||
| tse <- rarefyAssay( | ||
| tse, | ||
| method = "subsample", | ||
| sample = min_reads, | ||
| niter = 100 | ||
| ) | ||
|
|
||
| # Calculate alpha diversity on rarefied data | ||
| tse <- addAlpha( | ||
| tse, | ||
| assay_name = "counts_rarefied", | ||
| sample = min_reads, | ||
| niter = 100 | ||
| ) | ||
|
Comment on lines
+166
to
+180
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Hence I am thinking that it might be more clear to show these as two separate operations that can both be feasible but each on their own right. Shall we split this chunk in two parts? |
||
| ``` | ||
|
|
||
| ### Using rarefaction with beta diversity | ||
|
|
||
| Similarly, rarefaction can be applied before calculating beta diversity and performing ordination. The `addMDS()` function can utilize rarefied data for more robust distance calculations: | ||
|
|
||
| ```{r} | ||
| #| label: rarefaction-beta | ||
| #| eval: false | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not eval: true? |
||
|
|
||
| # Perform MDS ordination on rarefied data | ||
| tse <- addMDS( | ||
| tse, | ||
| assay_name = "counts_rarefied", | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. mia changed argument names last year; "assay_name" is deprecated and should be replaced with "assay.type" everywhere
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If "niter" parameter is used then doesn't it already take care or rarifification i.e. why not use assay.type="counts"? |
||
| method = "bray", | ||
| niter = 100 | ||
| ) | ||
| ``` | ||
|
|
||
| ### Function comparison | ||
|
|
||
| **`addAlpha()` vs `getAlpha()`**: Both functions calculate alpha diversity indices, but `addAlpha()` stores the results directly into the `colData` of the TreeSummarizedExperiment object, while `getAlpha()` returns the diversity values as a separate vector or matrix. Use `addAlpha()` when you want to keep all data together in one object, and `getAlpha()` when you need the diversity values for immediate use in other calculations. | ||
|
|
||
|
Comment on lines
+202
to
+203
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggest to explain this earlier, where the rarified alpha diversity analysis is shown. |
||
| **`runMDS()` vs `addMDS()`**: The `runMDS()` function calculates multidimensional scaling coordinates and returns them as a separate matrix, whereas `addMDS()` calculates the MDS coordinates and stores them directly into the `reducedDim` slot of the TreeSummarizedExperiment object. Using `addMDS()` is generally preferred as it maintains all results within the same data object, making downstream analyses and visualization more straightforward. | ||
|
|
||
|
Comment on lines
+204
to
+205
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You could also comment whether this is available for other ordination functions e.g. runPCA, runNMDS..? |
||
|
|
||
| ## Transformations in practice | ||
|
|
||
| Below, we apply relative transformation to counts table. | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not eval: true?