Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 63 additions & 0 deletions inst/pages/transformation.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,69 @@ than the minimum abundance value before transformation. Some tools, like
values. See [@sec-differential-abundance].
:::

## Rarefaction {#sec-rarefaction}

Another approach to control uneven sampling depths is to apply rarefaction with `rarefyAssay()`, which resamples the samples to an equal number of reads. This remains controversial, however, and strategies to mitigate the information loss in rarefaction have been proposed [@Schloss_2024a; @Schloss_2024b]. Moreover, this practice has been discouraged for the analysis of differentially abundant microorganisms [@McMurdie_and_Holmes_2014].

Rarefaction can be performed iteratively by using the `niter` parameter in `rarefyAssay()`. This creates multiple rarefied versions of the data, which can help account for the stochasticity introduced by random subsampling. The resulting rarefied assays can then be used for downstream analyses such as alpha and beta diversity calculations.

### Using rarefaction with alpha diversity

When calculating alpha diversity indices, you can apply rarefaction iteratively and then compute diversity metrics across the rarefied replicates. The `addAlpha()` function can work with rarefied data:

```{r}
#| label: rarefaction-alpha
#| eval: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not eval: true?


# Load example data
library(mia)
data("Tengeler2020")
tse <- Tengeler2020

# Get minimum read depth for rarefaction
min_reads <- min(colSums(assay(tse, "counts")))

# Perform iterative rarefaction
tse <- rarefyAssay(
tse,
method = "subsample",
sample = min_reads,
niter = 100
)

# Calculate alpha diversity on rarefied data
tse <- addAlpha(
tse,
assay_name = "counts_rarefied",
sample = min_reads,
niter = 100
)
Comment on lines +166 to +180
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addAlpha can be used independently of rarefyAssay.

Hence I am thinking that it might be more clear to show these as two separate operations that can both be feasible but each on their own right. Shall we split this chunk in two parts?

```

### Using rarefaction with beta diversity

Similarly, rarefaction can be applied before calculating beta diversity and performing ordination. The `addMDS()` function can utilize rarefied data for more robust distance calculations:

```{r}
#| label: rarefaction-beta
#| eval: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not eval: true?


# Perform MDS ordination on rarefied data
tse <- addMDS(
tse,
assay_name = "counts_rarefied",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mia changed argument names last year; "assay_name" is deprecated and should be replaced with "assay.type" everywhere

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If "niter" parameter is used then doesn't it already take care or rarifification i.e. why not use assay.type="counts"?

method = "bray",
niter = 100
)
```

### Function comparison

**`addAlpha()` vs `getAlpha()`**: Both functions calculate alpha diversity indices, but `addAlpha()` stores the results directly into the `colData` of the TreeSummarizedExperiment object, while `getAlpha()` returns the diversity values as a separate vector or matrix. Use `addAlpha()` when you want to keep all data together in one object, and `getAlpha()` when you need the diversity values for immediate use in other calculations.

Comment on lines +202 to +203
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to explain this earlier, where the rarified alpha diversity analysis is shown.

**`runMDS()` vs `addMDS()`**: The `runMDS()` function calculates multidimensional scaling coordinates and returns them as a separate matrix, whereas `addMDS()` calculates the MDS coordinates and stores them directly into the `reducedDim` slot of the TreeSummarizedExperiment object. Using `addMDS()` is generally preferred as it maintains all results within the same data object, making downstream analyses and visualization more straightforward.

Comment on lines +204 to +205
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also comment whether this is available for other ordination functions e.g. runPCA, runNMDS..?


## Transformations in practice

Below, we apply relative transformation to counts table.
Expand Down
Loading