Merged
53 commits
e62391e Update oma_packages.csv (artur-sannikov, Oct 1, 2025)
ed4a7f4 chore: substitute analyse and summarise (artur-sannikov, Jan 15, 2026)
f3fc639 substitute analyse (artur-sannikov, Jan 15, 2026)
7ecb9c3 substitute favour (artur-sannikov, Jan 15, 2026)
b73f0bd substitute visualisation (artur-sannikov, Jan 15, 2026)
90d80f2 subsitute colour (artur-sannikov, Jan 15, 2026)
2f9e8a9 substitute utilise (artur-sannikov, Jan 15, 2026)
7aa956f substitute maximize (artur-sannikov, Jan 15, 2026)
9c25d60 substitute programme (artur-sannikov, Jan 15, 2026)
9933f4d substitute normalis (artur-sannikov, Jan 15, 2026)
d621ed0 substitute artefact (artur-sannikov, Jan 15, 2026)
83f0376 substitute label (artur-sannikov, Jan 15, 2026)
91d3249 substitute minimis (artur-sannikov, Jan 15, 2026)
f0a3c8c substitute grey (artur-sannikov, Jan 15, 2026)
cbcbded substitute behaviour (artur-sannikov, Jan 15, 2026)
81520f3 substitute whilst (artur-sannikov, Jan 15, 2026)
7aafb95 substitute additional colour (artur-sannikov, Jan 15, 2026)
3d29923 Merge branch 'devel' into refactor/switch-to-american-english (TuomasBorman, Jan 15, 2026)
e511311 Merge branch 'devel' into refactor/switch-to-american-english (TuomasBorman, Jan 15, 2026)
10b039d substitute summarize (artur-sannikov, Jan 15, 2026)
51187d3 chore: first test of conversion script (artur-sannikov, Jan 16, 2026)
671d2b7 run workflow on push event (artur-sannikov, Jan 16, 2026)
6f3cfbc fix: move push event to correct place (artur-sannikov, Jan 16, 2026)
e903a1f remove git push (artur-sannikov, Jan 16, 2026)
4f4710a add git diff (artur-sannikov, Jan 16, 2026)
890a18e feat: use git-auto-commit action to push commits (artur-sannikov, Jan 16, 2026)
6bfc34a remove print0 from sed (artur-sannikov, Jan 16, 2026)
633b0f1 remove -O from xargs (artur-sannikov, Jan 16, 2026)
974c01e attempt with just run (artur-sannikov, Jan 16, 2026)
c59aa9b switch favor to favour (artur-sannikov, Jan 16, 2026)
73a94fd Convert American English to British Enlish (artur-sannikov, Jan 16, 2026)
2d4fc38 use sed command directly in the workflow file (artur-sannikov, Jan 16, 2026)
82fcfc2 remove conversion bash script (artur-sannikov, Jan 16, 2026)
384268b Convert American English to British Enlish (artur-sannikov, Jan 16, 2026)
3f8e6c0 Revert "Convert American English to British Enlish" (artur-sannikov, Jan 16, 2026)
b4d7ae6 use gray with space after (artur-sannikov, Jan 16, 2026)
e7e4103 Convert American English to British Enlish (artur-sannikov, Jan 16, 2026)
3399e5b Revert "Convert American English to British Enlish" (artur-sannikov, Jan 16, 2026)
9adb212 remove the second conversion script (artur-sannikov, Jan 16, 2026)
21327b9 add additional summarise replacements (artur-sannikov, Jan 16, 2026)
18698c4 remove github action trigger on push (artur-sannikov, Jan 16, 2026)
a27ca5d Merge branch 'devel' into refactor/switch-to-american-english (TuomasBorman, Jan 22, 2026)
f548da7 Testing American English GHA (TuomasBorman, Jan 22, 2026)
0c8f50c feat(workflows): add branch checkout to actions/checkout (artur-sannikov, Jan 22, 2026)
b1219ee chore: styling of style.yaml (artur-sannikov, Jan 22, 2026)
23a28b6 Convert American English to British Enlish (artur-sannikov, Jan 22, 2026)
2c22d48 fix(workflow): checkout HEAD of the current pull request (artur-sannikov, Jan 22, 2026)
4def9cc remove fetch from checkout action (artur-sannikov, Jan 22, 2026)
9acc2b2 remove fetch from checkout action (artur-sannikov, Jan 22, 2026)
82f8435 Merge branch 'devel' into refactor/switch-to-american-english (TuomasBorman, Feb 9, 2026)
25dcdb1 feat: submit updates to summarise* substitutions (artur-sannikov, Feb 9, 2026)
a86df6f remove gha (artur-sannikov, Feb 11, 2026)
810bffa chore: revert style yaml to its devel state (artur-sannikov, Feb 11, 2026)

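The commits above substitute British spellings word by word, and also show an experimental `sed`-based GitHub Actions conversion that a later commit removes (a86df6f, "remove gha"). One pitfall is visible in the diffs below: function arguments such as `colour_by` (in `plotReducedDim()`) and `colour.by` (in `plotRDA()`) stay unchanged, because renaming them would break the code. The following is a minimal base-R sketch of a whole-word substitution that respects that constraint. The word map `spelling_map` and the helper `to_american()` are hypothetical illustrations, not the rules this PR actually used; capitalized forms (e.g. "Colour") and words missing from the map are not handled.

```r
# Hypothetical British -> American word map, drawn from words changed in this
# PR's diff; it is NOT the substitution list the PR itself used.
spelling_map <- c(
  analyse = "analyze", analysed = "analyzed",
  summarise = "summarize",
  visualise = "visualize", visualisation = "visualization",
  colour = "color", coloured = "colored",
  labelled = "labeled",
  grey = "gray",
  behaviour = "behavior",
  whilst = "while",
  programme = "program",
  artefacts = "artifacts",
  utilise = "utilize",
  maximises = "maximizes", minimises = "minimizes",
  normalising = "normalizing"
)

# Hypothetical helper: replace whole words only.
# "\\b" rejects matches glued to a word character, so colour_by is skipped
# ("_" counts as a word character). "(?!\\.\\w)" rejects matches followed by
# "." plus a word character, so colour.by is skipped, while a sentence-ending
# period (". ") is still allowed.
to_american <- function(text, map = spelling_map) {
  for (british in names(map)) {
    pattern <- paste0("\\b", british, "\\b(?!\\.\\w)")
    text <- gsub(pattern, map[[british]], text, perl = TRUE)
  }
  text
}

# Prose is converted:
to_american("Generate RDA plot coloured by clinical status")
# [1] "Generate RDA plot colored by clinical status"

# Argument names are left alone:
to_american('plotRDA(tse2, "RDA", colour.by = "ClinicalStatus")')
# [1] "plotRDA(tse2, \"RDA\", colour.by = \"ClinicalStatus\")"
```

Matching whole words with explicit exclusions, rather than substituting raw substrings, is one way to avoid touching API names. A plain `sed` rule such as `s/colour/color/g` would rewrite `colour_by` as well.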
6 changes: 3 additions & 3 deletions inst/pages/alpha_diversity.qmd
@@ -34,8 +34,8 @@ evident from their names. @bastiaanssen2023bugs1 lay out this relationship
across two factors (See table below); First, alpha diversity metrics can be
defined as special cases of a unifying equation of **diversity**, where the
**Hill number** determines the specific index captured. Lower Hill numbers
favour **richness**, the number of distinct taxonomic features, whereas higher
numbers favour **evenness**, how the taxonomic features are distributed over
favor **richness**, the number of distinct taxonomic features, whereas higher
numbers favor **evenness**, how the taxonomic features are distributed over
the sample [@Hill1973]. Second, some alpha diversity metrics are weighted based
on phylogeny, like Faith's PD [-@Faith1992] and PhILR [@Silverman2017].

@@ -192,7 +192,7 @@ barcode).
```{r}
#| label: plot_richness
#| message: false
#| fig-cap: "Observed richness plotted grouped by sample type with colour-labelled barcode."
#| fig-cap: "Observed richness plotted grouped by sample type with color-labeled barcode."

library(scater)
plotColData(
2 changes: 1 addition & 1 deletion inst/pages/clustering.qmd
@@ -118,7 +118,7 @@ clusters to describe the data.
Now, we visualize the hierarchical structure of the clusters with a dendrogram
tree. In dendrograms, the tree is split where the branch length is the largest.
In each splitting point, the tree is divided into two clusters leading to the
hierarchy. In this example, each sample is labelled by their dominant taxon
hierarchy. In this example, each sample is labeled by their dominant taxon
to visualize ecological differences between the clusters.

```{r}
4 changes: 2 additions & 2 deletions inst/pages/community_similarity.qmd
@@ -786,7 +786,7 @@ the `plotRDA()` function from the `r BiocStyle::Biocpkg("miaViz")` package.
# Load packages for plotting function
library(miaViz)

# Generate RDA plot coloured by clinical status
# Generate RDA plot colored by clinical status
plotRDA(tse2, "RDA", colour.by = "ClinicalStatus")
```

@@ -1101,7 +1101,7 @@ eigenvalues?
6. Visualize the first two principal components.

7. Explore `colData` and visualize the first two principal components again,
now with samples coloured based on a variable from the sample metadata. Can you
now with samples colored based on a variable from the sample metadata. Can you
observe any patterns?

8. Visualize the PCA loadings for the two first components. Which features have
2 changes: 1 addition & 1 deletion inst/pages/containers.qmd
@@ -188,7 +188,7 @@ assay(tse, "counts") |> head()

In summary, in the world of microbiome analysis, an assay is
essentially a way to describe the composition of microbes in a given
sample. This way we can summarise the microbiome profile of a human gut
sample. This way we can summarize the microbiome profile of a human gut
or a sample of soil.

Furthermore, to illustrate the use of multiple assays, we can create an
2 changes: 1 addition & 1 deletion inst/pages/contributions.qmd
@@ -249,7 +249,7 @@ This work has been supported by:
* [Research Council of Finland](https://www.aka.fi/)

* [FindingPheno](https://www.findingpheno.eu/) European Union’s Horizon 2020
research and innovation programme under grant agreement No 952914
research and innovation program under grant agreement No 952914

* COST Action network on Statistical and Machine Learning Techniques for Human
Microbiome Studies
2 changes: 1 addition & 1 deletion inst/pages/correlation.qmd
@@ -19,7 +19,7 @@ we will demonstrate how to perform correlation analysis with

## Association between taxonomic features

Here we demonstrate how to analyse which bacteria co-exists in the dataset.
Here we demonstrate how to analyze which bacteria co-exists in the dataset.

```{r}
#| label: association1
14 changes: 7 additions & 7 deletions inst/pages/extra_material/add-comm-typing.Rmd
@@ -134,7 +134,7 @@ res <- lapply(k, ClustDiagPlot)

### Composition barplot

A typical way to visualise microbiome composition is by using a composition barplot.
A typical way to visualize microbiome composition is by using a composition barplot.
In the following, we agglomerate to the phylum level and subset by the country "Finland" to avoid long computation times. The samples in the barplot are ordered by "Firmicutes":

```{r, message=FALSE, warning=FALSE}
@@ -155,7 +155,7 @@ plotAbundance(tse, rank = "Phylum", order.row.by = "abund", order.col.by = "Firm

### Composition heatmap

The community composition can be visualised with a heatmap where one axis represents the samples and the other taxa. The colour of each line represents the abundance of a taxon in a specific sample.
The community composition can be visualized with a heatmap where one axis represents the samples and the other taxa. The color of each line represents the abundance of a taxon in a specific sample.

Here, the CLR + Z-transformed abundances are shown.

@@ -196,7 +196,7 @@ grid.text("Phylum", x = -0.04, y = 0.47, rot = 90, gp = gpar(fontsize = 16))

## Cluster into CSTs

The burden of specifying the number of clusters falls on the researcher. To help make an informed decision, we turn to previously established methods for doing so. In this section we introduce three such methods (aside from DMM analysis) to cluster similar samples. They include the [Elbow Method, Silhouette Method, and Gap Statistic Method](https://uc-r.github.io/kmeans_clustering). All of them will utilise the [`kmeans'](https://uc-r.github.io/kmeans_clustering) algorithm which essentially assigns clusters and minimises the distance within clusters (a sum of squares calculation). The default distance metric used is the Euclidean metric.
The burden of specifying the number of clusters falls on the researcher. To help make an informed decision, we turn to previously established methods for doing so. In this section we introduce three such methods (aside from DMM analysis) to cluster similar samples. They include the [Elbow Method, Silhouette Method, and Gap Statistic Method](https://uc-r.github.io/kmeans_clustering). All of them will utilize the [`kmeans'](https://uc-r.github.io/kmeans_clustering) algorithm which essentially assigns clusters and minimizes the distance within clusters (a sum of squares calculation). The default distance metric used is the Euclidean metric.

The scree plot allows us to see how much of the variance is captured by each dimension in the MDS ordination.

@@ -260,7 +260,7 @@ The function says that the bend occurs at $k=3$, however it is hard to tell that

### Silhouette Method

This method on the otherhand returns a width for each $k$. In this case, we want the $k$ that maximises the width.
This method on the otherhand returns a width for each $k$. In this case, we want the $k$ that maximizes the width.

```{r silhouette}
# Silhouette method
@@ -272,7 +272,7 @@ The graph shows the maximum occurring at $k=6$. At the very least, there is stro

### Gap-Statistic Method

The Gap-Statistic Method is the most complicated among the methods discussed here. With the gap statistic method, we typically want the $k$ value that maximises the output (local and global maxima), but we also want to pay attention to where the plot jumps if the maximum value doesn't turn out to be helpful.
The Gap-Statistic Method is the most complicated among the methods discussed here. With the gap statistic method, we typically want the $k$ value that maximizes the output (local and global maxima), but we also want to pay attention to where the plot jumps if the maximum value doesn't turn out to be helpful.

```{r gap-statistic}
# Gap Statistic Method
@@ -282,7 +282,7 @@ factoextra::fviz_nbclust(x, kmeans, method = "gap_stat", nboot = 50)+

The peak suggests $k=6$ clusters. If we also look to the points where the graph jumps, we can see there is evidence for $k=2$, $k=6$, and $k=8$. The output indicates that there should be at least three clusters present. Since we have previous evidence for the existence of six clusters from the silhouette and elbow methods, we will go with $k=6$.

At this point it helps to visualise the clustering in an MDS or NMDS plot.
At this point it helps to visualize the clustering in an MDS or NMDS plot.

Now, let's divide the subjects into their respective clusters.

@@ -307,7 +307,7 @@ library(scater)
library(RColorBrewer)
library(patchwork)

# set up colours
# set up colors
CSTColors <- brewer.pal(6, "Paired")[c(2, 5, 3, 4, 1, 6)]
names(CSTColors) <- CSTs

4 changes: 2 additions & 2 deletions inst/pages/extra_material/extra_material.qmd
@@ -189,7 +189,7 @@ Here we'll show an example of how to add relative abundances and CLR normalized
OTU tables to your tse assays.

With phyloseq you would need three different phyloseq objects, each taking up
7.7 MB of memory, whilst the tse with the three assays takes up only 18.3 MB.
7.7 MB of memory, while the tse with the three assays takes up only 18.3 MB.

```{r}
#| label: transform_assay
@@ -407,7 +407,7 @@ under `altExp`.
`tax_glom()` removes the taxa which have not been assigned to the level given in
taxrank by default (NArm = TRUE).
So we will add the na.rm = TRUE to `agglomerateByRank()` function which is
equivalent to the default behaviour of `tax_glom()`.
equivalent to the default behavior of `tax_glom()`.

```{r}
#| label: agglomerateByRank
4 changes: 2 additions & 2 deletions inst/pages/extra_material/visualization.qmd
@@ -222,7 +222,7 @@ which is explained in chapter [@sec-extras].
# perform NMDS coordination method
tse <- runNMDS(tse, FUN = vegan::vegdist, name = "NMDS")
# plot results of a 2-component NMDS on tse,
# coloured-scaled by shannon diversity index
# colored-scaled by shannon diversity index
plotReducedDim(tse, "NMDS", colour_by = "shannon")
```

@@ -241,7 +241,7 @@ tse <- addMDS(
ncomponents = 3
)
# plot results of a 3-component MDS on tse,
# coloured-scaled by faith diversity index
# colored-scaled by faith diversity index
plotReducedDim(tse, "MDS", ncomponents = c(1:3), colour_by = "faith")
```

11 changes: 6 additions & 5 deletions inst/pages/machine_learning.qmd
@@ -125,15 +125,16 @@ table(tse[["disease"]]) |>

Before applying any ML algorithm, the data must be preprocessed.
This speeds up the training of the models by reducing the amount of
features analysed, a desirable outcome when working with
features analyzed, a desirable outcome when working with
high-dimensional microbiome data. In addition to faster performance,
common pre-processing steps have biological justifications.
For instance:

* **Collapse highly correlated features:** In a microbial community,
it's common for the abundance of two or more taxonomic features to be highly
correlated due to ecological interactions. Thus, removing or collapsing
correlated features allows the model to analyse them as one group.
correlated features allows the model to analyze them as one group.

* **Remove features with near-zero variance:** Features that don't vary
enough across groups can hardly help in discerning between them, as they
don't hold any biologically relevant information. Additionally,
@@ -679,14 +680,14 @@ roc_p + prc_p + plot_layout(guides = "collect")

Before describing the plots and their meaning, it is worth noting
that the ROC curves of both models resembles the curve presented in
the article where this dataset was first analysed [@qin2012_t2d]
the article where this dataset was first analyzed [@qin2012_t2d]
(see Figure 4B). Interestingly, authors used other supervised ML
algorithm, and it was trained in a set of 50 microbiome genes (instead
of taxonomic features and alpha diversity metrics, as we did). However, it is
interesting that concordant AUCs and ROC curves shapes were obtained
using different microbiome-derived information.

Regarding our figures, note the dashed grey lines in both plots
Regarding our figures, note the dashed gray lines in both plots
representing the expected performance of a model that is classifying
samples randomly. Therefore, the greater the distance between that
reference and the line representing our model's performance, the better.
@@ -774,7 +775,7 @@ obs_vs_pred <- obs_vs_pred + labs(x = "Predicted BMI", y = "Observed BMI")
obs_vs_pred + theme_bw()
```

The dashed grey line in the plot above represents a perfect correlation
The dashed gray line in the plot above represents a perfect correlation
between the observed and the model-predicted BMI values of each
participant. Thus, the line indicates perfect performance of the model.
We can see that while the predictions are around the mean BMI (close
2 changes: 1 addition & 1 deletion inst/pages/mediation.qmd
@@ -29,7 +29,7 @@ $$
The microbiome can mediate the effects of multiple environmental stimuli on
human health. However, the importance of its role as a mediator depends on the
nature of the stimulus. For example, the effect of dietary fiber intake on host
behaviour is largely mediated by the gut microbiome [@Logan2014nutritional]. In
behavior is largely mediated by the gut microbiome [@Logan2014nutritional]. In
contrast, the indirect impact of antibiotic use on mental health through an
altered microbiome represents a more subtle process [@Dinan2022antibiotics].

2 changes: 1 addition & 1 deletion inst/pages/miaverse.qmd
@@ -143,7 +143,7 @@ analysis
- `r BiocStyle::Githubpkg("himelmallick/IntegratedLearner")` for multiomics
classification and prediction
- `r BiocStyle::Biocpkg("iSEEtree")` [@Benedetti2025iseetree] for interactive
visualisation of hierarchical data
visualization of hierarchical data
- `r BiocStyle::Biocpkg("lefser")` [@Asya2024] for metagenomic
biomarker discovery
- `r BiocStyle::Biocpkg("LimROTS")` for differential expression analysis for
4 changes: 2 additions & 2 deletions inst/pages/phyloseq_cheatsheet.qmd
@@ -194,7 +194,7 @@ OTU tables to your `tse` assays.

With `r BiocStyle::Biocpkg("phyloseq")` you would need three different
`r BiocStyle::Biocpkg("phyloseq")` objects, each taking up 7.7 MB of memory,
whilst the tse with the three assays takes up only 18.3 MB.
while the tse with the three assays takes up only 18.3 MB.

```{r}
#| label: transform_assay
@@ -418,7 +418,7 @@ object under `altExp`.

`tax_glom()` removes the taxa which have not been assigned to the level given
in taxrank by default (NArm = TRUE). So we will add the na.rm = TRUE to
`agglomerateByRank()` function which is equivalent to the default behaviour
`agglomerateByRank()` function which is equivalent to the default behavior
of `tax_glom()`.

```{r}
2 changes: 1 addition & 1 deletion inst/pages/subsetting.qmd
@@ -285,7 +285,7 @@ we opted for a rather conservative threshold that retains most features.

We can subset the data based on prevalence using `subsetByPrevalent()`,
which filters features that exceed a specified prevalence threshold,
helping to remove rare features that may be artefacts. Conversely,
helping to remove rare features that may be artifacts. Conversely,
`subsetByRare()` allows us to retain only features below the threshold,
enabling a focus on rare features within the dataset.

2 changes: 1 addition & 1 deletion inst/pages/support.qmd
@@ -2,7 +2,7 @@

## FindingPheno

This project received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 952914 ([FindingPheno](https://findingpheno.eu/)).
This project received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 952914 ([FindingPheno](https://findingpheno.eu/)).

## Online support

4 changes: 2 additions & 2 deletions inst/pages/transformation.qmd
@@ -15,7 +15,7 @@ interpretable values, to enhance the comparability of samples/features or
to make data compatible with the assumptions of certain statistical methods.

Examples include transforming feature counts into relative abundances
(i.e., "normalising as proportions"), or with compositionality-aware
(i.e., "normalizing as proportions"), or with compositionality-aware
transformations such as the centered log-ratio transformation (clr).

## Characteristics of microbiome data to inform data transformations {#sec-stat-challenges}
@@ -99,7 +99,7 @@ ranks. This has use, for instance, in non-parametric statistics.
allows data with zeroes and avoids the need to add pseudocount
[@Keshavan2010; @Martino2019].

- **relabundance**: Relative transformation, also known as normalising as
- **relabundance**: Relative transformation, also known as normalizing as
proportions, total sum scaling (TSS) and compositional transformation.
This converts counts into proportions (at the scale [0, 1]) that sum up to 1.
Much of the currently available taxonomic abundance data from