microbiome · TuomasBorman · Feb 11, 2026 · Oct 1, 2025 · Jan 15, 2026 · Jan 15, 2026
diff --git a/inst/pages/alpha_diversity.qmd b/inst/pages/alpha_diversity.qmd
@@ -34,8 +34,8 @@ evident from their names. @bastiaanssen2023bugs1 lay out this relationship
 across two factors (See table below); First, alpha diversity metrics can be
 defined as special cases of a unifying equation of **diversity**, where the
 **Hill number** determines the specific index captured. Lower Hill numbers
-favour **richness**, the number of distinct taxonomic features, whereas higher
-numbers favour **evenness**, how the taxonomic features are distributed over
+favor **richness**, the number of distinct taxonomic features, whereas higher
+numbers favor **evenness**, how the taxonomic features are distributed over
 the sample [@Hill1973]. Second, some alpha diversity metrics are weighted based
 on phylogeny, like Faith's PD [-@Faith1992] and PhILR [@Silverman2017].
 
@@ -192,7 +192,7 @@ barcode).
 ```{r}
 #| label: plot_richness
 #| message: false
-#| fig-cap: "Observed richness plotted grouped by sample type with colour-labelled barcode."
+#| fig-cap: "Observed richness plotted grouped by sample type with color-labeled barcode."
 
 library(scater)
 plotColData(

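The Hill-number passage in the `alpha_diversity.qmd` hunk above can be illustrated with a small base R sketch. It assumes the standard definition of the Hill number of order $q$, $D_q = (\sum_i p_i^q)^{1/(1-q)}$, with the $q \to 1$ limit equal to the exponential of Shannon entropy. The function name `hill_number()` and the two toy count vectors are hypothetical and are not taken from the book.

```r
# Hill number (effective number of taxa) of order q for one sample.
# `counts` is a hypothetical vector of taxon counts.
hill_number <- function(counts, q) {
    # Relative abundances of the taxa that are present
    p <- counts[counts > 0] / sum(counts)
    if (q == 1) {
        # Limit q -> 1: exponential of Shannon entropy
        return(exp(-sum(p * log(p))))
    }
    sum(p^q)^(1 / (1 - q))
}

# Two toy samples with the same richness but different evenness
even   <- c(25, 25, 25, 25)
uneven <- c(97, 1, 1, 1)

orders <- c(0, 1, 2)
hill_even   <- sapply(orders, function(q) hill_number(even, q))
hill_uneven <- sapply(orders, function(q) hill_number(uneven, q))
```

At order 0 both samples give the same value (richness), while at higher orders the uneven sample drops toward 1, which is the sense in which higher orders respond to evenness.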
diff --git a/inst/pages/clustering.qmd b/inst/pages/clustering.qmd
@@ -118,7 +118,7 @@ clusters to describe the data.
 Now, we visualize the hierarchical structure of the clusters with a dendrogram
 tree. In dendrograms, the tree is split where the branch length is the largest.
 In each splitting point, the tree is divided into two clusters leading to the
-hierarchy. In this example, each sample is labelled by their dominant taxon
+hierarchy. In this example, each sample is labeled by its dominant taxon
 to visualize ecological differences between the clusters.
 
 ```{r}

diff --git a/inst/pages/community_similarity.qmd b/inst/pages/community_similarity.qmd
@@ -786,7 +786,7 @@ the `plotRDA()` function from the `r BiocStyle::Biocpkg("miaViz")` package.
 # Load packages for plotting function
 library(miaViz)
 
-# Generate RDA plot coloured by clinical status
+# Generate RDA plot colored by clinical status
 plotRDA(tse2, "RDA", colour.by = "ClinicalStatus")
 ```
 
@@ -1101,7 +1101,7 @@ eigenvalues?
 6. Visualize the first two principal components.
 
 7. Explore `colData` and visualize the first two principal components again,
-now with samples coloured based on a variable from the sample metadata. Can you
+now with samples colored based on a variable from the sample metadata. Can you
 observe any patterns?
 
 8. Visualize the PCA loadings for the two first components. Which features have

diff --git a/inst/pages/containers.qmd b/inst/pages/containers.qmd
@@ -188,7 +188,7 @@ assay(tse, "counts") |> head()
 
 In summary, in the world of microbiome analysis, an assay is
 essentially a way to describe the composition of microbes in a given
-sample. This way we can summarise the microbiome profile of a human gut
+sample. This way we can summarize the microbiome profile of a human gut
 or a sample of soil.
 
 Furthermore, to illustrate the use of multiple assays, we can create an

diff --git a/inst/pages/contributions.qmd b/inst/pages/contributions.qmd
@@ -249,7 +249,7 @@ This work has been supported by:
 * [Research Council of Finland](https://www.aka.fi/)
 
 * [FindingPheno](https://www.findingpheno.eu/) European Union’s Horizon 2020
-research and innovation programme under grant agreement No 952914
+research and innovation program under grant agreement No 952914
 
 * COST Action network on Statistical and Machine Learning Techniques for Human
 Microbiome Studies

diff --git a/inst/pages/correlation.qmd b/inst/pages/correlation.qmd
@@ -19,7 +19,7 @@ we will demonstrate how to perform correlation analysis with
 
 ## Association between taxonomic features
 
-Here we demonstrate how to analyse which bacteria co-exists in the dataset.
+Here we demonstrate how to analyze which bacteria co-exist in the dataset.
 
 ```{r}
 #| label: association1

diff --git a/inst/pages/extra_material/add-comm-typing.Rmd b/inst/pages/extra_material/add-comm-typing.Rmd
@@ -134,7 +134,7 @@ res <- lapply(k, ClustDiagPlot)
 
 ### Composition barplot
 
-A typical way to visualise microbiome composition is by using a composition barplot.
+A typical way to visualize microbiome composition is by using a composition barplot.
 In the following, we agglomerate to the phylum level and subset by the country "Finland" to avoid long computation times. The samples in the barplot are ordered by "Firmicutes":
 
 ```{r, message=FALSE, warning=FALSE}
@@ -155,7 +155,7 @@ plotAbundance(tse, rank = "Phylum", order.row.by = "abund", order.col.by = "Firm
 
 ### Composition heatmap
 
-The community composition can be visualised with a heatmap where one axis represents the samples and the other taxa. The colour of each line represents the abundance of a taxon in a specific sample.
+The community composition can be visualized with a heatmap where one axis represents the samples and the other taxa. The color of each line represents the abundance of a taxon in a specific sample.
 
 Here, the CLR + Z-transformed abundances are shown.
 
@@ -196,7 +196,7 @@ grid.text("Phylum", x = -0.04, y = 0.47, rot = 90, gp = gpar(fontsize = 16))
 
 ## Cluster into CSTs
 
-The burden of specifying the number of clusters falls on the researcher. To help make an informed decision, we turn to previously established methods for doing so. In this section we introduce three such methods (aside from DMM analysis) to cluster similar samples. They include the [Elbow Method, Silhouette Method, and Gap Statistic Method](https://uc-r.github.io/kmeans_clustering). All of them will utilise the [`kmeans'](https://uc-r.github.io/kmeans_clustering) algorithm which essentially assigns clusters and minimises the distance within clusters (a sum of squares calculation). The default distance metric used is the Euclidean metric.
+The burden of specifying the number of clusters falls on the researcher. To help make an informed decision, we turn to previously established methods for doing so. In this section we introduce three such methods (aside from DMM analysis) to cluster similar samples. They include the [Elbow Method, Silhouette Method, and Gap Statistic Method](https://uc-r.github.io/kmeans_clustering). All of them use the [`kmeans`](https://uc-r.github.io/kmeans_clustering) algorithm, which assigns samples to clusters so as to minimize the within-cluster sum of squared distances. The default distance metric is Euclidean.
 
 The scree plot allows us to see how much of the variance is captured by each dimension in the MDS ordination.
 
@@ -260,7 +260,7 @@ The function says that the bend occurs at $k=3$, however it is hard to tell that
 
 ### Silhouette Method
 
-This method on the otherhand returns a width for each $k$. In this case, we want the $k$ that maximises the width.
+This method, on the other hand, returns a width for each $k$. In this case, we want the $k$ that maximizes the width.
 
 ```{r silhouette}
 # Silhouette method
@@ -272,7 +272,7 @@ The graph shows the maximum occurring at $k=6$. At the very least, there is stro
 
 ### Gap-Statistic Method
 
-The Gap-Statistic Method is the most complicated among the methods discussed here. With the gap statistic method, we typically want the $k$ value that maximises the output (local and global maxima), but we also want to pay attention to where the plot jumps if the maximum value doesn't turn out to be helpful. 
+The Gap-Statistic Method is the most complicated among the methods discussed here. With the gap statistic method, we typically want the $k$ value that maximizes the output (local and global maxima), but we also want to pay attention to where the plot jumps if the maximum value doesn't turn out to be helpful. 
 
 ```{r gap-statistic}
 # Gap Statistic Method
@@ -282,7 +282,7 @@ factoextra::fviz_nbclust(x, kmeans, method = "gap_stat", nboot = 50)+
 
 The peak suggests $k=6$ clusters. If we also look to the points where the graph jumps, we can see there is evidence for $k=2$, $k=6$, and $k=8$. The output indicates that there should be at least three clusters present. Since we have previous evidence for the existence of six clusters from the silhouette and elbow methods, we will go with $k=6$. 
 
-At this point it helps to visualise the clustering in an MDS or NMDS plot. 
+At this point it helps to visualize the clustering in an MDS or NMDS plot. 
 
 Now, let's divide the subjects into their respective clusters.
 
@@ -307,7 +307,7 @@ library(scater)
 library(RColorBrewer)
 library(patchwork)
 
-# set up colours
+# set up colors
 CSTColors <- brewer.pal(6, "Paired")[c(2, 5, 3, 4, 1, 6)]
 names(CSTColors) <- CSTs
 

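The elbow criterion referred to in the `add-comm-typing.Rmd` hunks above can be sketched with base R alone. The example below is a minimal illustration on simulated data, not the chapter's workflow: the matrix `x` here is a hypothetical toy data set with three well-separated groups, and the seed, group means, and range of $k$ are assumptions chosen only to make the bend visible.

```r
set.seed(1)

# Hypothetical toy data: three well-separated groups in two dimensions
x <- rbind(
    matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
    matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
    matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# Total within-cluster sum of squares for each candidate k
ks <- 1:6
wss <- sapply(ks, function(k) {
    kmeans(x, centers = k, nstart = 10)$tot.withinss
})

# The "elbow" is the k after which wss stops dropping sharply
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

With data like this, the curve should fall steeply up to $k=3$ and flatten afterwards; on real microbiome data the bend is usually less clear, which is why the chapter combines several criteria.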
diff --git a/inst/pages/extra_material/extra_material.qmd b/inst/pages/extra_material/extra_material.qmd
@@ -189,7 +189,7 @@ Here we'll show an example of how to add relative abundances and CLR normalized
 OTU tables to your tse assays.
 
 With phyloseq you would need three different phyloseq objects, each taking up
-7.7 MB of memory, whilst the tse with the three assays takes up only 18.3 MB.
+7.7 MB of memory, while the tse with the three assays takes up only 18.3 MB.
 
 ```{r}
 #| label: transform_assay
@@ -407,7 +407,7 @@ under `altExp`.
 `tax_glom()` removes the taxa which have not been assigned to the level given in
 taxrank by default (NArm = TRUE).
 So we will add the na.rm = TRUE to `agglomerateByRank()` function which is
-equivalent to the default behaviour of `tax_glom()`.
+equivalent to the default behavior of `tax_glom()`.
 
 ```{r}
 #| label: agglomerateByRank

diff --git a/inst/pages/extra_material/visualization.qmd b/inst/pages/extra_material/visualization.qmd
@@ -222,7 +222,7 @@ which is explained in chapter [@sec-extras].
 # perform NMDS coordination method
 tse <- runNMDS(tse, FUN = vegan::vegdist, name = "NMDS")
 # plot results of a 2-component NMDS on tse,
-# coloured-scaled by shannon diversity index
+# color-scaled by Shannon diversity index
 plotReducedDim(tse, "NMDS", colour_by = "shannon")
 ```
 
@@ -241,7 +241,7 @@ tse <- addMDS(
     ncomponents = 3
 )
 # plot results of a 3-component MDS on tse,
-# coloured-scaled by faith diversity index
+# color-scaled by Faith diversity index
 plotReducedDim(tse, "MDS", ncomponents = c(1:3), colour_by = "faith")
 ```
 

diff --git a/inst/pages/machine_learning.qmd b/inst/pages/machine_learning.qmd
@@ -125,15 +125,16 @@ table(tse[["disease"]]) |>
 
 Before applying any ML algorithm, the data must be preprocessed. 
 This speeds up the training of the models by reducing the amount of 
-features analysed, a desirable outcome when working with 
+features analyzed, a desirable outcome when working with 
 high-dimensional microbiome data. In addition to faster performance, 
 common pre-processing steps have biological justifications. 
 For instance:
 
 * **Collapse highly correlated features:** In a microbial community,
 it's common for the abundance of two or more taxonomic features to be highly
 correlated due to ecological interactions. Thus, removing or collapsing
-correlated features allows the model to analyse them as one group.
+correlated features allows the model to analyze them as one group.
+
 * **Remove features with near-zero variance:** Features that don't vary
 enough across groups can hardly help in discerning between them, as they
 don't hold any biologically relevant information. Additionally, 
@@ -679,14 +680,14 @@ roc_p + prc_p + plot_layout(guides = "collect")
 
 Before describing the plots and their meaning, it is worth noting
 that the ROC curves of both models resembles the curve presented in 
-the article where this dataset was first analysed [@qin2012_t2d] 
+the article where this dataset was first analyzed [@qin2012_t2d] 
 (see Figure 4B). Interestingly, authors used other supervised ML 
 algorithm, and it was trained in a set of 50 microbiome genes (instead
 of taxonomic features and alpha diversity metrics, as we did). However, it is 
 interesting that concordant AUCs and ROC curves shapes were obtained 
 using different microbiome-derived information. 
 
-Regarding our figures, note the dashed grey lines in both plots
+Regarding our figures, note the dashed gray lines in both plots
 representing the expected performance of a model that is classifying
 samples randomly. Therefore, the greater the distance between that 
 reference and the line representing our model's performance, the better.
@@ -774,7 +775,7 @@ obs_vs_pred <- obs_vs_pred + labs(x = "Predicted BMI", y = "Observed BMI")
 obs_vs_pred + theme_bw()
 ```
 
-The dashed grey line in the plot above represents a perfect correlation
+The dashed gray line in the plot above represents a perfect correlation
 between the observed and the model-predicted BMI values of each 
 participant. Thus, the line indicates perfect performance of the model. 
 We can see that while the predictions are around the mean BMI (close

diff --git a/inst/pages/mediation.qmd b/inst/pages/mediation.qmd
@@ -29,7 +29,7 @@ $$
 The microbiome can mediate the effects of multiple environmental stimuli on
 human health. However, the importance of its role as a mediator depends on the
 nature of the stimulus. For example, the effect of dietary fiber intake on host
-behaviour is largely mediated by the gut microbiome [@Logan2014nutritional]. In
+behavior is largely mediated by the gut microbiome [@Logan2014nutritional]. In
 contrast, the indirect impact of antibiotic use on mental health through an
 altered microbiome represents a more subtle process [@Dinan2022antibiotics].
 

diff --git a/inst/pages/miaverse.qmd b/inst/pages/miaverse.qmd
@@ -143,7 +143,7 @@ analysis
 - `r BiocStyle::Githubpkg("himelmallick/IntegratedLearner")` for multiomics
 classification and prediction
 - `r BiocStyle::Biocpkg("iSEEtree")` [@Benedetti2025iseetree] for interactive
-visualisation of hierarchical data
+visualization of hierarchical data
 - `r BiocStyle::Biocpkg("lefser")` [@Asya2024] for metagenomic
 biomarker discovery
 - `r BiocStyle::Biocpkg("LimROTS")` for differential expression analysis for

diff --git a/inst/pages/phyloseq_cheatsheet.qmd b/inst/pages/phyloseq_cheatsheet.qmd
@@ -194,7 +194,7 @@ OTU tables to your `tse` assays.
 
 With `r BiocStyle::Biocpkg("phyloseq")` you would need three different
 `r BiocStyle::Biocpkg("phyloseq")` objects, each taking up 7.7 MB of memory,
-whilst the tse with the three assays takes up only 18.3 MB.
+while the tse with the three assays takes up only 18.3 MB.
 
 ```{r}
 #| label: transform_assay
@@ -418,7 +418,7 @@ object under `altExp`.
 
 `tax_glom()` removes the taxa which have not been assigned to the level given
 in taxrank by default (NArm = TRUE). So we will add the na.rm = TRUE to
-`agglomerateByRank()` function which is equivalent to the default behaviour
+`agglomerateByRank()` function which is equivalent to the default behavior
 of `tax_glom()`.
 
 ```{r}

diff --git a/inst/pages/subsetting.qmd b/inst/pages/subsetting.qmd
@@ -285,7 +285,7 @@ we opted for a rather conservative threshold that retains most features.
 
 We can subset the data based on prevalence using `subsetByPrevalent()`,
 which filters features that exceed a specified prevalence threshold,
-helping to remove rare features that may be artefacts. Conversely,
+helping to remove rare features that may be artifacts. Conversely,
 `subsetByRare()` allows us to retain only features below the threshold,
 enabling a focus on rare features within the dataset.
 

diff --git a/inst/pages/support.qmd b/inst/pages/support.qmd
@@ -2,7 +2,7 @@
 
 ## FindingPheno
 
-This project received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 952914 ([FindingPheno](https://findingpheno.eu/)).
+This project received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 952914 ([FindingPheno](https://findingpheno.eu/)).
 
 ## Online support
 

diff --git a/inst/pages/transformation.qmd b/inst/pages/transformation.qmd
@@ -15,7 +15,7 @@ interpretable values, to enhance the comparability of samples/features or
 to make data compatible with the assumptions of certain statistical methods.
 
 Examples include transforming feature counts into relative abundances
-(i.e., "normalising as proportions"), or with compositionality-aware
+(i.e., "normalizing as proportions"), or with compositionality-aware
 transformations such as the centered log-ratio transformation (clr).
 
 ## Characteristics of microbiome data to inform data transformations {#sec-stat-challenges}
@@ -99,7 +99,7 @@ ranks. This has use, for instance, in non-parametric statistics.
 allows data with zeroes and avoids the need to add pseudocount
 [@Keshavan2010; @Martino2019].
 
-- **relabundance**: Relative transformation, also known as normalising as
+- **relabundance**: Relative transformation, also known as normalizing as
 proportions, total sum scaling (TSS) and compositional transformation.
 This converts counts into proportions (at the scale [0, 1]) that sum up to 1.
 Much of the currently available taxonomic abundance data from
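The two transformations named in the `transformation.qmd` hunks above, relative abundance (total sum scaling) and the centered log-ratio (clr), can be sketched in base R as follows. This is an illustration under stated assumptions rather than the book's own code: the count matrix is made up, taxa are assumed to be in rows and samples in columns, and the pseudocount used to avoid `log(0)` is an arbitrary choice.

```r
# Hypothetical count matrix: rows are taxa, columns are samples
counts <- matrix(
    c(10,  0, 30,
       5,  5, 10,
      85, 95, 60),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("taxon", 1:3), paste0("sample", 1:3))
)

# Total sum scaling: divide each column by its library size,
# so every sample sums to 1
relabundance <- sweep(counts, 2, colSums(counts), "/")

# Centered log-ratio: log-transform, then subtract the per-sample mean
# of the logs. The pseudocount is an assumption to handle zeros.
pseudocount <- 1e-6
clr <- apply(log(relabundance + pseudocount), 2, function(v) v - mean(v))
```

After the first step each column of `relabundance` sums to 1, and after the second each column of `clr` sums to 0, which is a quick sanity check for either transformation.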