Skip to content

Basic workflow

Gavin Douglas edited this page Sep 8, 2022 · 10 revisions

The key POMS function is POMS_pipeline, which requires tables of taxa and function abundances, a tree of taxa, and sample names split into group1 and group2.

This function identifies significant nodes based on sample balances using a Wilcoxon test by default, or significant nodes can be specified manually. Significant nodes are referred to as Balance-Significant Nodes (BSNs). Fisher's exact tests are run at each node in the tree with sufficient numbers of underlying tips on each side. Significant nodes based on this test are referred to as Function-Significant Nodes (FSNs). The key output is the tally of the intersecting nodes based on the sets of BSNs and FSNs.

Each FSN can be categorized in one of three ways:

  1. It does not intersect with any BSN.
  2. It intersects with a BSN and the functional enrichment is within the taxa that are relatively more abundant in group 1 samples.
  3. Same as #2, but enriched within taxa that are relatively more abundant in group 2 samples.

A multinomial test is run to see if the tallies the FSNs in these three categories is significantly different from the random expectation.

Example usage

The below code will run the pipeline on example input files that are part of this repository. These example files are highly simplified to run quickly - for instance, there are only two functions tested and only 60 tips in the tree.

Note: the options min_num_tips, multinomial_min_FSNs, and min_func_instances are set to non-default values to work with this small example, but normally you could leave them to be default.

setwd("path/to/POMS/example_files/")

library(POMS)

ex_taxa_abun <- read.table("ex_taxa_abun.tsv.gz", header = TRUE, sep = "\t", row.names = 1)
ex_func <- read.table("ex_func.tsv.gz", header = TRUE, sep = "\t", row.names = 1)
ex_tree <- ape::read.tree("ex_tree.newick")
ex_group1 <- read.table("ex_group1.txt.gz", stringsAsFactors = FALSE)$V1
ex_group2 <- read.table("ex_group2.txt.gz", stringsAsFactors = FALSE)$V1

# Example of how to run main POMS function. 
POMS_out <- POMS_pipeline(abun = ex_taxa_abun,
                          func = ex_func,
                          tree = ex_tree,
                          group1_samples = ex_group1,
                          group2_samples = ex_group2,
                          ncores = 1,
                          min_num_tips = 4,
                          multinomial_min_FSNs = 3,
                          min_func_instances = 0,
                          verbose = TRUE)

Example output

The above usage example will produce a list called POMS_out. This can be a very large list full of intermediate objects when detailed=TRUE. However, most users will just be interested in the output results dataframe. Usually there will be many rows to this dataframe, but in this case there are only two because that's how many functions were tested. Accordingly, we can look at the transposed version of this dataframe to see it better.

t(POMS_out$results)
#                        K07106 K02036
# num_FSNs                    4 5.0000
# num_FSNs_group1_enrich      2 5.0000
# num_FSNs_group2_enrich      1 0.0000
# num_FSNs_at_nonBSNs         1 0.0000
# multinomial_p               1 0.0272
# multinomial_corr            1 0.0544
  • num_FSNs: Number of function-significant nodes (FSNs), i.e., nodes with differential enrichment of the function in one subtree (the next three categories sum to this count)
  • num_FSNs_group1_enrich: Number of FSNs where the function is enriched in the subtree that is relatively higher in group1 samples.
  • num_FSNs_group2_enrich: Number of FSNs where the function is enriched in the subtree that is relatively higher in group2 samples.
  • num_FSNs_at_nonBSNs: Number of FSNs where the function is enriched, but they do intersect with a BSN (i.e., there is no difference in sample balances).
  • multinomial_p and multinomial_corr P and corrected P-value (BH by default) based on multinomial test.

The other default output elements in POMS_out are:

  • balance_info: list containing the tips underlying each node, which were what the balances are based on, the balances themselves at each tested node, and the set of nodes that were determined to be negligible due to having too few underlying tips. Note that the balances and underlying tips are provided for all non-negligible (i.e., tested) nodes, not just those identified as BSNs.

  • BSNs: character vector with BSNs as names and values of "group1" and "group2" to indicate for which sample group (or other binary division) the sample balances were higher.

  • FSNs_summary: list containing each tested function as a separate element. The labels for nodes in each FSN category of the multinomial test are listed per function (or are empty if there were no such FSNs).

  • tree: the prepped tree used by the pipeline, including the added node labels if a tree lacking labels was provided. This tree will also have been subset to only those tips found in the abundance table, and midpoint rooted (if it was not already rooted).

  • multinomial_exp_prop: expected proportions of the three FSN categories used for multinomial test.

Clone this wiki locally