@luhann
Contributor

@luhann luhann commented Oct 18, 2025

Hi,

Thanks for all the work you do on corncob. If this PR is outside the scope of what you would like to include, just let me know and I'll close it.

This PR adds full SummarizedExperiment support and, by extension, full TreeSummarizedExperiment support. I've checked the output and plots, and they are identical between the phyloseq and SummarizedExperiment versions.

I have also duplicated all of the phyloseq tests for the SummarizedExperiment case. corncob should now natively support using SummarizedExperiments.

This is a draft PR because there is still a bit of documentation I would like to add to clarify using SummarizedExperiments, but it should work as is for testing purposes.

@luhann
Contributor Author

luhann commented Oct 18, 2025

I'm just keeping this here as a list of things I need to finish before I consider this PR "ready"

TODO

  • Vignettes
    • I don't think a whole new vignette is necessary; mentioning TreeSummarizedExperiments in the phyloseq vignette should be sufficient.
  • Documentation
    • The main end-user facing functions bbdml and differentialTest need documentation that says they also accept SummarizedExperiments.
  • Helper Functions
    • bbdml and differentialTest accept TSE, but I need to double-check that every helper function that accepts phyloseq either also accepts TSE or has a TSE alternative.
  • Robust SummarizedExperiment checking
    • Technically, SummarizedExperiments can hold much more than metagenomic sequencing data, so it would be useful to add some specific checks so that functions behave as users expect (e.g., check for taxonomic information, which is not guaranteed to exist in a SummarizedExperiment).
    • Thinking about this more, I don't think the above is actually necessary: there is only one place where I use taxonomic information, and I think corncob should just work even if there is none.
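For illustration only, the kind of check discussed in the last item could look like the sketch below. It assumes taxonomic ranks are stored as columns of rowData, which is where a phyloseq conversion typically places them; `has_tax_rank` is a hypothetical helper name and is not part of corncob or this PR.

```r
library(SummarizedExperiment)

# Hypothetical helper (not part of corncob): report whether a
# SummarizedExperiment carries the requested taxonomic rank in its rowData.
has_tax_rank <- function(se, rank = "Phylum") {
  stopifnot(is(se, "SummarizedExperiment"))
  rank %in% colnames(rowData(se))
}

# Toy count matrix: 3 taxa (rows) by 2 samples (columns)
counts <- matrix(1:6, nrow = 3,
                 dimnames = list(paste0("OTU.", 1:3), c("S1", "S2")))

# No rowData supplied, so there are no taxonomy columns to find
se_plain <- SummarizedExperiment(assays = list(counts = counts))

# Same counts with a toy taxonomy table attached as rowData
taxa <- S4Vectors::DataFrame(
  Kingdom = rep("Bacteria", 3),
  Phylum = c("Firmicutes", "Proteobacteria", "Firmicutes"),
  row.names = rownames(counts)
)
se_taxa <- SummarizedExperiment(assays = list(counts = counts), rowData = taxa)

has_tax_rank(se_plain)
has_tax_rank(se_taxa)
```

A function that only optionally uses taxonomy (for example, to label a plot) could call a check like this and fall back to row names when the rank is absent, which matches the "should just work even if there is none" position above.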

I welcome any comments or possible changes, but I think once I have addressed the above the phyloseq and SummarizedExperiment versions will be equivalent.

@svteichman
Collaborator

@luhann thank you for putting this together. I think it is a great addition to corncob and definitely within the scope of things worth adding! Let me know when it's ready to review and I'll take a look.

Sarah

@luhann
Contributor Author

luhann commented Oct 24, 2025

Hi @svteichman, thanks so much. The code is pretty much ready to review. I haven't changed any of the vignettes yet because I'm still not sure where this information fits best; I'm happy to hear any suggestions.

As I said, I've included a test file that duplicates all of the phyloseq tests for SummarizedExperiment objects, so all the code should work.

Here is also a small script that I used to compare the phyloseq and SummarizedExperiment versions.

library(corncob)
library(phyloseq)
library(magrittr)
library(SummarizedExperiment)

data(soil_phylo_sample)
data(soil_phylo_otu)
data(soil_phylo_taxa)
soil_phylo <- phyloseq::phyloseq(phyloseq::sample_data(soil_phylo_sample),
                                phyloseq::otu_table(soil_phylo_otu, taxa_are_rows = TRUE),
                                phyloseq::tax_table(soil_phylo_taxa))


soil <- soil_phylo %>%
  phyloseq::subset_samples(DayAmdmt %in% c(11,21)) %>%
  phyloseq::tax_glom("Phylum")

# Convert the phyloseq object to a TreeSummarizedExperiment (requires the mia package)
soil_tse <- mia::convertFromPhyloseq(soil)


# Intercept-only models: phyloseq input, then TreeSummarizedExperiment input
corncob <- bbdml(formula = OTU.1 ~ 1,
                 phi.formula = ~ 1,
                 data = soil)

corncob_tse <- bbdml(formula = OTU.1 ~ 1,
                     phi.formula = ~ 1,
                     data = soil_tse)

plot(corncob, B = 50)
plot(corncob_tse, B = 50)


corncob_da <- bbdml(formula = OTU.1 ~ DayAmdmt,
             phi.formula = ~ DayAmdmt,
             data = soil)
corncob_da_tse <- bbdml(formula = OTU.1 ~ DayAmdmt,
             phi.formula = ~ DayAmdmt,
             data = soil_tse)

corncob_da
corncob_da_tse

plot(corncob_da, color = "DayAmdmt", B = 50)
plot(corncob_da_tse, color = "DayAmdmt", B = 50)


# Same comparison, returning the plotting data instead of the plot
plot(corncob_da, color = "DayAmdmt", B = 50, data_only = TRUE)
plot(corncob_da_tse, color = "DayAmdmt", B = 50, data_only = TRUE)


set.seed(1)
da_analysis <- differentialTest(formula = ~ DayAmdmt,
                                 phi.formula = ~ DayAmdmt,
                                 formula_null = ~ 1,
                                 phi.formula_null = ~ DayAmdmt,
                                 test = "Wald", boot = FALSE,
                                 data = soil,
                                 fdr_cutoff = 0.05)


set.seed(1)
da_analysis_tse <- differentialTest(formula = ~ DayAmdmt,
                                 phi.formula = ~ DayAmdmt,
                                 formula_null = ~ 1,
                                 phi.formula_null = ~ DayAmdmt,
                                 test = "Wald", boot = FALSE,
                                 data = soil_tse,
                                 fdr_cutoff = 0.05)

da_analysis$significant_taxa
da_analysis_tse$significant_taxa

plot(da_analysis)
plot(da_analysis_tse)


plot(da_analysis, level = "Kingdom")
plot(da_analysis_tse, level = "Kingdom")
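
The script above compares the two versions by eye. As a sketch of how the same comparison could be made programmatically, the helper below checks selected components of two fitted objects with all.equal. `compare_components` is a hypothetical name, and the toy lists stand in for real model fits so that the example runs without corncob installed.

```r
# Hypothetical helper: compare selected components of two list-like fits.
# Returns a named logical vector, one entry per component.
compare_components <- function(fit_a, fit_b, components, tolerance = 1e-8) {
  vapply(components, function(nm) {
    isTRUE(all.equal(fit_a[[nm]], fit_b[[nm]], tolerance = tolerance))
  }, logical(1))
}

# Toy stand-ins for two model fits (values are made up for illustration)
fit_phyloseq <- list(param = c(-2.1, 0.4), logL = -153.2)
fit_tse      <- list(param = c(-2.1, 0.4), logL = -153.2)
fit_other    <- list(param = c(-2.0, 0.4), logL = -153.2)

compare_components(fit_phyloseq, fit_tse, c("param", "logL"))
compare_components(fit_phyloseq, fit_other, c("param", "logL"))
```

With the objects from the script, a call such as compare_components(corncob_da, corncob_da_tse, c("param", "logL")) would be the analogous check, assuming the fitted bbdml objects expose components with those names. The significant taxa can be compared directly with identical(da_analysis$significant_taxa, da_analysis_tse$significant_taxa).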

@svteichman
Collaborator

Great, I'll take a look at this early next week. Looking at the vignettes, I think this could be added as a brief subsection at the end of the "intro_corncob" vignette, something like "if your data is in a TreeSummarizedExperiment object instead of a phyloseq object, corncob will still work and here's how..."

@svteichman
Collaborator

svteichman commented Oct 27, 2025

I just triggered the automatic checks. The failing ones fail because the workflows don't explicitly install SummarizedExperiment, which is required for packages that come from Bioconductor rather than CRAN. You can fix this by editing ".github/workflows/R-CMD-check.yaml" and "test-coverage.yaml" and updating the following step:

- name: Install dependencies
  run: |
    remotes::install_deps(dependencies = TRUE)
    remotes::install_cran("rcmdcheck")
    if (!requireNamespace("BiocManager", quietly = TRUE)){install.packages("BiocManager")}; BiocManager::install(c("phyloseq", "limma"), ask = FALSE)
  shell: Rscript {0}

The line that installs phyloseq and limma is where you can add SummarizedExperiment, so that the new functionality is tested automatically.
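
As a sketch of the suggested change, the R portion of that step might look like the following once SummarizedExperiment is added to the Bioconductor install call. The exact set of packages in the merged workflow is not shown in this thread, so treat the list as an assumption.

```r
# Install BiocManager from CRAN only if it is missing
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Bioconductor dependencies; SummarizedExperiment is the addition discussed here
BiocManager::install(c("phyloseq", "limma", "SummarizedExperiment"), ask = FALSE)
```

Passing ask = FALSE keeps the install non-interactive, which a CI runner needs.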

@svteichman
Collaborator

Otherwise this looks great, and all of the tests you provided run for me when I pull this PR locally. Thank you for adding such extensive testing! Once you have decided where to add this to a vignette and have updated the workflows, I think this is ready to merge.

@luhann
Contributor Author

luhann commented Oct 28, 2025

Thanks for the review, that's great to hear!

I'll add SummarizedExperiment to the GitHub workflow and update the vignette in the next day or two and let you know when I think it's ready for a final review.

@luhann luhann marked this pull request as ready for review November 13, 2025 10:57
@luhann
Contributor Author

luhann commented Nov 13, 2025

Apologies for the delay, but everything should be good to go now. I've added the packages to the GitHub workflows and added a short paragraph to the vignette about working with SummarizedExperiment objects.

Collaborator

@svteichman svteichman left a comment

Looks good to me! If all checks pass then I'll merge. Thanks for your work on this @luhann!

@svteichman svteichman merged commit d7ac4c8 into statdivlab:main Nov 13, 2025
4 checks passed
@adw96
Collaborator

adw96 commented Nov 13, 2025

Thank you again, @luhann! We really appreciate this functionality.

@luhann luhann deleted the tse branch November 17, 2025 12:50