Tengeler2020 rowData has some families and genera with number suffix #761

@RiboRings

Hi!

The Tengeler2020 dataset contains some families and genera in its rowData with numeric suffixes (for example "Clostridiaceae_1" or "Ruminococcus_2"). It looks like the suffixes were added to make features unique, but they should probably be removed from the taxonomic ranks. I checked the original biom file and the suffixes are present there as well, so this does not seem to be an import issue.

Example:

library(mia)

data("Tengeler2020", package = "mia")
tse <- Tengeler2020

apply(rowData(tse), 2L, function(col) col[grepl("_\\d$", col)])

# $Kingdom
# named character(0)
#
# $Phylum
# named character(0)
#
# $Class
# named character(0)
#
# $Order
# named character(0)
#
# $Family
# Clostridium_sensu_stricto_1 
#          "Clostridiaceae_1" 
#
# $Genus
#                Ruminococcus_1                 Coprococcus_2                Ruminococcus_2 
#              "Ruminococcus_1"               "Coprococcus_2"              "Ruminococcus_2" 
#           Ruminiclostridium_5   Clostridium_sensu_stricto_1           Ruminiclostridium_9 
#         "Ruminiclostridium_5" "Clostridium_sensu_stricto_1"         "Ruminiclostridium_9" 
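
One possible cleanup, as a minimal sketch rather than a proposed fix: strip a trailing underscore plus digits from the affected names with base R `sub()`. The helper name `strip_numeric_suffix` is hypothetical and not part of mia, and the toy vector reuses values from the output above plus one suffix-free name ("Bacteroides") and an `NA` added only for illustration.

```r
# Hypothetical helper (not part of mia): drop a trailing "_<digits>" suffix.
# Names without such a suffix, and NA values, are returned unchanged.
strip_numeric_suffix <- function(x) {
    sub("_\\d+$", "", x)
}

# Toy input: values taken from the output above, plus "Bacteroides" and NA
# added for illustration only.
taxa <- c("Ruminococcus_1", "Clostridium_sensu_stricto_1",
          "Clostridiaceae_1", "Bacteroides", NA)

strip_numeric_suffix(taxa)
```

Assuming the ranks are stored as character columns, the same helper could be applied per rank, e.g. `rowData(tse)$Genus <- strip_numeric_suffix(rowData(tse)$Genus)`. Note that after stripping, "Ruminococcus_1" and "Ruminococcus_2" would both become "Ruminococcus", so the genus labels would no longer distinguish those features.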
