Policy for metadata when combining objects #79

@LTLA

Consider:

library(S4Vectors)
X <- DataFrame(X=1)
metadata(X)$X <- "WHEE"

Y <- DataFrame(Y=1)
metadata(Y)$Y <- "FOO"

metadata(cbind(X, Y))
## $X
## [1] "WHEE"

That's fine, I guess: only the first object's metadata is kept, and Y's is silently dropped. But then:

library(SummarizedExperiment)
xx <- SummarizedExperiment()
metadata(xx)$X <- "WHEE"

yy <- SummarizedExperiment()
metadata(yy)$Y <- "FOO"

metadata(cbind(xx, yy))
## $X
## [1] "WHEE"
## 
## $Y
## [1] "FOO"

Here both metadata lists are concatenated instead. Should there be a consistent policy? IMO it would make most sense to c() the metadata lists and then remove duplicate names (plus a warning if the values under a duplicated name are not identical). This has the nice properties of:

  • Preserving most information, provided that the metadata elements have different names in the various objects. TBH, the lost information might not be too bad; list elements with the same name but different values aren't that helpful in downstream analyses anyway, especially if we no longer know which of the original objects they came from.
  • Ensuring that, e.g., cbind(df[,0], df) would give back df. This wouldn't be the case if you just kept appending the metadata lists together, which would grow the metadata list of the combined object with every bind.

One could even imagine writing a combineMetadata() function that all Annotated subclasses can call, so that the metadata() fields are combined in a standard way for c, rbind, cbind, combineRows, combineCols, etc.
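
To make the proposal concrete, here is a minimal sketch of what such a combineMetadata() could look like. It is not an existing S4Vectors function; it is written in base R and works on plain lists (such as those returned by metadata()), so how it would be wired into the Annotated methods is left open. Keeping the first value when a duplicated name has conflicting values is an assumption of this sketch; the policy above only asks for a warning in that case.

```r
# Hypothetical helper sketching the proposed policy; not part of S4Vectors.
# Each argument is a plain metadata list, e.g. the result of metadata(x).
combineMetadata <- function(...) {
    # Concatenate all metadata lists. unname() stops argument names from
    # being prefixed onto the element names by c().
    combined <- do.call(c, unname(list(...)))
    if (is.null(combined)) {
        return(list())  # no inputs at all
    }

    nms <- names(combined)
    if (is.null(nms)) {
        return(combined)  # nothing is named, so nothing can be a duplicate
    }

    # Only named elements can be duplicates; unnamed ones are always kept.
    dup <- duplicated(nms) & !is.na(nms) & nzchar(nms)

    for (i in which(dup)) {
        first <- match(nms[i], nms)  # position of the first element with this name
        if (!identical(combined[[first]], combined[[i]])) {
            # Assumption of this sketch: the first value wins.
            warning("metadata element '", nms[i],
                    "' has non-identical values; keeping the first")
        }
    }

    combined[!dup]
}

# Different names: everything is preserved.
str(combineMetadata(list(X = "WHEE"), list(Y = "FOO")))

# Same name, identical value: deduplicated, so combining an object with a
# zero-column or identical copy of itself does not grow the metadata.
str(combineMetadata(list(X = "WHEE"), list(X = "WHEE")))
```

With this behaviour, combining a metadata list with an empty list or with an identical copy returns the original list unchanged, which is the property that lets cbind(df[,0], df) give back df.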
