
Consider/implement alternate schema/model representations #8

@ianfore


As we widen Data Connect interactions with other GA4GH work streams, it may be worth creating an experimental branch in which to implement DataModelSuppliers that provide the schema or model in forms other than the current JSON Schema. The purpose would be exploratory: to match the representation that users need against what the different schema types provide. How models should be represented in GA4GH is currently an open question under consideration by TASC. Using the Data Connect implementation as a workbench for testing different possibilities may help that effort.
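A minimal sketch of the idea, assuming a simple pluggable design: each supplier is a function that takes the JSON Schema data model (for example, the `data_model` returned for a Data Connect table) and returns it in some other representation, selected by a format name. The names `ModelSupplier`, `SUPPLIERS`, `supply` and the `field-list` format are hypothetical and are not taken from the existing DataModelSupplier code.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical supplier signature: JSON Schema data model in, another representation out.
ModelSupplier = Callable[[Dict[str, Any]], Any]


def json_schema_supplier(model: Dict[str, Any]) -> Dict[str, Any]:
    """Current behaviour: hand back the JSON Schema unchanged."""
    return model


def field_list_supplier(model: Dict[str, Any]) -> List[Tuple[str, Any]]:
    """Experimental format: flatten top-level properties to (name, type) pairs."""
    properties = model.get("properties", {})
    return [(name, spec.get("type", "unknown")) for name, spec in properties.items()]


# Registry of experimental representations, keyed by a hypothetical format name.
SUPPLIERS: Dict[str, ModelSupplier] = {
    "json-schema": json_schema_supplier,
    "field-list": field_list_supplier,
}


def supply(model: Dict[str, Any], fmt: str) -> Any:
    """Render the model in the requested format, or fail clearly if it is unknown."""
    try:
        supplier = SUPPLIERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported model format: {fmt!r}") from None
    return supplier(model)


if __name__ == "__main__":
    model = {
        "type": "object",
        "properties": {"age": {"type": "integer"}, "sex": {"type": "string"}},
    }
    print(supply(model, "field-list"))  # [('age', 'integer'), ('sex', 'string')]
```

With this shape, trying out a new representation (an XMI or LinkML export, say) means registering one more function while the default JSON Schema path stays untouched.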

Formats worth looking at include, but are by no means limited to:

  • Simple extended dbGaP data dictionary format
  • ISO/IEC 11179: at least a couple of relevant metadata repositories use this standard
  • The R approach to documenting data structures
  • LinkML
  • SchemaBlocks
  • Protobuf
  • XML Metadata Interchange (XMI)
  • Research Data Alliance (RDA) Data Type Registries
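To make one of these concrete, here is a sketch of rendering a JSON Schema data model as a dbGaP-style data dictionary (tab-separated, with `VARNAME`, `VARDESC`, `TYPE` and `UNITS` columns). The type mapping and the `x-unit` keyword used to carry units are assumptions for illustration; `x-unit` is not part of JSON Schema or the Data Connect specification.

```python
import csv
import io
from typing import Any, Dict

# Assumed mapping from JSON Schema types to dbGaP TYPE values; anything else falls back to "string".
DBGAP_TYPES = {"integer": "integer", "number": "decimal", "string": "string"}


def primary_type(spec: Dict[str, Any]) -> str:
    """Pick the first non-null type, since JSON Schema allows e.g. ["number", "null"]."""
    t = spec.get("type", "string")
    if isinstance(t, list):
        non_null = [x for x in t if x != "null"]
        t = non_null[0] if non_null else "string"
    return t


def to_dbgap_dictionary(model: Dict[str, Any]) -> str:
    """Emit a header plus one tab-separated dictionary row per top-level property."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["VARNAME", "VARDESC", "TYPE", "UNITS"])
    for name, spec in model.get("properties", {}).items():
        writer.writerow([
            name,
            spec.get("description", ""),
            DBGAP_TYPES.get(primary_type(spec), "string"),
            spec.get("x-unit", ""),  # hypothetical extension keyword for units
        ])
    return out.getvalue()


if __name__ == "__main__":
    model = {
        "properties": {
            "weight": {"type": ["number", "null"], "description": "Body weight", "x-unit": "kg"}
        }
    }
    print(to_dbgap_dictionary(model), end="")
```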

Some of the translation into specific formats could be handled on the client side. For example, an R client could translate the Data Connect/GA4GH schema format into the format used to define data structures in R. This is likely the best solution architecturally. The fundamental question, though, is what Data Connect itself needs to provide in order to meet user needs.
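As an illustration of that client-side option, a sketch (in Python, assuming the client has already fetched a table's JSON Schema `data_model`) that turns the schema into R source for an empty `data.frame` with matching column types. The mapping shown, including treating `format: date` as R's `Date` class and falling back to `character()` for anything else such as nested objects, is an assumption rather than an established convention.

```python
from typing import Any, Dict

# Assumed mapping from JSON Schema types to zero-length R vector constructors.
R_CONSTRUCTORS = {
    "integer": "integer()",
    "number": "numeric()",
    "string": "character()",
    "boolean": "logical()",
}


def primary_type(spec: Dict[str, Any]) -> str:
    """Pick the first non-null type from a possibly list-valued "type"."""
    t = spec.get("type", "string")
    if isinstance(t, list):
        non_null = [x for x in t if x != "null"]
        t = non_null[0] if non_null else "string"
    return t


def to_r_template(model: Dict[str, Any]) -> str:
    """Build R source for an empty data.frame whose columns mirror the schema."""
    columns = []
    for name, spec in model.get("properties", {}).items():
        json_type = primary_type(spec)
        if json_type == "string" and spec.get("format") == "date":
            ctor = "as.Date(character())"
        else:
            ctor = R_CONSTRUCTORS.get(json_type, "character()")
        # Backticks let the R expression parse even for non-syntactic column names.
        columns.append(f"`{name}` = {ctor}")
    # check.names = FALSE stops data.frame() from rewriting those names via make.names().
    return "data.frame(" + ", ".join(columns + ["check.names = FALSE"]) + ")"


if __name__ == "__main__":
    model = {
        "properties": {
            "age": {"type": "integer"},
            "visit_date": {"type": "string", "format": "date"},
            "weight": {"type": ["number", "null"]},
        }
    }
    print(to_r_template(model))
```

Generating the R structure on the client keeps Data Connect itself format-neutral; the open question remains whether the JSON Schema carries enough information (semantics, units) for such a translation to be faithful.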

At a high level, the specific user needs referred to are:

  • Understand the data:
    • from an unfamiliar domain
    • from standard but niche specialities, e.g. AJCC cancer staging for glioblastoma multiforme
    • the data structure of a particular, perhaps unique, experimental design
  • For the data described by the schema/model, be provided with sufficient information to:
    • Transform the data as needed for the user's purpose
    • Aggregate the data with data from other sources

At a minimum, the following are clearly core to these needs:

  • References to semantic descriptions (standard or not)
  • Use of scientific units
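One way to make these two requirements checkable is sketched below, assuming each property in the JSON Schema carries a semantic reference under a hypothetical `x-term` keyword (for example an ontology term IRI) and, for numeric fields, a unit under a hypothetical `x-unit` keyword (for example a UCUM code). Neither keyword is defined by JSON Schema or by the Data Connect specification; they stand in for whatever convention is eventually agreed.

```python
from typing import Any, Dict, List, Tuple

NUMERIC_TYPES = {"integer", "number"}


def missing_annotations(
    model: Dict[str, Any], term_key: str = "x-term", unit_key: str = "x-unit"
) -> List[Tuple[str, str]]:
    """List (property, missing keyword) pairs for absent semantic references or units."""
    problems = []
    for name, spec in model.get("properties", {}).items():
        if term_key not in spec:
            problems.append((name, term_key))
        types = spec.get("type", [])
        if isinstance(types, str):
            types = [types]
        # Only numeric fields are required to state a unit.
        if NUMERIC_TYPES.intersection(types) and unit_key not in spec:
            problems.append((name, unit_key))
    return problems


if __name__ == "__main__":
    model = {
        "properties": {
            # Placeholder IRIs; a real model would point at an ontology term.
            # "a" is the UCUM code for year.
            "age": {"type": "integer", "x-term": "http://example.org/terms/age", "x-unit": "a"},
            "weight": {"type": "number"},
            "diagnosis": {"type": "string", "x-term": "http://example.org/terms/diagnosis"},
        }
    }
    print(missing_annotations(model))  # [('weight', 'x-term'), ('weight', 'x-unit')]
```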

These needs apply to data scientists, whether they use Data Connect directly or through tools built on Data Connect services.
