Description
As we widen Data Connect interactions with other GA4GH work streams, it may be worth having an experimental branch in which to implement DataModelSuppliers that provide the schema or model in forms other than the current JSON Schema. The purpose would be experimental: to match the representation required for user need with what different schema types provide. How models are represented in GA4GH is currently an open question being considered by TASC. A Data Connect implementation that provides a workbench for testing different possibilities may be helpful to that effort.
Some formats worth looking at might include, but are most definitely not limited to:
- Simple extended dbGaP dictionary format
- ISO/IEC 11179 - at least a couple of relevant metadata repositories use this standard.
- R approach to documenting data structures
- LinkML
- SchemaBlocks
- Protobuf
- XML Metadata Interchange (XMI)
- RDA Data Type Registries
Some of the representation in specific formats could be handled on the client end. For example, an R client could translate the Data Connect/GA4GH schema format into the format used to define data structures in R. This is likely the best solution architecturally. The base question, though, is what Data Connect needs to provide in order to meet user need.
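To make the client-side option concrete, here is a minimal sketch of such a translation in Python. It assumes the data model arrives as a JSON Schema object, and flattens it into rows resembling a simple data dictionary. The example model, its property names, and the output column names are all hypothetical and chosen only for illustration; they are not taken from any Data Connect implementation or from the dbGaP format.

```python
from typing import Any

# Hypothetical data model; the property names are illustrative only.
EXAMPLE_MODEL: dict[str, Any] = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "subject_id": {
            "type": "string",
            "description": "Study-assigned subject identifier",
        },
        "age_at_diagnosis": {
            "type": "integer",
            "description": "Age at diagnosis",
        },
        "tumor_stage": {
            "type": "string",
            "enum": ["I", "II", "III", "IV"],
        },
    },
    "required": ["subject_id"],
}


def to_data_dictionary(model: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the top-level properties of a JSON Schema into dictionary rows.

    Only top-level properties are handled; nested objects and $ref are ignored.
    The column names are an assumption made for this sketch.
    """
    required = set(model.get("required", []))
    rows = []
    for name, spec in model.get("properties", {}).items():
        json_type = spec.get("type", "unknown")
        if isinstance(json_type, list):
            # JSON Schema allows a list of types, e.g. ["string", "null"].
            json_type = "|".join(json_type)
        rows.append(
            {
                "variable": name,
                "type": json_type,
                "description": spec.get("description", ""),
                "allowed_values": ";".join(str(v) for v in spec.get("enum", [])),
                "required": name in required,
            }
        )
    return rows


if __name__ == "__main__":
    for row in to_data_dictionary(EXAMPLE_MODEL):
        print(row)
```

A client for another ecosystem could follow the same pattern and emit whatever structure its users expect. That leaves open which information the server-side model must carry for the translation to be useful.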
A high-level summary of the specific user needs referred to is:
- Understand the data:
  - from an unfamiliar domain
  - from standard, but niche, specialities, e.g. AJCC cancer stage for glioblastoma multiforme
  - the data structure of a particular, perhaps unique, experimental design
- For the data described by the schema/model, be provided with sufficient information to:
  - Transform the data as needed for the user's purpose
  - Aggregate the data with data from other sources
It is clear that at least the following are core to the needs:
- References to semantic descriptions (standard or not)
- Use of scientific units
These would be relevant to data scientists who would be direct users of Data Connect or who would use tools that make use of Data Connect services.
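As a rough illustration of these two core needs, the sketch below annotates properties of a JSON Schema style model with a semantic reference and a unit, then reports which properties lack them. The annotation keys `x-semantic-ref` and `x-unit`, the `example.org` term URLs, and the property names are all hypothetical. They are assumptions made for this sketch and are not part of Data Connect or of any GA4GH specification.

```python
from typing import Any

# Hypothetical annotation keys, assumed for this sketch only.
SEMANTIC_KEY = "x-semantic-ref"
UNIT_KEY = "x-unit"

# Hypothetical model; the term URLs are placeholders, not real ontology terms.
MODEL: dict[str, Any] = {
    "type": "object",
    "properties": {
        "body_weight": {
            "type": "number",
            SEMANTIC_KEY: "https://example.org/terms/body-weight",
            UNIT_KEY: "kg",
        },
        "height": {
            "type": "number",
            SEMANTIC_KEY: "https://example.org/terms/height",
        },
        "subject_id": {"type": "string"},
    },
}


def missing_annotations(model: dict[str, Any]) -> dict[str, list[str]]:
    """Return, per property, the annotation keys that are absent.

    Every property is expected to carry a semantic reference; only numeric
    properties are expected to carry a unit.
    """
    gaps: dict[str, list[str]] = {}
    for name, spec in model.get("properties", {}).items():
        missing = []
        if SEMANTIC_KEY not in spec:
            missing.append(SEMANTIC_KEY)
        if spec.get("type") in ("number", "integer") and UNIT_KEY not in spec:
            missing.append(UNIT_KEY)
        if missing:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    print(missing_annotations(MODEL))
```

A check of this kind could be run against any candidate representation to see whether it can carry the information users need for transforming and aggregating data.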