Collect datasets #2

@makkus

Description

This is a catch-all issue to remind everyone that we want to collect 'relevant' datasets (whatever that means -- there is really no perfect way to determine that). These would be datasets we imagine could be used as inputs for one or several of our workflows.

In addition to the datasets themselves, we want to collect as much metadata about them as possible (origin/authors, description, schema, how they were produced, whether and how they were already pre-processed, etc.).

Also, it would be great to describe the imperfections of each dataset, what would ideally be needed to make it a 'perfect' workflow input, and what it would look like once that was done (meaning no further pre-processing steps would be needed, and it could be reliably used as an input without having to worry about data quality).

Task list

  • determine how/where to store those datasets
  • create a minimal schema of information we want to collect for each dataset
  • create a schema for optional metadata we would like to collect for each dataset, if easily possible
  • collect datasets (open ended)
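As a starting point for the two schema tasks above, here is a minimal sketch of what a per-dataset record could look like. All field names here are illustrative assumptions, not decisions -- the actual schema is exactly what this issue is meant to settle:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    # Minimal required metadata (hypothetical field names)
    name: str
    origin: str             # source URL or archive location
    authors: list[str]
    description: str
    schema: dict[str, str]  # column name -> type

    # Optional metadata, filled in if easily possible
    preprocessing: list[str] = field(default_factory=list)  # steps already applied
    known_issues: list[str] = field(default_factory=list)   # imperfections to fix

# Illustrative placeholder entry, not a real dataset
record = DatasetRecord(
    name="example-dataset",
    origin="https://example.org/data.csv",
    authors=["Jane Doe"],
    description="Placeholder entry to exercise the schema.",
    schema={"id": "int", "label": "str"},
    known_issues=["missing values in 'label' column"],
)
```

Keeping the required part this small lowers the barrier to adding datasets, while the optional `preprocessing` and `known_issues` fields capture the provenance and data-quality notes described above.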
