…ed to be clearer about what the high-level term was. Also remove a plainly wrong hyphen from the example: "missing: data agreement-established pre-2023"
…for NCBI, DDBJ as well as ENA
pesant-ebi
left a comment
I recommend some significant changes to the INSDC Missing Value Reporting Vocabulary:
(1) Flattening the table so that users can pick the exact reporting term without any ambiguity, and without having to read the instructions to learn that they sometimes need to prepend the prefix "missing:"
(2) The "top level", "lower level" and "reporting level" nomenclature is not intuitive and could be replaced by three levels of granularity (low, medium, high). We would recommend using the highest possible level of granularity in order to better contextualise the data.
(3) The vocabulary should be usable with any metadata field, whether mandatory, recommended or optional, which simplifies its usage. In batch submissions, any metadata field (mandatory, recommended or optional) may have values for some samples and missing values for others.
(4) The vocabulary should become an ontology so that a lookup service can be used to validate Missing Value Reporting terms on any field.
The proposed changes are shared here: https://docs.google.com/spreadsheets/d/13Az7f191RGfavYC1-8BCTOQy9uzZiY6Fy6Cy0R01Py8/edit?gid=2059784181#gid=2059784181
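To illustrate points (1), (3) and (4), here is a minimal sketch of how a flattened term list could be checked on any field of a batch submission. The term set below is an illustrative subset and not the authoritative vocabulary, and the names `FLAT_TERMS`, `is_missing_value_term` and `missing_value_rows` are hypothetical. A real lookup service backed by an ontology would replace the hard-coded set.

```python
# Illustrative subset only (assumption): the authoritative list lives in the
# INSDC Missing Value Reporting Vocabulary, not in this sketch.
FLAT_TERMS = frozenset({
    "not applicable",
    "not collected",
    "not provided",
    "restricted access",
    "missing: control sample",
    "missing: data agreement established pre-2023",
})


def is_missing_value_term(value: str) -> bool:
    """Return True if value is exactly one of the flattened reporting terms.

    Because the table is flat, the full string (including any "missing:"
    prefix) is compared as-is; nothing is prepended for the user.
    """
    return value.strip().lower() in FLAT_TERMS


def missing_value_rows(samples: list[dict[str, str]], field: str) -> list[int]:
    """Return the indices of samples whose `field` holds a reporting term.

    Works the same way for mandatory, recommended or optional fields.
    """
    return [
        index
        for index, sample in enumerate(samples)
        if is_missing_value_term(sample.get(field, ""))
    ]


if __name__ == "__main__":
    batch = [
        {"collection_date": "2021-05-04"},
        {"collection_date": "missing: control sample"},
        {"collection_date": "control sample"},  # lacks the prefix, so not a term
    ]
    print(missing_value_rows(batch, "collection_date"))
```

In a batch, some samples carry real values and others carry a reporting term for the same field, so the check is applied per sample rather than per field.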
INSDC missing value vocabulary - INSDC missing value terms.pdf
This is an improvement to make the missing value table clearer, as requested by
https://ncbijira.ncbi.nlm.nih.gov/browse/INSDC-2063
It is not perfect, but it is significantly clearer than the original. It can then be revisited and iterated on from there.