@DeondeJager

- Added section: How to fill in missing data.
- Populated the new section with information.
- Fixed a small typo.

## How to fill in missing data

Follow the [INSDC Missing Value Reporting](https://www.insdc.org/technical-specifications/missing-value-reporting/) specifications.

Comment on lines +122 to +123
- it is not applicable to that particular field (e.g. it is a negative control and the field does not apply)
- Missing data should only be reported for **mandatory** fields, not for **recommended** or **optional** fields. For the latter two, simply leave the field blank if the (meta)data are missing.
Suggested change
- it is not applicable to that particular field (e.g. it is a negative control and the field does not apply)
- Missing data should only be reported for **mandatory** fields, not for **recommended** or **optional** fields. For the latter two, simply leave the field blank if the (meta)data are missing.
Each of these cases has specific ways of encoding 'missingness' as per the INSDC guidelines.
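A minimal sketch of the mandatory-field bullet above, as currently written (missing values are reported only for mandatory fields; recommended and optional fields are left blank). The helper `fill_field` and the requirement labels are hypothetical and not part of MIxS or any submission tool; the missing-value term passed in is assumed to already follow the INSDC vocabulary.

```python
from typing import Optional

# Requirement levels named in the guideline above (labels assumed for this sketch).
MANDATORY = "mandatory"
NON_MANDATORY = {"recommended", "optional"}


def fill_field(requirement: str, value: Optional[str],
               missing_term: Optional[str] = None) -> str:
    """Return the text to enter for one metadata field (hypothetical helper)."""
    if value:
        # Data are available: report them as-is.
        return value
    if requirement == MANDATORY:
        # Mandatory field without data: an INSDC missing-value term is required.
        if not missing_term:
            raise ValueError("mandatory field has no value: supply a missing-value term")
        return missing_term
    if requirement in NON_MANDATORY:
        # Recommended/optional field without data: leave it blank.
        return ""
    raise ValueError(f"unknown requirement level: {requirement!r}")
```

For example, `fill_field("optional", None)` returns an empty string, while `fill_field("mandatory", None, "not applicable")` returns `"not applicable"`.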

- it cannot be shared due to data agreement restrictions;
- There are three levels at which you can report missing data, with an increasing amount of specificity for each: _**top level**_, _**lower level**_, and _**reporting level**_. Be as specific/granular as possible when reporting missing values. The _top level_ indicates that the data are missing, while the _lower-_ and _reporting_ levels give a reason (from the [controlled vocabulary](https://www.insdc.org/technical-specifications/missing-value-reporting/)) why the data are missing.

I would break up the three-level descriptions into separate bullets.

- Always report the _top level_ (i.e. "not applicable" or "missing"), even when reporting at a more granular level; in that case, separate the _top level_ term from the _lower/reporting level_ term with ": " (a colon followed by a space).
- If you use a term from the most granular level (_reporting level_), omit the _lower level_ term: each _reporting level_ term is a "child" of a _lower level_ term, so the _lower level_ can be inferred from the [table](https://www.insdc.org/technical-specifications/missing-value-reporting/).
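The two encoding rules above can be sketched as a small Python function. This is an illustrative assumption, not an official INSDC or MIxS tool: `encode_missing` is a hypothetical name, and it does not check lower- or reporting-level terms against the controlled vocabulary, which should be taken from the INSDC table.

```python
from typing import Optional

# Top-level terms quoted in the guideline above.
TOP_LEVEL_TERMS = {"missing", "not applicable"}


def encode_missing(top_level: str,
                   lower_level: Optional[str] = None,
                   reporting_level: Optional[str] = None) -> str:
    """Build a missing-value string following the rules above (illustrative sketch)."""
    if top_level not in TOP_LEVEL_TERMS:
        raise ValueError(f"unknown top-level term: {top_level!r}")
    # A reporting-level term is a child of a lower-level term, so when both
    # are given only the more granular reporting-level term is kept.
    reason = reporting_level or lower_level
    if reason is None:
        # No reason available: report the top level on its own.
        return top_level
    # The top level is always reported, separated from the reason by ": ".
    return f"{top_level}: {reason}"
```

For example, `encode_missing("not applicable", reporting_level="control sample")` gives `"not applicable: control sample"`; if both a lower- and a reporting-level term are passed, only the reporting-level term appears in the output.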

### Examples
- missing

And let's provide context for each example:

- How to encode missing information for a negative control (not applicable)
- How to encode missing information because the information was not collected during a historical sampling event, or because the collection records burnt down
- How to encode missing information because native/indigenous groups do not permit sharing of this information

- it cannot be shared for privacy reasons;
- Missing data should only be reported for **mandatory** fields, not for **recommended** or **optional** fields. For the latter two, simply leave the field blank if the (meta)data are missing.

I think the mandatory fields sentence should go at the beginning of this section in a preamble paragraph and say more along the lines of:

Mandatory fields in MIxS always require a value in the given metadata entry. If you do not have this information, you must encode its absence using the specific 'missing information' categories described below.

Optional fields in MIxS can be left blank; however, if there is a specific reason why the information will never be able to be reported (see examples below), it is good practice to use these missing data categories there as well.


Suggested change
Note that how you can use the missing data categories in a particular metadata entry depends on the implementer of the MIxS-MInAS standard.
For example, some implementations may not allow non-numeric characters in numeric-only metadata terms, so an entry using a category such as `not applicable: control sample` will fail validation.
In these cases, refer to the documentation of the repository or platform to which you are submitting your metadata.
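To illustrate why the same entry can pass in one implementation and fail in another, here is a hypothetical pair of validators for a numeric-only field: a strict one that accepts only numbers, and a permissive one that also accepts entries whose top-level term is one of the INSDC top-level terms. Both function names are made up for this sketch, and neither reflects any specific repository's actual validation logic.

```python
# Top-level INSDC missing-value terms quoted in the guidelines above.
TOP_LEVEL_TERMS = {"missing", "not applicable"}


def is_number(value: str) -> bool:
    """True if the entry parses as a number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def strict_numeric_validator(value: str) -> bool:
    """Hypothetical implementation that only accepts numbers."""
    return is_number(value)


def permissive_numeric_validator(value: str) -> bool:
    """Hypothetical implementation that also accepts missing-value terms."""
    if is_number(value):
        return True
    # Treat the text before the first ": " as the top-level term.
    top_level = value.split(": ", 1)[0]
    return top_level in TOP_LEVEL_TERMS
```

With these sketches, `"not applicable: control sample"` is rejected by `strict_numeric_validator` but accepted by `permissive_numeric_validator`, which mirrors the situation described in the note above.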
