As a curator, I want to more easily add metadata about related resources so that it's more discoverable #5277

@jggautier

While working on exporting more dataset metadata in Schema.org (#4371), the team talked about the best way to include the URLs of related publications. (In Schema.org, it's best practice to use a URL, instead of an ID number, for related publications.) But not all depositors/curators enter related publication metadata the way we expect (the way the fields have been designed). This has led, and will continue to lead, to significant curation work to correct the metadata so that it's exported in different metadata formats (I'm thinking of Schema.org and DataCite).

This is what the Dataverse 4 relatedPublication compound fields look like:
[Screenshot of the relatedPublication compound fields]

For the Schema.org issue (#4371), we decided to use what's entered in the URL field.

But there are plenty of cases where:

  • what the depositor enters in the URL field is not the URL form of the included DOI or other PID (which is what the picture above shows)
  • nothing is entered in the URL field
  • the URL form of a PID is included, but nothing is entered in the ID Type and/or ID Number fields

So, without curation work to update the metadata after it's been published, not all related publication metadata will be exported and made discoverable in other systems.
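The three cases above could be detected automatically before export. Here is a minimal sketch in Python, not Dataverse code: the function and argument names are hypothetical stand-ins for the ID Type, ID Number and URL fields, it only handles DOIs, and the DOI pattern is a loose assumption rather than a full validator.

```python
import re
from typing import Optional

# Loose DOI pattern (assumption): "10.", 4-9 digits, "/", then a non-space suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_from_text(text: str) -> Optional[str]:
    """Return the first DOI-looking substring in text, lowercased, or None."""
    match = DOI_PATTERN.search(text or "")
    return match.group(0).rstrip(".,;").lower() if match else None


def doi_to_url(doi: str) -> str:
    """Return the URL form of a DOI."""
    return "https://doi.org/" + doi


def check_related_publication(id_type: str, id_number: str, url: str) -> str:
    """Classify one related publication entry (hypothetical field names)."""
    id_type = (id_type or "").strip().lower()
    id_number = (id_number or "").strip()
    url = (url or "").strip()
    doi_in_id = doi_from_text(id_number) if id_type == "doi" else None
    doi_in_url = doi_from_text(url)

    if not url:
        return "missing-url"    # nothing entered in the URL field
    if doi_in_url and not (id_type and id_number):
        return "missing-id"     # URL form of a PID, but no ID Type and/or ID Number
    if doi_in_id and doi_in_url != doi_in_id:
        return "url-mismatch"   # URL is not the URL form of the entered DOI
    return "ok"
```

A check like this would only flag entries for a curator. For the "missing-url" case with a DOI present, `doi_to_url` shows how the URL form could be derived instead of typed by hand.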

For the DataCite schema, Dataverse needs to know the ID Types and identifiers of related publications, other datasets, and software. For Schema.org, Dataverse needs to know the URL of related publications. (As of this issue, there's no recommended way to include metadata about other related datasets and software in Schema.org.)
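To illustrate why the two formats need different inputs, here is a rough Python sketch of the two exports for one related publication. It is not how Dataverse builds its exports. The function names are hypothetical, the `relationType` value is an assumption (the right value depends on the actual relationship), and using Schema.org's `citation` property with a `CreativeWork` is only one plausible shape.

```python
from typing import Optional


def to_datacite_related_identifier(id_type: str, id_number: str) -> Optional[dict]:
    """Build a DataCite-style relatedIdentifier; needs both ID type and identifier."""
    if not id_type or not id_number:
        return None
    return {
        "relatedIdentifier": id_number,
        "relatedIdentifierType": id_type,
        # Assumption: picked for illustration only.
        "relationType": "IsSupplementTo",
    }


def to_schema_org_citation(url: str) -> Optional[dict]:
    """Build a Schema.org-style citation entry; needs only the URL."""
    if not url:
        return None
    return {"@type": "CreativeWork", "@id": url}
```

With these sketches, an entry that has only a URL yields a Schema.org citation but no DataCite relatedIdentifier, and an entry that has only an ID Type and ID Number yields the reverse.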

For other datasets and software, Dataverse doesn't have fields that are meant to collect URLs, ID Types or ID Numbers:

Field for related software used to create the data (this might change with a planned software metadata block):
[Screenshot of the related software field]

Field for related datasets:
[Screenshot of the related datasets field]

Can we improve the metadata fields (such as the number of fields and how they're labelled and described) and/or how Dataverse uses what's entered in them (such as text parsing or getting more information from external sources)? The goal is to reduce the curation and training now needed to make sure Dataverse can expose information about related resources.
