Description
While working on exporting more dataset metadata in Schema.org (#4371), the team discussed the best way to include the URLs of related publications. (In Schema.org, it's best practice to use a URL, rather than an ID number, for related publications.) But not all depositors and curators enter related publication metadata the way the fields were designed to be used, which has led, and will continue to lead, to significant curation work to correct the metadata so that it's exported correctly in different metadata formats (I'm thinking of Schema.org and DataCite).
This is what the Dataverse 4 relatedPublication compound fields look like:

For the Schema.org issue (#4371), we decided to use what's entered in the URL field.
But there are plenty of cases where:
- what the depositor enters in the URL field is not the URL form of the included DOI or other PID (which is what the picture above shows)
- nothing is entered in the URL field
- the URL form of a PID is included, but nothing is entered in the ID Type and/or ID Number fields
So not all possible related publication metadata will be exported, and discoverable in other systems, without curation work to update the metadata after it's been published.
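As a rough illustration of the three cases above, here is a minimal Python sketch of how the missing pieces could be derived from whatever the depositor did enter. It is not how Dataverse works today: the function name `normalize_related_publication`, the `RESOLVERS` and `HOSTS` tables, and the returned `pid_url` key are all hypothetical, and only a few ID types are covered.

```python
from urllib.parse import urlparse

# Hypothetical resolver prefixes per ID Type (illustrative, not Dataverse's mapping).
RESOLVERS = {
    "doi": "https://doi.org/",
    "handle": "https://hdl.handle.net/",
    "arxiv": "https://arxiv.org/abs/",
}

# Hypothetical reverse lookup: resolver host name -> ID Type.
HOSTS = {
    "doi.org": "doi",
    "dx.doi.org": "doi",
    "hdl.handle.net": "handle",
}


def normalize_related_publication(id_type="", id_number="", url=""):
    """Derive whichever of ID Type, ID Number and PID URL can be inferred."""
    id_type = (id_type or "").strip().lower()
    id_number = (id_number or "").strip()
    url = (url or "").strip()

    # Case: the URL form of a PID was entered, but ID Type and/or ID Number is empty.
    if url and not (id_type and id_number):
        parsed = urlparse(url)
        guessed_type = HOSTS.get(parsed.netloc.lower())
        if guessed_type:
            id_type = guessed_type
            id_number = parsed.path.lstrip("/")

    # Cases: the URL field is empty, or holds something other than the PID's URL form.
    # Build the PID URL separately so the depositor's original URL is not lost.
    pid_url = ""
    prefix = RESOLVERS.get(id_type)
    if prefix and id_number:
        pid_url = prefix + id_number

    return {
        "id_type": id_type,
        "id_number": id_number,
        "url": url,
        "pid_url": pid_url,
    }
```

Entries where `pid_url` comes back empty (for example, a journal landing page URL with no ID Type or ID Number) would still need a curator's attention.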
For the DataCite schema, Dataverse needs to know the ID Types and identifiers of related publications, other datasets and software. For Schema.org, Dataverse needs to know the URLs of related publications. (As of this issue, there's no recommended way to include metadata about other related datasets and software in Schema.org.)
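To make the difference between the two formats concrete, the sketch below maps one related publication record to each shape. It is an assumption-laden illustration, not Dataverse's exporter: the input keys (`id_type`, `id_number`, `url`), both function names, the `DATACITE_TYPES` table and the choice of `IsCitedBy` as the relation type are hypothetical. The output property names (`citation` and `url` in Schema.org; `relatedIdentifier`, `relatedIdentifierType` and `relationType` in DataCite) come from those vocabularies.

```python
# Hypothetical mapping from a lowercased ID Type to a DataCite relatedIdentifierType value.
DATACITE_TYPES = {
    "doi": "DOI",
    "handle": "Handle",
    "arxiv": "arXiv",
    "url": "URL",
}


def to_schema_org_citation(pub):
    """Build a Schema.org `citation` entry; this format needs a URL."""
    url = pub.get("url", "")
    if not url:
        return None  # nothing exportable without a URL
    return {"@type": "CreativeWork", "url": url}


def to_datacite_related_identifier(pub):
    """Build a DataCite related identifier; this format needs ID Type and ID Number."""
    related_type = DATACITE_TYPES.get(pub.get("id_type", "").lower())
    id_number = pub.get("id_number", "")
    if not (related_type and id_number):
        return None  # nothing exportable without a typed identifier
    return {
        "relatedIdentifier": id_number,
        "relatedIdentifierType": related_type,
        "relationType": "IsCitedBy",  # assumed relation for a related publication
    }


# A record with only a URL exports to Schema.org but not to DataCite.
url_only = {"id_type": "", "id_number": "", "url": "https://journal.example/article/1"}
# A record with only ID Type and ID Number exports to DataCite but not to Schema.org.
pid_only = {"id_type": "doi", "id_number": "10.1234/abc", "url": ""}
```

With these two sample records, each format ends up missing one of the publications, which is the gap that curation currently has to close.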
For other datasets and software, Dataverse doesn't have fields that are meant to collect URLs, ID Types or ID Numbers:
Field for related software used to create the data (this might change with a planned software metadata block):

Can we improve the metadata fields (such as the number of fields and how they're labelled and described) and/or how Dataverse uses what's entered in them (such as parsing text or getting more information from external sources), in order to reduce the curation and training currently needed to make sure Dataverse can expose information about related resources?
