Description
In the new DiVA system, regular expressions are used to validate the content of some fields. These are the expressions that will most likely be used:
DOI: ^(10[.])[0-9]{4}[\S]+$
ISBN: ^((?:[\dX]{13})|(?:[\d-X]{17})|(?:[\dX]{10})|(?:[\d-X]{13}))$
ISSN: ^(\d{4})-(\d{3}[0-9X])$
WOS: (^\d+$)
SCOPUS: ^(2-s2.0-)(\d+)$
PMID: ^(\d{1,8})(|.\d{1,3})$
It would be useful to check whether the present DiVA content satisfies at least these expressions. If we have additional, stricter rules, we can add them on top of the above. We know that there are a number of old WOS IDs that begin with a letter (usually "A" followed by numbers), for example https://www.webofscience.com/wos/woscc/full-record/WOS:A1986A067700004. We will probably have to write our own regexp for those.
This could be used: ^A\d{4}[A-Za-z]{1,2}\d+$
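As a rough illustration of how such a check could be run, here is a minimal Python sketch that applies the expressions above to individual values. The names `PATTERNS`, `OLD_WOS` and `is_valid` are hypothetical and not part of DiVA. One adaptation is an assumption on my side: Python's `re` module rejects `[\d-X]` as a bad character range, so the hyphen in the ISBN expression is escaped as `[\d\-X]`.

```python
import re

# Expressions copied from the list above.
# Adaptation (assumption): the hyphen inside the ISBN character classes is
# escaped, because Python's re module rejects the form [\d-X].
PATTERNS = {
    "DOI": r"^(10[.])[0-9]{4}[\S]+$",
    "ISBN": r"^((?:[\dX]{13})|(?:[\d\-X]{17})|(?:[\dX]{10})|(?:[\d\-X]{13}))$",
    "ISSN": r"^(\d{4})-(\d{3}[0-9X])$",
    "WOS": r"(^\d+$)",
    "SCOPUS": r"^(2-s2.0-)(\d+)$",
    "PMID": r"^(\d{1,8})(|.\d{1,3})$",
}

# Proposed extra expression for old WOS IDs such as A1986A067700004.
OLD_WOS = r"^A\d{4}[A-Za-z]{1,2}\d+$"

COMPILED = {name: re.compile(pattern) for name, pattern in PATTERNS.items()}
OLD_WOS_RE = re.compile(OLD_WOS)


def is_valid(kind, value):
    """Return True if value satisfies the expression for the given field kind.

    For WOS, the old letter-prefixed form is accepted as a fallback.
    """
    if COMPILED[kind].fullmatch(value):
        return True
    return kind == "WOS" and OLD_WOS_RE.fullmatch(value) is not None


if __name__ == "__main__":
    samples = [
        ("DOI", "10.1000/xyz123"),
        ("ISSN", "1234-567X"),
        ("WOS", "A1986A067700004"),
        ("PMID", "12345678"),
    ]
    for kind, value in samples:
        print(kind, value, is_valid(kind, value))
```

Note that the `.` in the SCOPUS and PMID expressions is unescaped, so it matches any character and not only a literal dot. If a literal dot is intended, `\.` would be the stricter form.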
Another thing that will eventually be needed is to extend the time frame to 1066-2026. Since all publications in DiVA will be migrated, we have to clean them all. I guess that could be implemented immediately, along with making sure that the default sorting is "new-to-old".
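A minimal sketch of the extended time frame and the "new-to-old" default, assuming each record is a dict with an integer `year` key. The record layout and the function name `records_in_time_frame` are hypothetical and only serve as illustration.

```python
# Time frame taken from the text above.
MIN_YEAR, MAX_YEAR = 1066, 2026


def records_in_time_frame(records):
    """Keep records whose year is within the time frame, newest first."""
    kept = [r for r in records if MIN_YEAR <= r["year"] <= MAX_YEAR]
    return sorted(kept, key=lambda r: r["year"], reverse=True)
```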