Over the last few weeks I have opened many issues and questions about gnverifier / gnames / resolver / gnparser / gndiff, trying to tune them for my use cases, which I hope are similar to those of many other users.
As those issues get solved, the results returned by the APIs keep improving.
But improving also implies "changing", and I think this is a major issue for some use cases as well.
When it comes to publishing scientific results (thesis, reports, papers, whatever), repeatability is a must.
If I need to publish a curated list of scientific names, I can describe my protocol (e.g. #85), and I can also provide my data sources as attached files, but there is no way I can provide the software I used to process those data following my protocol, because it was an API running on a remote server.
And there is no way around this, since old APIs and servers eventually need to be retired, and their name data sources need to be updated, so even the same API version might return different results after those updates.
On the other hand, as far as I can tell, if I do my work using a particular release of gnparser and gndiff, my results will be 100% repeatable in the future, since they work completely offline. Am I right?
I am currently using the online resolver/verifier for several use cases.
For many of them, a continuously updated online API is the best option (e.g. checking the names of new specimens entering a collection every day).
But for published work, a protocol in which I download a given version of a data source and process it offline is a much better option.
In this sense, I see gndiff + gnparser as the most important gnames tools for scientific publications.
I am opening this issue not only to encourage you to develop them further, but also to raise the question of what to do (as of today) if I want to publish some work and describe a protocol based on results returned by a current or past gnames API version.
Is there any way of citing "I used this version of the gnverifier API" and also providing some kind of link (GitHub? edit: I have seen a couple of Zenodo links cited here) that exactly reflects its code at that time, so that whoever wants to repeat my results in the future can download that exact version from GitHub and install everything needed to repeat my work? That is, an exact replica of the gnames services at a given moment in time (assuming, of course, that I also downloaded the gnames database dump current at that time, stored it in a permanent repository somewhere, and provided a link in my publication).
I know that would take a lot of work and nobody would bother doing it in practice. But in theory, would it be possible?
As of today, can we state that a work which used gnverifier API results is "in theory" repeatable 10 years from now, or is this not possible?
And if it is, I would suggest not only documenting the how-to, but also letting the APIs return that how-to information on request (some sort of citation parameter providing the necessary links to GitHub, database dumps, etc.).
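Short of a full replica, a publication could at least pin exactly what it used. Below is a minimal sketch in Python, assuming the versions and the archive URL are filled in by hand: it records the release tag of each tool and a SHA-256 checksum of the downloaded dump, so a future reader can confirm they hold the identical file. The tool tags and the `archived_at` URL are placeholders for illustration, not real releases or locations.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(tools: dict[str, str], dump: Path, dump_url: str) -> dict:
    """Collect what a reader would need to rebuild the setup.

    `tools` maps a tool name to the exact release tag used (placeholder values);
    `dump_url` is wherever the database dump was archived (hypothetical).
    """
    return {
        "tools": tools,
        "database_dump": {
            "file": dump.name,
            "sha256": sha256_of(dump),
            "archived_at": dump_url,
        },
    }


if __name__ == "__main__":
    # Placeholder tags and paths: replace with the real ones used in the study.
    manifest = build_manifest(
        {"gnparser": "vX.Y.Z", "gndiff": "vX.Y.Z"},
        Path("gnames_dump.sql"),
        "https://example.org/archived-dump",
    )
    print(json.dumps(manifest, indent=2))
```

The resulting JSON could be attached as supplementary material next to the data files; the checksum only proves the dump is unchanged, it does not by itself rebuild the services.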
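To make the suggestion concrete, here is a sketch in Python of the kind of citation payload such a parameter might return. Every field name here is hypothetical; none is taken from an actual gnames API response. The point is only to show which pieces of information (service version, source code link, data dump link, query time) would make a result traceable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationInfo:
    """Hypothetical payload an API could return when asked for citation details.

    The field names are illustrative assumptions, not part of any real API.
    """

    service: str
    version: str
    source_code_url: str
    database_dump_url: str
    queried_at: str  # ISO 8601 timestamp of the request

    def as_text(self) -> str:
        """Render a one-line, human-readable citation."""
        return (
            f"{self.service} {self.version} (code: {self.source_code_url}; "
            f"data: {self.database_dump_url}; queried {self.queried_at})"
        )
```

A user could then paste `as_text()` into a methods section, or store the whole object alongside the results.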