API Duplicate resource detection #424

### What

The storage system should be able to detect when a user is attempting to add a resource that already exists, and/or when two or more resources already in the database are putative duplicates.

### Details

Scenario: A user is uploading one or more records (plots, plot observations, taxon observations, projects, references, etc.) as though they are _new_ resources in the system. Depending on how we implement upload functionality, this may mean adding a resource without an identifier (`accessionCode`), or specifying an identifier that does not exist in the system. We would like the system to recognize when such a resource already appears to exist in the system under some other identifier, and flag this for the user.
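To make the scenario concrete, here is a minimal sketch of exact-match detection, assuming records are plain dicts. `MATCH_FIELDS`, `fingerprint`, `find_duplicates`, and the plot field names are hypothetical placeholders, not existing API. The sketch hashes a normalized subset of fields, ignoring `accessionCode`, and looks each incoming record up against existing ones. It also flags repeats within the same upload, which is an open question rather than a decision.

```python
import hashlib
import json

# Hypothetical comparison fields per resource type; the real list is undecided.
MATCH_FIELDS = {"plot": ["name", "latitude", "longitude"]}


def fingerprint(resource_type, record):
    """Hash the normalized comparison fields, ignoring accessionCode."""
    values = []
    for field in MATCH_FIELDS[resource_type]:
        value = record.get(field)
        if isinstance(value, str):
            # Collapse whitespace and ignore case so trivial edits still match.
            value = " ".join(value.split()).casefold()
        values.append(value)
    payload = json.dumps([resource_type, values])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def find_duplicates(resource_type, incoming, existing):
    """Return (incoming index, identifier of the record it duplicates) pairs."""
    seen = {fingerprint(resource_type, r): r["accessionCode"] for r in existing}
    flagged = []
    for i, record in enumerate(incoming):
        key = fingerprint(resource_type, record)
        if key in seen:
            flagged.append((i, seen[key]))
        else:
            # Remember this record so later records in the same upload are
            # compared against it as well.
            seen[key] = record.get("accessionCode") or f"upload[{i}]"
    return flagged


existing = [
    {"accessionCode": "A1", "name": "Santa Rosa plot 1",
     "latitude": 34.0, "longitude": -120.1},
]
incoming = [
    {"name": " santa rosa  PLOT 1", "latitude": 34.0, "longitude": -120.1},
    {"name": "Santa Rosa plot 2", "latitude": 34.0, "longitude": -120.1},
    {"accessionCode": "X9", "name": "Santa Rosa plot 2",
     "latitude": 34.0, "longitude": -120.1},
]
print(find_duplicates("plot", incoming, existing))
```

Exact hashing is cheap and could be backed by an indexed column, but it only catches records whose comparison fields are identical after normalization. Note that numeric values must be typed consistently (`34` and `34.0` hash differently in this sketch).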

TBD:
- Do we also check for duplicates within a bulk upload, e.g. when a user uploads 2 (new) records with different identifiers but the same values in the other fields?
- What fields should we check for duplication? Do we need to decide this on a resource-by-resource basis?
- To what extent do we want to do fuzzy matching -- e.g. do we require plot name to be _identical_ or merely very similar? If the latter, consider that some fields are likely to be very similar across distinct records (e.g. plot "Santa Rosa plot 1" and "Santa Rosa plot 2"), leading to a high false positive rate in duplicate detection.
- How many fields have to be the same to qualify as a duplicate? Do we need to (and can we) define a specific and reliable rule for each resource type?
- When we detect duplicates, what do we do?
   - Fail to add the data, and return a corresponding warning message?
   - In that case, how do we allow the user to override duplicate detection? Do we do this on a record by record basis within an upload request, or apply to the entire upload request?
- Do we also want to run periodic duplicate detection on the database as a background job (or expose it as an admin API method that can be run on demand), flagging potential duplicates for review and cleanup? If so, what cleanup do we offer?
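As a rough illustration of the fuzzy-matching and field-count questions above, the sketch below uses Python's standard `difflib.SequenceMatcher`. The helper names, the `0.9` threshold, and the plot fields are all assumptions chosen for the example, not a proposed rule. It shows that name similarity alone would flag "Santa Rosa plot 1" against "Santa Rosa plot 2", while also requiring other fields to agree would not.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.strip().casefold(), b.strip().casefold()).ratio()


def matching_fields(new, old, exact_fields, fuzzy_fields, threshold=0.9):
    """List the fields on which two records agree (hypothetical rule)."""
    matched = []
    for field in exact_fields:
        if new.get(field) is not None and new.get(field) == old.get(field):
            matched.append(field)
    for field in fuzzy_fields:
        a, b = new.get(field), old.get(field)
        if a and b and similarity(a, b) >= threshold:
            matched.append(field)
    return matched


def is_putative_duplicate(new, old, exact_fields, fuzzy_fields, min_matches):
    """Flag a pair when at least min_matches fields agree."""
    return len(matching_fields(new, old, exact_fields, fuzzy_fields)) >= min_matches


plot_1 = {"name": "Santa Rosa plot 1", "latitude": 34.01, "longitude": -120.05}
plot_2 = {"name": "Santa Rosa plot 2", "latitude": 34.02, "longitude": -120.07}
reupload = {"name": "Santa Rosa Plot 1 ", "latitude": 34.01, "longitude": -120.05}

exact = ["latitude", "longitude"]
fuzzy = ["name"]

# The two distinct plots differ by one character, so the names alone
# clear the 0.9 threshold and would be flagged.
print(is_putative_duplicate(plot_2, plot_1, exact, fuzzy, min_matches=1))
# Requiring all three fields to agree separates them again.
print(is_putative_duplicate(plot_2, plot_1, exact, fuzzy, min_matches=3))
# A genuine re-upload with a slightly different name is still flagged.
print(is_putative_duplicate(reupload, plot_1, exact, fuzzy, min_matches=3))
```

Whatever the rule ends up being, returning the list of matched fields (as `matching_fields` does here) would let the warning message tell the user why a record was flagged.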
