forked from datalogism/SciLEx
Open
Labels
data-quality: Data quality, deduplication, metadata completeness
enhancement: New feature or request
Description
Different APIs (CrossRef, Semantic Scholar, OpenAlex) report different citation counts for the same DOI. Add a canonical_citation_count field resolved via a fixed priority: Semantic Scholar > OpenAlex > CrossRef > raw collected value. Log discrepancies above a configurable threshold.
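A minimal sketch of the proposed resolution, assuming the per-source counts arrive as a mapping keyed by source name. The function name `resolve_canonical_citation_count`, the source keys, the `discrepancy_threshold` parameter, and the choice to measure discrepancy as the absolute spread between the highest and lowest reported counts are all illustrative assumptions, not the existing code in `scilex/citations/citations_tools.py`.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger(__name__)

# Priority order taken from the issue text; the key names are assumptions.
SOURCE_PRIORITY = ("semantic_scholar", "openalex", "crossref", "raw")


def resolve_canonical_citation_count(
    counts: Mapping[str, Optional[int]],
    discrepancy_threshold: int = 10,  # hypothetical default; meant to be configurable
    doi: Optional[str] = None,
) -> Optional[int]:
    """Return the count from the highest-priority source that reported one.

    The result depends only on which sources have a value, never on the
    order in which the API responses arrived.
    """
    # Keep sources that reported a value; a count of 0 is valid, None is missing.
    available = {
        source: counts[source]
        for source in SOURCE_PRIORITY
        if counts.get(source) is not None
    }
    if not available:
        return None

    # First source in priority order that has a value.
    canonical_source = next(s for s in SOURCE_PRIORITY if s in available)
    canonical = available[canonical_source]

    # Assumed discrepancy measure: absolute spread across reporting sources.
    spread = max(available.values()) - min(available.values())
    if spread > discrepancy_threshold:
        logger.warning(
            "Citation count discrepancy for DOI %s: %s (using %s=%d)",
            doi, available, canonical_source, canonical,
        )
    return canonical
```

For example, `resolve_canonical_citation_count({"crossref": 40, "openalex": 55})` returns the OpenAlex value because Semantic Scholar reported nothing, and it logs a warning because the spread of 15 exceeds the assumed default threshold of 10.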
Justification
The current 4-tier citation resolution is non-deterministic: the same paper may pass or fail the citation filter depending on which API responded first, so results are not reproducible across re-runs. A deterministic, auditable citation count is a prerequisite for systematic review reproducibility.
Affected files
scilex/citations/citations_tools.py
scilex/config_defaults.py