feat: canonical citation count harmonization across APIs #47

@BenjaminNavet

Description

Different APIs (CrossRef, Semantic Scholar, OpenAlex) report different citation counts for the same DOI. Add a canonical_citation_count field resolved via a fixed priority: Semantic Scholar > OpenAlex > CrossRef > raw collected value. Log discrepancies above a configurable threshold.
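
A minimal sketch of the proposed resolution, not the project's actual implementation. The function name, the source keys, the default threshold value, and the use of an absolute spread between sources as the discrepancy measure are all assumptions made for illustration; the issue only fixes the priority order and says the threshold is configurable.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger(__name__)

# Priority order taken from the issue text. The key strings are hypothetical.
SOURCE_PRIORITY = ("semantic_scholar", "openalex", "crossref", "raw")

# Hypothetical default; in the proposal this would come from configuration.
DISCREPANCY_THRESHOLD = 10


def resolve_canonical_citation_count(
    counts: Mapping[str, Optional[int]],
    threshold: int = DISCREPANCY_THRESHOLD,
    doi: Optional[str] = None,
) -> Optional[int]:
    """Return the count from the highest-priority source that reported one.

    The result depends only on which sources have a value, never on the
    order in which the API responses arrived.
    """
    # Keep known sources that actually reported a value (0 is a valid count).
    available = {
        source: count
        for source, count in counts.items()
        if source in SOURCE_PRIORITY and count is not None
    }
    if not available:
        return None

    # Assumption: a discrepancy is the absolute spread between sources.
    spread = max(available.values()) - min(available.values())
    if spread > threshold:
        logger.warning(
            "Citation count discrepancy for %s: %s (spread=%d, threshold=%d)",
            doi or "<unknown DOI>", available, spread, threshold,
        )

    # Walk the fixed priority list and return the first available value.
    for source in SOURCE_PRIORITY:
        if source in available:
            return available[source]
    return None
```

With this sketch, `resolve_canonical_citation_count({"crossref": 40, "openalex": 52})` returns 52 regardless of dictionary order, because OpenAlex outranks CrossRef and Semantic Scholar reported nothing. A relative threshold (for example, a percentage of the largest count) may suit highly cited papers better than an absolute one; the issue does not specify which is intended.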

Justification

The current 4-tier citation resolution is non-deterministic: the same paper may pass or fail the citation filter depending on which API responded first, so results are not reproducible across re-runs. A deterministic, auditable citation count is a prerequisite for systematic review reproducibility.

Affected files

  • scilex/citations/citations_tools.py
  • scilex/config_defaults.py

Metadata

Assignees

No one assigned

    Labels

    data-quality: Data quality, deduplication, metadata completeness
    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
