feat: open access status enrichment via Unpaywall #44

@BenjaminNavet

Description

After aggregation, add an optional enrichment step that queries the Unpaywall API (via the unpywall Python package) for each paper's DOI and appends an oa_type column holding Unpaywall's oa_status value: gold, hybrid, green, bronze, or closed.
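To make the per-DOI step concrete, here is a minimal sketch that calls Unpaywall's v2 REST endpoint (https://api.unpaywall.org/v2/{doi}?email=...) directly with the standard library instead of through unpywall, and copies the response's oa_status field into oa_type. It assumes the aggregated results are a list of dicts with a "doi" key; the function names fetch_unpaywall and add_oa_type are hypothetical, and the fetcher is injectable so the logic can be exercised offline.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Unpaywall v2 endpoint; the email parameter identifies the caller (no API key).
UNPAYWALL_URL = "https://api.unpaywall.org/v2/{doi}?email={email}"


def fetch_unpaywall(doi: str, email: str, timeout: float = 10.0) -> Optional[dict]:
    """Fetch the Unpaywall record for one DOI; return None if Unpaywall does not know it."""
    url = UNPAYWALL_URL.format(
        doi=urllib.parse.quote(doi, safe="/"),
        email=urllib.parse.quote(email),
    )
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # DOI not indexed by Unpaywall
            return None
        raise


def add_oa_type(
    papers: list[dict],
    email: str,
    fetch: Callable[[str, str], Optional[dict]] = fetch_unpaywall,
) -> list[dict]:
    """Hypothetical enrichment step: set paper["oa_type"] from Unpaywall's oa_status."""
    for paper in papers:
        doi = paper.get("doi")
        record = fetch(doi, email) if doi else None
        # None means "unknown" (no DOI or not indexed), which is kept distinct from "closed".
        paper["oa_type"] = record.get("oa_status") if record else None
    return papers
```

Leaving oa_type empty for papers that cannot be looked up, rather than writing "closed", avoids reporting a paper as closed when it was simply never checked.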

For papers that already carry a pdf_url (e.g. from OpenAlex), the OA type complements it: gold means the paper is published in a fully OA journal; green means the only free copy is in a repository (preprint or accepted manuscript); bronze means the paper is free to read on the publisher site but without an explicit open license. Unpaywall also reports hybrid, meaning an openly licensed article in a subscription journal. The primary value, however, is enriching papers collected from non-OpenAlex sources (IEEE, Springer, Elsevier), for which no OA metadata is currently available.
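Unpaywall computes oa_status itself, so the enrichment step only needs to copy it. Purely to show how the categories relate to each other, the sketch below restates them as a rule over the journal_is_oa flag and the oa_locations list (each location carrying host_type "publisher" or "repository" and a license) found in Unpaywall's response. It is a simplification written for illustration, not Unpaywall's exact algorithm, and classify_oa is a hypothetical name.

```python
def classify_oa(journal_is_oa: bool, oa_locations: list[dict]) -> str:
    """Simplified restatement of the OA categories (not Unpaywall's exact algorithm)."""
    publisher = [loc for loc in oa_locations if loc.get("host_type") == "publisher"]
    if publisher and journal_is_oa:
        return "gold"    # free on the publisher site of a fully OA journal
    if any(loc.get("license") for loc in publisher):
        return "hybrid"  # openly licensed article in a subscription journal
    if publisher:
        return "bronze"  # free on the publisher site, no explicit open license
    if any(loc.get("host_type") == "repository" for loc in oa_locations):
        return "green"   # only a repository copy (preprint or accepted manuscript)
    return "closed"      # no free copy found
```

The checks run from publisher-hosted to repository-hosted copies, so a paper that is both bronze and green is reported as bronze, matching the idea that each paper gets a single "best" category.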

Justification

Unpaywall indexes 50M+ open-access DOIs and exposes them through a free REST API (requests only need an email address; no API key is required). The oa_type dimension is actionable for compliance, since some funding agencies require gold OA specifically. The unpywall package (MIT license) wraps the API, returns results as pandas DataFrames, and supports per-DOI caching.
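The issue relies on unpywall's built-in per-DOI caching. To illustrate what such a cache has to get right, here is a standard-library sketch (not unpywall's implementation): keys are normalized DOIs, so "https://doi.org/10.1000/X" and "10.1000/x" share one entry, which is safe because DOIs are case-insensitive. DoiCache, normalize_doi and the prefix list are hypothetical names, and the fetch callback stands in for whatever function performs the actual Unpaywall request.

```python
import json
from pathlib import Path
from typing import Callable, Optional

# Resolver prefixes stripped before caching (hypothetical list; extend as needed).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI (DOIs are case-insensitive) and strip resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


class DoiCache:
    """Cache of Unpaywall responses keyed by normalized DOI, optionally persisted as JSON."""

    def __init__(self, path: Optional[Path] = None):
        self.path = path
        self.data: dict = {}
        if path is not None and path.exists():
            self.data = json.loads(path.read_text())

    def get_or_fetch(self, doi: str,
                     fetch: Callable[[str], Optional[dict]]) -> Optional[dict]:
        key = normalize_doi(doi)
        if key not in self.data:
            # Misses (None) are cached too, so unknown DOIs are not re-queried.
            self.data[key] = fetch(key)
            if self.path is not None:
                # Rewrites the whole file on each miss: simple, fine for a sketch.
                self.path.write_text(json.dumps(self.data))
        return self.data[key]
```

Caching negative results matters here because many DOIs from IEEE, Springer or Elsevier may be absent from Unpaywall, and re-running the pipeline should not repeat those requests.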

Affected files

  • scilex/oa_enrichment.py (new)
  • pyproject.toml (optional dependency: unpywall)
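Because unpywall is declared only as an optional dependency, the new module would need to import it lazily and fail with a clear message when it is missing. A minimal sketch of that guard follows; require_optional is a hypothetical helper name, and the install hint simply names the pip package, since the issue does not say which extra pyproject.toml will define.

```python
import importlib
from types import ModuleType


def require_optional(module: str, pip_name: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module)
    except ImportError as err:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"OA enrichment needs the optional package '{module}'. "
            f"Install it with: pip install {pip_name}"
        ) from err
```

Inside the enrichment module, a call such as require_optional("unpywall", "unpywall") at the start of the enrichment function keeps the core pipeline importable for users who never enable the step.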

Metadata

    Labels

    enhancement: New feature or request
    enrichment: Post-collection data enrichment
