Description
After aggregation, add an optional enrichment step that queries the Unpaywall API (via the unpywall Python package) for each paper's DOI and appends an oa_type column with values: gold, green, bronze, or closed.
For papers already carrying a pdf_url (e.g. from OpenAlex), the OA type complements it: gold means published in a fully OA journal; green means a repository copy (preprint or accepted manuscript); bronze means free to read on the publisher site but without an explicit open license. The primary value is enriching papers collected from non-OpenAlex sources (IEEE, Springer, Elsevier) where no OA metadata is currently available.
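A minimal sketch of what the enrichment step could look like, to make the proposal concrete. It calls the Unpaywall REST endpoint directly with the standard library instead of going through unpywall, so it runs without extra dependencies. The names `fetch_oa_status` and `add_oa_type`, the `doi` record key, and the list-of-dicts input are assumptions for illustration, not existing scilex code. Note that Unpaywall's `oa_status` field can also return `hybrid`, which the four values above do not cover; the sketch passes it through unchanged and leaves the mapping open.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Unpaywall v2 endpoint: one DOI per request, email passed as a query parameter.
UNPAYWALL_URL = "https://api.unpaywall.org/v2/{doi}?email={email}"


def fetch_oa_status(doi: str, email: str) -> Optional[str]:
    """Return Unpaywall's oa_status for one DOI, or None if the lookup fails."""
    url = UNPAYWALL_URL.format(
        doi=urllib.parse.quote(doi),  # keeps the "/" inside the DOI
        email=urllib.parse.quote(email),
    )
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp).get("oa_status")
    except (OSError, ValueError):
        # Network/HTTP errors (OSError) or malformed JSON (ValueError).
        return None


def add_oa_type(
    papers: list,
    email: str,
    fetch: Callable[[str, str], Optional[str]] = fetch_oa_status,
) -> list:
    """Return copies of the paper records with an added 'oa_type' key.

    Hypothetical helper: 'doi' as the record key is an assumption.
    Records without a DOI get oa_type=None. Each distinct DOI is
    looked up once (simple in-memory per-DOI cache).
    """
    cache = {}
    enriched = []
    for paper in papers:
        doi = (paper.get("doi") or "").strip().lower()
        if not doi:
            oa_type = None
        else:
            if doi not in cache:
                cache[doi] = fetch(doi, email)
            oa_type = cache[doi]
        enriched.append({**paper, "oa_type": oa_type})
    return enriched
```

With a pandas DataFrame, the same idea would reduce to mapping the DOI column through the cached lookup and assigning the result to `oa_type`. The `fetch` parameter is injectable so the step can be tested offline or swapped for an unpywall-backed lookup.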
Justification
Unpaywall indexes 50M+ open-access DOIs via a free REST API (email only, no key required). The oa_type dimension is actionable for compliance: many funding agencies require gold OA specifically. The unpywall package (MIT license) wraps the API with pandas integration and supports per-DOI caching.
Affected files
- new scilex/oa_enrichment.py
- pyproject.toml (optional dep: unpywall)