
feat: author impact enrichment via Semantic Scholar Author API #45

@BenjaminNavet

Description

After collection, optionally fetch hIndex and citationCount for each paper's first and last authors using the Semantic Scholar /author/batch endpoint, which accepts up to 1000 author IDs per request. Append first_author_hindex and last_author_hindex columns to the CSV, and optionally weight author impact in the relevance score.
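A minimal sketch of the batched lookup. It uses only the standard library so it stands alone; in SciLEx it would presumably go through the existing Semantic Scholar client instead. The endpoint URL, the `fields` query parameter, the `{"ids": [...]}` POST body and the optional `x-api-key` header follow the public Graph API. The helper names (`chunked`, `parse_batch`, `fetch_author_metrics`) are hypothetical. The sketch assumes the response is a list aligned with the requested IDs, and it skips `null` entries defensively in case an ID is not found. Retries and rate-limit handling are left out.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Iterable, Iterator

# Public Semantic Scholar Graph API batch endpoint for author objects.
AUTHOR_BATCH_URL = "https://api.semanticscholar.org/graph/v1/author/batch"
MAX_IDS_PER_REQUEST = 1000  # per-request limit stated in the issue
FIELDS = "hIndex,citationCount"


def chunked(ids: list[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield successive slices of at most `size` author IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def parse_batch(response: list) -> dict[str, dict]:
    """Map authorId -> {"hIndex", "citationCount"}, skipping null entries."""
    metrics: dict[str, dict] = {}
    for author in response:
        if author is None:  # treated as "ID not found"
            continue
        metrics[author["authorId"]] = {
            "hIndex": author.get("hIndex"),
            "citationCount": author.get("citationCount"),
        }
    return metrics


def fetch_author_metrics(author_ids: Iterable[str], api_key: str | None = None) -> dict[str, dict]:
    """De-duplicate IDs, POST them to /author/batch in chunks, merge the results."""
    unique_ids = sorted(set(author_ids))
    query = urllib.parse.urlencode({"fields": FIELDS})
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key
    metrics: dict[str, dict] = {}
    for batch in chunked(unique_ids):
        request = urllib.request.Request(
            f"{AUTHOR_BATCH_URL}?{query}",
            data=json.dumps({"ids": batch}).encode("utf-8"),
            headers=headers,
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=30) as resp:
            metrics.update(parse_batch(json.load(resp)))
    return metrics
```

De-duplicating before batching matters because the same author typically appears on many collected papers. That keeps the number of requests close to (distinct authors / 1000).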

Justification

The Semantic Scholar Academic Graph API exposes hIndex as a named field on author objects (confirmed fields: authorId, name, affiliations, citationCount, hIndex, paperCount). Author reputation is a standard quality signal in systematic reviews. SciLEx currently stores author names but discards all author-level metadata. The implementation can reuse the existing Semantic Scholar client infrastructure.
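The issue leaves the optional relevance weighting unspecified. Below is one hypothetical way to fold author impact into the score as a quality signal. It assumes the existing relevance score lies in [0, 1], and the log scaling, the `cap` of 100 and the default `weight` of 0.2 are illustrative choices, not SciLEx behaviour. Both function names are hypothetical.

```python
import math
from typing import Optional


def author_impact(first_hindex: Optional[int], last_hindex: Optional[int], cap: int = 100) -> float:
    """Normalise the larger of the two h-indices to [0, 1] on a log scale."""
    known = [h for h in (first_hindex, last_hindex) if h is not None]
    if not known:
        return 0.0  # no author data: contribute nothing rather than penalise arbitrarily
    best = min(max(known), cap)  # clip so a single very prominent author cannot dominate
    return math.log1p(best) / math.log1p(cap)


def weighted_relevance(base_score: float, first_hindex: Optional[int],
                       last_hindex: Optional[int], weight: float = 0.2) -> float:
    """Blend an assumed [0, 1] relevance score with author impact; weight=0 disables it."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return (1 - weight) * base_score + weight * author_impact(first_hindex, last_hindex)
```

Taking the maximum of first and last author lets either a strong first author or a strong senior (last) author count. The log scale reflects that h-index values are heavily skewed, so the difference between 5 and 20 should matter more than the difference between 80 and 95.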

Affected files

  • scilex/author_enrichment.py (new)
  • scilex/crawlers/collectors/semantic_scholar.py
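Because SciLEx currently keeps only author names, the new module needs Semantic Scholar author IDs from somewhere before it can append the two columns. The sketch below assumes, purely for illustration, that the collector change in semantic_scholar.py preserves them in a hypothetical semicolon-separated `author_ids` CSV column. It also assumes `metrics` is a mapping of authorId to fields such as `{"hIndex": ...}`. Unknown authors produce an empty cell. `enrich_rows` and `enrich_csv` are hypothetical names.

```python
import csv
from typing import Dict, List, Optional

HINDEX_COLUMNS = ["first_author_hindex", "last_author_hindex"]


def _hindex(metrics: Dict[str, dict], author_id: Optional[str]):
    """h-index for one author ID, or '' (an empty CSV cell) when unknown."""
    if author_id is None or author_id not in metrics:
        return ""
    value = metrics[author_id].get("hIndex")
    return "" if value is None else value


def enrich_rows(rows: List[dict], metrics: Dict[str, dict],
                id_column: str = "author_ids", sep: str = ";") -> List[dict]:
    """Return copies of `rows` with first/last author h-index columns added."""
    enriched = []
    for row in rows:
        ids = [i.strip() for i in (row.get(id_column) or "").split(sep) if i.strip()]
        new_row = dict(row)  # leave the caller's rows untouched
        # For single-author papers, first and last author are the same ID.
        new_row["first_author_hindex"] = _hindex(metrics, ids[0] if ids else None)
        new_row["last_author_hindex"] = _hindex(metrics, ids[-1] if ids else None)
        enriched.append(new_row)
    return enriched


def enrich_csv(in_path: str, out_path: str, metrics: Dict[str, dict]) -> None:
    """Read the collection CSV and write a copy with the two extra columns."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        existing = list(reader.fieldnames or [])
        fieldnames = existing + [c for c in HINDEX_COLUMNS if c not in existing]
        rows = enrich_rows(list(reader), metrics)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```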

Metadata

  • Assignees: No one assigned
  • Labels: enhancement (New feature or request), enrichment (Post-collection data enrichment)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests