Enrich HAL papers with DOIs via CrossRef title search

## Problem

HAL provides 0% DOI coverage: in a recent 133K-paper collection, none of the 5,105 HAL papers had a DOI. Papers without a DOI cannot get citation counts, so they bypass citation filtering entirely, which lowers the quality of the aggregated output.

## Proposed Solution

Add a DOI enrichment step for HAL papers during aggregation:

1. For each HAL paper missing a DOI, query the CrossRef API using the title and first author
2. Use fuzzy matching (threshold ~90%) to confirm that the returned record actually matches the original paper before accepting its DOI
3. Write the recovered DOI back into the aggregated data before citation fetching
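The three steps above could be sketched roughly as follows. This sketch assumes each aggregated paper is a dict with hypothetical `source`, `title`, `authors` and `doi` keys, and that the CrossRef lookup is passed in as a callable returning CrossRef-style work items (a `DOI` string plus a `title` list of strings). For fuzzy matching it uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in, since the project may prefer another matcher, and the 0.9 default mirrors the ~90% threshold above. All function names are illustrative, not existing scilex code.

```python
import difflib
import re
import unicodedata
from typing import Callable, Dict, List, Optional

# Hypothetical lookup signature: (title, first_author) -> CrossRef-style items,
# each with a "DOI" string and a "title" list of strings.
SearchFn = Callable[[str, Optional[str]], List[dict]]


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two titles after normalization."""
    return difflib.SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def best_doi_match(title: str, candidates: List[dict], threshold: float = 0.9) -> Optional[str]:
    """Return the DOI of the closest-titled candidate, or None if below threshold."""
    best_doi, best_score = None, 0.0
    for item in candidates:
        doi = item.get("DOI")
        if not doi:
            continue
        for candidate_title in item.get("title", []):
            score = title_similarity(title, candidate_title)
            if score > best_score:
                best_doi, best_score = doi, score
    return best_doi if best_score >= threshold else None


def enrich_hal_papers(papers: List[Dict], search: SearchFn, threshold: float = 0.9) -> int:
    """Fill missing DOIs on HAL papers in place; return how many were recovered."""
    recovered = 0
    for paper in papers:
        # Only HAL papers that have a title but no DOI are looked up.
        if paper.get("source") != "HAL" or paper.get("doi") or not paper.get("title"):
            continue
        authors = paper.get("authors") or []
        first_author = authors[0] if authors else None
        doi = best_doi_match(paper["title"], search(paper["title"], first_author), threshold)
        if doi:
            paper["doi"] = doi.lower()  # DOIs are case-insensitive; lowercase for dedup keys
            recovered += 1
    return recovered
```

Passing the lookup in as a parameter keeps the matching logic testable offline: a stub returning fixed candidates exercises the threshold without calling CrossRef. In the pipeline, `enrich_hal_papers` would run on the aggregated data before citation fetching, as step 3 requires.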

## Expected Impact

- **Citation coverage**: HAL papers would participate in citation filtering instead of getting a free pass
- **Deduplication**: More HAL papers would match against papers from other APIs (DOI is the primary dedup key)
- **Quality**: Better relevance ranking since citation scores would be available

## Technical Notes

- CrossRef `/works` endpoint supports `query.title` and `query.author` parameters
- Rate limit: ~3 req/sec outside the polite pool, ~10 req/sec with `mailto` configured
- Could reuse existing CrossRef infrastructure in `scilex/citations/citations_tools.py`
- Should be optional (config flag) since it adds API calls during aggregation
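One possible shape for the lookup itself, using only the standard library: it builds a `/works` query from `query.title` and `query.author`, adds `mailto` when one is configured, asks for just the `DOI` and `title` fields via `select`, and throttles requests with a simple minimum-interval limiter. The 3 and 10 req/sec figures are the approximations from the notes above, not verified limits, and should be checked against CrossRef's current documentation. The `enrich_hal_dois` and `crossref_mailto` config keys, the User-Agent string and all function names are hypothetical. This sketch does not reuse `citations_tools.py`, whose contents are not shown here.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


class RateLimiter:
    """Keep consecutive calls at least 1 / rate_per_sec seconds apart."""

    def __init__(self, rate_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = 1.0 / rate_per_sec
        self._clock, self._sleep = clock, sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            delay = self._last + self.min_interval - now
            if delay > 0:
                self._sleep(delay)
                now += delay
        self._last = now


def build_crossref_query(title: str, first_author: Optional[str] = None,
                         mailto: Optional[str] = None, rows: int = 5) -> str:
    """Build a /works URL searching by title and, if known, first author."""
    params = {"query.title": title, "rows": str(rows), "select": "DOI,title"}
    if first_author:
        params["query.author"] = first_author
    if mailto:
        params["mailto"] = mailto  # identifies the client for the polite pool
    return f"{CROSSREF_WORKS_URL}?{urllib.parse.urlencode(params)}"


def make_crossref_search(mailto: Optional[str] = None,
                         limiter: Optional[RateLimiter] = None,
                         timeout: float = 10.0):
    """Return search(title, first_author) -> list of CrossRef work items."""
    # Approximate rates from the notes above; verify against CrossRef's docs.
    limiter = limiter or RateLimiter(10.0 if mailto else 3.0)
    user_agent = "scilex-doi-enrichment" + (f" (mailto:{mailto})" if mailto else "")

    def search(title: str, first_author: Optional[str]) -> list:
        limiter.wait()
        url = build_crossref_query(title, first_author, mailto)
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
        return payload.get("message", {}).get("items", [])

    return search


def search_from_config(config: dict):
    """Return a search callable, or None when the (hypothetical) flag is off."""
    if not config.get("enrich_hal_dois", False):
        return None
    return make_crossref_search(mailto=config.get("crossref_mailto"))
```

Returning `None` when the flag is off lets the aggregation step skip enrichment entirely, so no extra API calls happen unless the user opts in. The injectable `clock` and `sleep` make the limiter testable without real delays.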
