Systematic literature search across 12 academic APIs
SciLEx — like the silex stone that early humans relied on to spark fire from raw material — is a lightweight, portable tool designed to ignite research exploration. Rather than navigating fragmented databases, confronting redundant results, and manually sifting through noise, SciLEx strikes directly at the core challenge: it queries heterogeneous digital library APIs, applies smart deduplication and quality filtering, and delivers a clean, curated corpus ready for export to Zotero or BibTeX. It is not a full-scale review platform — it is the essential flint in the researcher's toolkit, engineered to quick-start systematic literature reviews with precision and minimal friction.
SciLEx (Science Literature Exploration) is a Python toolkit for systematic literature reviews. It crawls 12 academic APIs in parallel, deduplicates results by exact matching on DOIs and normalized titles, and applies a 5-phase quality filtering pipeline before exporting to Zotero or BibTeX.
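As a rough illustration of exact-match deduplication on DOIs and normalized titles, the sketch below keeps the first record seen for each key. It is not SciLEx's actual implementation: the function names, the `doi`/`title` record fields, and the specific normalization rules (lowercasing, accent folding, punctuation stripping) are all assumptions made for the example.

```python
import re
import unicodedata


def normalize_doi(doi):
    """Lowercase a DOI and strip common URL/prefix forms (assumed rules)."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi).strip()
    return doi or None


def normalize_title(title):
    """Fold case and accents, drop punctuation, collapse whitespace (assumed rules)."""
    if not title:
        return None
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return text or None


def deduplicate(records):
    """Keep the first record seen for each DOI or normalized title."""
    seen_dois, seen_titles = set(), set()
    unique = []
    for record in records:
        doi = normalize_doi(record.get("doi"))
        title = normalize_title(record.get("title"))
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue  # exact match on either key: treat as a duplicate
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(record)
    return unique
```

Matching on either key means a record without a DOI can still be recognized as a duplicate of one that has a DOI, provided their normalized titles are identical.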
- Multi-API collection with parallel processing (12 academic APIs)
- Smart deduplication using DOI and normalized title matching
- 5-phase quality filtering pipeline with time-aware citation thresholds:
  - ItemType Filter: whitelist by publication type (journal, conference, etc.)
  - Quality Filter: require DOI, abstract, year, minimum author count, optional open-access
  - Abstract Quality Filter: remove placeholder or low-quality abstracts
  - Citation Filter: time-aware thresholds (e.g. ≥1 citation after 18 months, ≥10 after 3 years)
  - Relevance Ranking: composite score (0–10) from keyword density, metadata completeness, venue type, and citation impact
- Citation count enrichment via CrossRef, OpenCitations, OpenAlex and Semantic Scholar
- HuggingFace enrichment: query HuggingFace Hub to retrieve associated ML models, datasets, and GitHub repositories
- Export to Zotero (bulk upload) or BibTeX (with PDF links)
- Idempotent collections for safe re-runs
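The time-aware citation thresholds mentioned in the feature list can be pictured with a small sketch. It uses only the two example thresholds quoted above (≥1 citation after 18 months, ≥10 after 3 years). The function names, the month-based age calculation, and the choice not to filter papers younger than 18 months are assumptions for illustration, not SciLEx's actual rules or configuration.

```python
from datetime import date

# (minimum age in months, minimum citations), checked from oldest to youngest.
# The two rows mirror the examples in the feature list; real settings may differ.
THRESHOLDS = [(36, 10), (18, 1)]


def required_citations(age_months):
    """Return the citation count a paper of this age is expected to reach."""
    for min_age, min_citations in THRESHOLDS:
        if age_months >= min_age:
            return min_citations
    return 0  # assumption: papers younger than 18 months are not filtered


def passes_citation_filter(pub_date, citations, today):
    """Compare a paper's citation count with the threshold for its age."""
    age_months = (today.year - pub_date.year) * 12 + (today.month - pub_date.month)
    return citations >= required_citations(age_months)
```

For example, under these assumptions a paper published 25 months ago needs at least one citation to pass, while a paper published 8 months ago passes with none.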
| API | Key Required | Coverage | Best For |
|---|---|---|---|
| SemanticScholar | Optional | 200M+ papers | CS/AI papers, citation networks |
| OpenAlex | Optional | 250M+ works | Broad coverage, ORCID data |
| IEEE | Yes | 5M+ docs | Engineering, CS conferences |
| Arxiv | No | 2M+ preprints | Preprints, physics, CS |
| Springer | Yes | 13M+ docs | Journals, books |
| Elsevier | Yes | 18M+ articles | Medical, life sciences |
| PubMed | Optional | 35M+ papers | Biomedical literature |
| HAL | No | 1M+ docs | French research, theses |
| DBLP | No | 6M+ CS papers | CS bibliography, 95%+ DOI coverage |
| Istex | No | 25M+ docs | French institutional access |
| OpenAIRE | No | 200M+ docs | Open-access, EU research |
| ORKG | No | 55K papers | Structured CS research comparisons |
See the API Comparison for rate limits, coverage details, and limitations.
└── SciLEx/
    ├── README.md
    ├── pyproject.toml
    ├── CONTRIBUTING.md
    ├── .env.example
    ├── scilex/
    │   ├── run_collection.py
    │   ├── aggregate_collect.py
    │   ├── enrich_with_hf.py
    │   ├── push_to_zotero.py
    │   ├── export_to_bibtex.py
    │   ├── quality_validation.py
    │   ├── keyword_validation.py
    │   ├── abstract_validation.py
    │   ├── duplicate_tracking.py
    │   ├── crawlers/
    │   ├── citations/
    │   ├── Zotero/
    │   ├── HuggingFace/
    │   └── tagging/
    ├── tests/
    ├── docs/
    └── img/
- uv (recommended) or pip
- Clone the repository:
❯ git clone https://github.com/Wimmics/SciLEx
- Navigate to the project directory:
❯ cd SciLEx
- Install dependencies:
❯ uv sync
Using pip:
❯ pip install -e .
# With dev dependencies (pytest, ruff, coverage)
❯ pip install -e ".[dev]"
Copy the example config files and fill in your API keys:
❯ cp scilex/api.config.yml.example scilex/api.config.yml
❯ cp scilex/scilex.config.yml.example scilex/scilex.config.yml
❯ cp scilex/scilex.advanced.yml.example scilex/scilex.advanced.yml
See the Configuration Guide for all available settings.
Option A - with environment activation:
❯ source .venv/bin/activate # macOS/Linux
❯ .venv\Scripts\activate # Windows
❯ scilex-collect
Option B - with uv run (no activation needed):
❯ uv run scilex-collect
Run the pipeline step by step:
# 1. Collect papers from all configured APIs
❯ scilex-collect
# 2. Deduplicate and apply quality filtering
❯ scilex-aggregate
# 3. (Optional) Enrich with HuggingFace metadata
❯ scilex-enrich
# 4. Export results
❯ scilex-push-zotero # Push to a Zotero collection
❯ scilex-export-bibtex # Export to BibTeX
See the Quick Start Guide for a complete walkthrough.
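For unattended runs, the same steps can be chained from Python. The sketch below is only one way to do it, not part of SciLEx: it calls the command names listed above through `subprocess.run` and stops at the first failure. The `run_pipeline` function, the `skip_optional` flag, and the injectable `runner` parameter are hypothetical helpers introduced for this example, and the commands are assumed to be on `PATH` (for instance in an activated virtual environment).

```python
import subprocess

# Commands taken from the steps above; the second field marks the optional step.
STEPS = [
    ("scilex-collect", False),
    ("scilex-aggregate", False),
    ("scilex-enrich", True),
    ("scilex-export-bibtex", False),
]


def run_pipeline(steps=STEPS, skip_optional=False, runner=subprocess.run):
    """Run each command in order and return the list of commands executed."""
    executed = []
    for command, optional in steps:
        if optional and skip_optional:
            continue
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so later steps never see partial results.
        runner([command], check=True)
        executed.append(command)
    return executed
```

Calling `run_pipeline(skip_optional=True)` would run collection, aggregation, and BibTeX export while leaving out the HuggingFace enrichment step.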
Option C - 💻 Web Interface [BETA VERSION]
Install dev dependencies first (not included in the default install):
❯ uv sync --extra dev
Then run the tests:
❯ uv run python -m pytest tests/ -v # All tests
❯ uv run python -m pytest tests/ --cov=scilex --cov-report=term-missing # With coverage
❯ uv run python -m pytest tests/ -v -m "not live" # Offline tests only
If you use SciLEx in your research, please cite:
Full text:
Célian Ringwald, Benjamin Navet. SciLEx, Science Literature Exploration Toolkit ⟨swh:1:dir:944639eb0260a034a5cbf8766d5ee9b74ca85330⟩.
BibTeX:
@softwareversion{scilex2026,
TITLE = {{SciLEx, Science Literature Exploration Toolkit}},
AUTHOR = {Ringwald, Célian and Navet, Benjamin},
URL = {https://github.com/Wimmics/SciLEx},
NOTE = {},
INSTITUTION = {{University Côte d'Azur ; CNRS ; Inria}},
YEAR = {2026},
MONTH = feb,
SWHID = {swh:1:dir:944639eb0260a034a5cbf8766d5ee9b74ca85330},
VERSION = {1.0},
REPOSITORY = {https://github.com/Wimmics/SciLEx},
LICENSE = {MIT Licence},
KEYWORDS = {Python, Scientific literature, literature research, paper retrieval},
HAL_ID = {},
HAL_VERSION = {},
}
- Report issues: GitHub Issues
- See CONTRIBUTING.md for development guidelines
This project is licensed under the MIT License. For more details, refer to the LICENSE file.


