Reusable utility scripts organized by function.
Scripts for validating DOI resolution and finding corrections:
- validate_failed_dois.py - Check if DOIs resolve (200/403/404 status)
- validate_new_corrections.py - Validate correction candidates before applying
- find_correct_dois.py - Research and find correct DOI alternatives
- generate_search_links.py - Generate search links for manual DOI lookup
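As a rough illustration of what a DOI resolution check can look like, here is a minimal sketch using only the standard library. It is not the code in validate_failed_dois.py; the function names, the status labels, and the reading of 403 as "blocked by the publisher" are assumptions made for this example.

```python
import urllib.error
import urllib.request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse label (labels are illustrative)."""
    if status == 200:
        return "resolves"
    if status == 403:
        # Assumption: the DOI exists but the landing page refuses automated access.
        return "blocked"
    if status == 404:
        return "not_found"
    return "other"


def check_doi(doi: str, timeout: float = 10.0) -> tuple[int, str]:
    """Request the DOI through the resolver and return (status, label).

    Redirects are followed by urlopen. Network-level failures
    (urllib.error.URLError) are not caught and propagate to the caller.
    """
    request = urllib.request.Request(
        DOI_RESOLVER + doi,
        method="HEAD",
        headers={"User-Agent": "doi-check-sketch/0.1"},  # hypothetical UA string
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still useful.
        status = err.code
    return status, classify_status(status)
```

Calling `check_doi("10.1000/182")` performs a real network request, so results depend on the resolver and the publisher at the time of the call.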
Scripts for applying validated corrections to the CSV:
- apply_doi_corrections.py - Apply validated corrections from YAML definition
- apply_additional_corrections.py - Apply batch corrections
- clean_invalid_dois.py - Remove or clean invalid DOIs from CSV
- manual_doi_correction.py - Interactive tool for manual correction
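The core of applying corrections can be sketched as a lookup from an old DOI to its replacement. The example below assumes the YAML definition has already been loaded into a plain dict, and that the CSV has a column named `doi`; both the column name and the function name are hypothetical and may not match the actual scripts.

```python
import csv
import io


def apply_corrections(csv_text: str, corrections: dict, doi_column: str = "doi"):
    """Replace DOIs listed in `corrections` and return (new_csv_text, rows_changed).

    `corrections` maps an incorrect DOI to its validated replacement.
    Rows whose DOI is not in the mapping are written back unchanged.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()

    changed = 0
    for row in reader:
        old = row[doi_column]
        if old in corrections:
            row[doi_column] = corrections[old]
            changed += 1
        writer.writerow(row)
    return out.getvalue(), changed


# Usage with made-up DOIs:
sample = "name,doi\nA,10.1/bad\nB,10.2/ok\n"
fixed, count = apply_corrections(sample, {"10.1/bad": "10.1/good"})
```

Working on text in memory keeps the sketch easy to test; a real script would read from and write to the CSV file, ideally keeping a backup of the original.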
Scripts for downloading citation PDFs and abstracts:
- download_all_pdfs_automated.py - Automated PDF download for all DOIs
- download_all_csv_pdfs.py - Download PDFs based on CSV DOI columns
- download_corrected_dois_pdfs.py - Download PDFs for corrected DOIs
- retry_failed_dois_with_fallbackpdf.py - Use fallback PDF service for failed downloads
- test_fallbackpdf_integration.py - Test fallback PDF integration
- debug_fallbackpdf_html.py - Debug fallback PDF HTML responses
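The retry-with-fallback idea can be sketched as trying a list of download sources in order and accepting the first response that looks like a PDF. This is only an outline under assumptions: the fetcher callables, their names, and the ordering are hypothetical, and the actual fallback service used by these scripts is not shown here.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a DOI and returns raw bytes, or None when nothing was found.
Fetcher = Callable[[str], Optional[bytes]]


def looks_like_pdf(data: Optional[bytes]) -> bool:
    """PDF files begin with the magic bytes %PDF; HTML error pages do not."""
    return bool(data) and data.startswith(b"%PDF")


def download_with_fallback(
    doi: str, fetchers: Iterable[Tuple[str, Fetcher]]
) -> Optional[Tuple[str, bytes]]:
    """Try each (name, fetcher) in order; return the first PDF-like result."""
    for name, fetch in fetchers:
        try:
            data = fetch(doi)
        except Exception:
            # A failing source should not stop the remaining sources.
            continue
        if looks_like_pdf(data):
            return name, data
    return None
```

Checking the leading bytes helps catch the case where a service answers with an HTML page instead of a PDF, which is the kind of response a debugging script would want to inspect.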
Scripts for enriching ingredient data:
- enrich_ingredient_effects.py - Enrich ingredient properties with additional data
- run_enrichment_cleaned.py - Run enrichment on cleaned CSV
Scripts for schema and CSV structure modifications:
- add_role_columns.py - Add organism/role context columns to CSV
- migrate_schema.py - Migrate LinkML schema to new versions
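Adding context columns to a CSV can be sketched with the standard csv module. The column names `organism` and `role` below are placeholders chosen for the example, not necessarily the ones add_role_columns.py writes.

```python
import csv
import io


def add_columns(csv_text: str, new_columns: list, default: str = "") -> str:
    """Append any missing columns to the header and fill them with `default`.

    Columns that already exist are left alone, so running this twice
    produces the same output as running it once.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    existing = list(reader.fieldnames or [])
    fieldnames = existing + [c for c in new_columns if c not in existing]

    out = io.StringIO()
    # restval supplies the value for keys missing from a row.
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, restval=default, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Usage with made-up data:
widened = add_columns("name,doi\nA,10.1/x\n", ["organism", "role"])
```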
All scripts use uv for dependency management:
# Run a validation script
uv run python scripts/doi_validation/validate_failed_dois.py
# Apply corrections
uv run python scripts/doi_corrections/apply_doi_corrections.py
# Download PDFs
uv run python scripts/pdf_downloads/download_all_pdfs_automated.py

- Correction Definitions: ../data/corrections/
- Results/Logs: ../data/results/
- Documentation: ../notes/
- Project Status: ../docs/STATUS.md