-
Notifications
You must be signed in to change notification settings - Fork 0
Fix path traversal vulnerability in PDF downloader #227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…F downloader Security fixes: - Add path traversal protection in tar extraction (_is_safe_tar_member) - Add ReDoS protection with bounded regex patterns and match limits - Remove hardcoded contact email from User-Agent (now configurable) - Add PMCID bounds validation to prevent integer overflow - Add range validation to prevent infinite loop in overlap checking Code quality improvements: - Add context manager support for HTTP session cleanup - Improve progress logging efficiency using byte thresholds - Move magic numbers and hardcoded paths to named constants - Add specific exception handling in CLI (network, permission, filesystem) - Organize configuration constants with clear documentation New features: - Add contact_email parameter for configurable User-Agent - Add lazy session creation for better resource management - Add close() method for explicit resource cleanup Tests: - Add comprehensive unit tests for security fixes - Cover path traversal, PMCID validation, range checking, session cleanup Addresses issues from PR #226 security review.
- Fix unbalanced parentheses in User-Agent string construction - Simplify _is_safe_tar_member to check path components correctly (don't use resolve() on tar member paths which aren't real filesystem paths) - Update tests to match corrected logic (allow '..' in filenames, block only '..' as path component) - Add test for balanced parentheses in User-Agent
|
Claude finished @hherb's task —— View job 🔍 Security Code Review - Pull Request #227I've completed a comprehensive review of this security-focused pull request. The changes address critical vulnerabilities while maintaining code quality and functionality. Review Checklist:
✅ Excellent Security Fixes1. Path Traversal Protection (CRITICAL) ✅The
Security Assessment: STRONG 🛡️ 2. ReDoS Protection (MEDIUM) ✅Regex patterns are properly secured:
Security Assessment: GOOD 🛡️ 3. Input Validation ✅PMCID validation is comprehensive:
✅ Code Quality Improvements1. Resource Management ✅HTTP session handling is well-implemented:
2. Constants and Configuration ✅Excellent adherence to "golden rules":
3. Error Handling ✅Robust exception handling:
✅ Outstanding Test CoverageThe test suite ( Security Testing ✅
Functional Testing ✅
Test Quality: EXCELLENT 📋 ✅ Performance ConsiderationsOptimizations ✅
Rate Limiting ✅
🔍 Minor ObservationsCode Style ✅
Architecture ✅
🚀 RecommendationsImmediate Actions
Future Enhancements (not required for this PR)
📊 Overall Assessment
🎯 Verdict: APPROVEThis PR successfully addresses critical security vulnerabilities while maintaining excellent code quality. The path traversal fix alone makes this a high-priority security update. The comprehensive test coverage and adherence to project standards demonstrate professional development practices. Security Impact: Fixes CVE-level path traversal vulnerability |
Summary
Security Fixes
1. Path Traversal Vulnerability (CRITICAL)
Added
_is_safe_tar_member()method that validates archive paths before extraction:/etc/passwd)../../../etc/passwd)2. ReDoS Protection (MEDIUM)
\d+with bounded\d{1,9}in regex patternsMAX_REGEX_MATCHES = 10000limit to prevent CPU exhaustion3. Input Validation
MIN_PMCIDandMAX_PMCIDbounds_get_pmcid_subdir()to prevent integer overflow4. Resource Management
with EuropePMCPDFDownloader(...) as downloader:)close()methodCode Quality Improvements
DEFAULT_OUTPUT_DIRconstantTest Coverage
Added comprehensive unit tests for:
Bug Fixes Found During Review
_is_safe_tar_member()to correctly check path componentsTest Plan
uv run python -m pytest tests/test_europe_pmc_pdf_downloader.py -vlist,download,status,verify,extract,find