Security Policy

Supported Versions

Version	Supported
1.0.x	✅
< 1.0	❌

Security Model

Privacy-First Architecture

PDF Content Extractor & Translator is designed with data sovereignty as a core principle:

┌─────────────────────────────────────────────────────────────┐
│                     YOUR MACHINE                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│  │ PDF Files   │→ │ Application │→ │ Processed Output    │ │
│  └─────────────┘  └─────────────┘  └─────────────────────┘ │
│                          ↓                                  │
│                   ┌─────────────┐                          │
│                   │ Local AI    │                          │
│                   │ (Ollama)    │                          │
│                   └─────────────┘                          │
└─────────────────────────────────────────────────────────────┘
                          ║
                          ╳ NO EXTERNAL CONNECTIONS
                          ║
┌─────────────────────────────────────────────────────────────┐
│                     INTERNET                                │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│  │ Cloud APIs  │  │ Analytics   │  │ Third-party Services│ │
│  └─────────────┘  └─────────────┘  └─────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Key Guarantees:

✅ All PDF processing happens locally
✅ No telemetry or analytics collection
✅ No data transmitted to external servers
✅ No API keys required for any feature
✅ Works completely offline after initial setup

Reporting a Vulnerability

We take security seriously. If you discover a vulnerability, please follow responsible disclosure:

🔒 Private Disclosure (Preferred)

For sensitive security issues:

Email: security@[project-domain].com (or maintainer email)
Subject: [SECURITY] Brief description
Include:
- Description of the vulnerability
- Steps to reproduce
- Potential impact
- Suggested fix (if any)

Response Timeline:

Acknowledgment: Within 48 hours
Initial assessment: Within 7 days
Resolution target: Within 30 days (severity-dependent)

🔓 Public Disclosure

For non-sensitive issues:

Open a GitHub Issue
Label it with security
Provide detailed reproduction steps

Security Measures

Input Validation

Control	Implementation	Location
Filename Sanitization	`werkzeug.secure_filename()`	`app.py`
File Type Validation	PDF magic bytes check	`is_valid_file()`
Path Traversal Prevention	Allowed directory whitelist	`mcp_server.py`
Size Limits	Configurable upload limits	Flask config
Numeric Input Validation	Type casting with error handling	API endpoints

File Handling

# Example: Path validation
ALLOWED_DIRECTORIES = [
    Path.home() / "Documents",
    Path.home() / "Downloads",
    Path.home() / "Desktop",
    Path(__file__).parent.resolve()
]

def validate_path(file_path: str) -> bool:
    resolved = Path(file_path).resolve()
    return any(
        resolved.is_relative_to(allowed) 
        for allowed in ALLOWED_DIRECTORIES
    )

Network Isolation

Component	Binding	External Access
Flask App	`localhost:5000`	❌ None by default
Redis	`localhost:6379`	❌ Local only
Ollama	`localhost:11434`	❌ Local only
ChromaDB	Embedded	❌ No network

Sensitive Data

Data Type	Storage	Lifecycle
Uploaded PDFs	`uploads/`	User-managed
Processed files	`outputs/`	User-managed
AI embeddings	`chroma_db/`	Persistent
Logs	`logs/`	Auto-rotated

Recommendation: Implement regular cleanup:

# Clear files older than 7 days
find uploads/ outputs/ -type f -mtime +7 -delete

Known Security Considerations

Current Limitations

Area	Status	Notes
Authentication	❌ None	Designed for single-user local use
Authorization	❌ None	All users have full access
Encryption at Rest	❌ None	Files stored unencrypted
HTTPS	❌ Not built-in	Use reverse proxy for production
Rate Limiting	❌ None	Add via reverse proxy

Recommendations for Production Deployment

If exposing to a network:

Use a Reverse Proxy:

server {
    listen 443 ssl;
    server_name pdf.example.com;
    
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    
    location / {
        proxy_pass http://localhost:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Add Authentication:
- OAuth2 Proxy
- HTTP Basic Auth
- Flask-Login

Enable Rate Limiting:

limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

location /api/ {
    limit_req zone=api burst=20;
    proxy_pass http://localhost:5000;
}

Restrict File Uploads:

app.config['MAX_CONTENT_LENGTH'] = 50 * 1024 * 1024  # 50MB

Compliance

Applicable Regulations

Regulation	Applicability	Status
GDPR	EU users	✅ Compliant (no personal data leaves device)
HIPAA	Healthcare	✅ Suitable (self-hosted, no PHI transmission)
CCPA	California	✅ Compliant (no data collection)
SOC 2	Enterprise	⚠️ N/A (not a cloud service)

Data Processing

Data Controller: The user operating the application
Data Processor: N/A (no third-party processing)
Data Retention: User-controlled (no automatic deletion)
Data Export: All files accessible in filesystem

Dependency Security

Monitoring

We use the following tools to monitor dependencies:

Dependabot: Automated vulnerability alerts
pip-audit: Python package vulnerability scanning
npm audit: JavaScript dependency scanning

Update Policy

Severity	Response Time
Critical	Within 24 hours
High	Within 7 days
Medium	Next release cycle
Low	Best effort

Running Security Scans

# Python dependencies
pip install pip-audit
pip-audit

# Check for known vulnerabilities
safety check -r requirements.txt

Security Checklist for Contributors

Before submitting a PR, ensure:

No hardcoded secrets or credentials
All user input is validated
File paths use secure_filename()
No new external network calls introduced
Error messages don't leak sensitive information
Logging doesn't include sensitive data
New dependencies are reviewed for vulnerabilities

Bug Bounty

We currently do not have a formal bug bounty program. However, we deeply appreciate security researchers who help improve our security posture. Responsible disclosure will be acknowledged in our release notes and contributors list.

Contact

Security Issues: Create private advisory
General Issues: GitHub Issues

Last updated: 2025-12-20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security

SECURITY.md

Security Policy

Supported Versions

Security Model

Privacy-First Architecture

Reporting a Vulnerability

🔒 Private Disclosure (Preferred)

🔓 Public Disclosure

Security Measures

Input Validation

File Handling

Network Isolation

Sensitive Data

Known Security Considerations

Current Limitations

Recommendations for Production Deployment

Compliance

Applicable Regulations

Data Processing

Dependency Security

Monitoring

Update Policy

Running Security Scans

Security Checklist for Contributors

Bug Bounty

Contact

There aren’t any published security advisories

Security: matis-dev/pdf-extractor-translator

Security

SECURITY.md

Security Policy

Supported Versions

Security Model

Privacy-First Architecture

Reporting a Vulnerability

🔒 Private Disclosure (Preferred)

🔓 Public Disclosure

Security Measures

Input Validation

File Handling

Network Isolation

Sensitive Data

Known Security Considerations

Current Limitations

Recommendations for Production Deployment

Compliance

Applicable Regulations

Data Processing

Dependency Security

Monitoring

Update Policy

Running Security Scans

Security Checklist for Contributors

Bug Bounty

Contact

There aren’t any published security advisories