A complete, user-friendly application to work with PDF files. Convert, edit, merge, split, compress, and protect your PDF documents easily!
- ✅ Convert PDF to Word (DOCX)
- ✅ Convert PDF to Excel (XLSX)
- ✅ Convert PDF to PowerPoint (PPTX)
- ✅ Convert PDF to HTML
- ✅ Convert PDF to Images (PNG, JPG)
- ✅ Convert PDF to Text
- ✅ Merge multiple PDFs into one
- ✅ Split PDF into separate pages
- ✅ Compress PDF (reduce file size)
- ✅ Extract text and images
- ✅ Rotate pages
- ✅ Extract specific pages
- ✅ Add password protection
- ✅ Encrypt PDF files
- ✅ Add watermarks
- ✅ Add digital signatures
- ✅ Remove passwords (if authorized)
- ✅ Batch processing (process multiple files)
- ✅ REST API for developers
- ✅ Web interface (user-friendly UI)
- ✅ Background task processing
- ✅ File upload/download management
- Python 3.10 or higher
- MySQL Server
- Node.js (optional, for frontend)
- Clone the repository:
  git clone https://github.com/yourusername/PDF-Tool.git
  cd PDF-Tool
- Create virtual environment:
  python -m venv pdf_tool_env
  source pdf_tool_env/bin/activate  # On Windows: pdf_tool_env\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Setup configuration:
  cp .env.example .env
  # Edit .env with your settings (MySQL connection, etc.)
- Initialize database:
  python
  >>> from app import app, db
  >>> with app.app_context():
  ...     db.create_all()
  >>> exit()
- Run the application:
  python run.py

Visit: http://localhost:5000
- Features
- Quick Start
- Installation Guide
- Project Structure
- Usage
- API Documentation
- Configuration
- Security
- Troubleshooting
- Contributing
- License
See README_SETUP.md for a complete step-by-step installation guide with detailed explanations.
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.10 | 3.11+ |
| MySQL | 5.7 | 8.0+ |
| RAM | 2GB | 4GB+ |
| Disk Space | 500MB | 2GB+ |
| OS | Windows/Mac/Linux | Windows/Mac/Linux |
All required packages are listed in requirements.txt:
pip install -r requirements.txt

Main packages:
- Flask - Web framework
- SQLAlchemy - Database ORM
- PyMuPDF - PDF processing
- pypdf - PDF manipulation
- Celery - Background tasks
- Redis - Task queue
PDF-Tool/
│
├── app.py # Main Flask application
├── api.py # REST API endpoints
├── config.py # Configuration settings
├── run.py # Entry point to start app
├── requirements.txt # Python dependencies
│
├── common/ # Shared utilities
│ ├── blob_storage.py # Database file storage
│ ├── conversion_qa.py # Quality assurance
│ ├── dependency_checker.py # Check dependencies
│ ├── error_recovery.py # Error handling
│ ├── exceptions.py # Custom exceptions
│ ├── file_processing.py # File operations
│ ├── file_validation.py # Validation logic
│ ├── health_check.py # System health
│ ├── model_validation.py # Database validation
│ ├── progress.py # Progress tracking
│ ├── subprocess_utils.py # Subprocess handling
│ └── upload_handler.py # Upload management
│
├── Feature/ # Application features
│ ├── admin_features.py # Admin operations
│ ├── authentication_features.py # User auth
│ ├── conversion_features.py # Conversion logic
│ ├── file_management_features.py
│ ├── dashboard_features.py # Dashboard UI
│ ├── error_handling.py # Error handlers
│ └── feature_manager.py # Feature management
│
├── pdf_modules/ # PDF operations
│ ├── pdf_base.py # Base PDF class
│ ├── pdf_convert.py # Conversion operations
│ ├── pdf_merge_split.py # Merge and split
│ ├── pdf_compress.py # Compression
│ ├── pdf_security.py # Password protection
│ ├── pdf_validation.py # PDF validation
│ ├── pdf_transform.py # Transform operations
│ ├── pdf_edit.py # Edit operations
│ ├── pdf_repair.py # Repair PDFs
│ └── ... (more modules)
│
├── database/ # Database models
│ ├── models_user.py # User model
│ ├── models_file.py # File model
│ ├── models_tracking.py # Tracking model
│ ├── db.py # Database setup
│ └── pdf_tool_project.db # Database file
│
├── static/ # Frontend assets
│ ├── css/ # Stylesheets
│ ├── js/ # JavaScript files
│ ├── images/ # Images
│ └── index.html # Main HTML
│
├── templates/ # HTML templates
│ ├── base.html # Base template
│ ├── index.html # Home page
│ ├── dashboard.html # User dashboard
│ └── ...
│
├── uploads/ # ⚠️ User uploaded files (not committed)
├── processed/ # ⚠️ Output files (not committed)
├── .env.example # Example config (safe to commit)
├── .env # ⚠️ Your config (NOT committed)
├── .gitignore # Git ignore rules
├── README.md # This file
├── README_SETUP.md # Detailed setup guide
└── LICENSE # MIT License
- Open http://localhost:5000 in your browser
- Choose your PDF operation from the menu
- Upload your PDF file
- Configure settings (if needed)
- Click "Process" button
- Download the result
from pdf_modules.pdf_convert import PDFConverter

# Convert PDF to Word
converter = PDFConverter()
converter.pdf_to_word(
    input_path="document.pdf",
    output_path="document.docx"
)
print("Conversion complete!")

Convert PDF to Word:
curl -X POST http://localhost:5000/api/pdf/convert-to-word \
  -F "file=@input.pdf" \
  -H "Content-Type: multipart/form-data"

Merge PDFs:
curl -X POST http://localhost:5000/api/pdf/merge \
  -F "files=@file1.pdf" \
  -F "files=@file2.pdf"

See API Documentation for more endpoints.
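The curl calls above can also be made from Python. The following is a minimal sketch using only the standard library; `build_multipart` and `post_pdfs` are hypothetical helper names, not part of this project. It assumes the form field names `file` and `files` shown in the curl examples, and it returns the raw response bytes because the exact body returned by each endpoint may vary.

```python
import io
import uuid
from pathlib import Path
from urllib import request


def build_multipart(files, field_name="file"):
    """Encode (filename, bytes) pairs as a multipart/form-data body.

    Returns a tuple: (body bytes, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for filename, content in files:
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(
            f'Content-Disposition: form-data; name="{field_name}"; '
            f'filename="{filename}"\r\n'.encode()
        )
        buf.write(b"Content-Type: application/pdf\r\n\r\n")
        buf.write(content)
        buf.write(b"\r\n")
    # Closing boundary marks the end of the form data.
    buf.write(f"--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def post_pdfs(url, paths, field_name="file"):
    """POST the given PDF files to `url` and return the raw response bytes."""
    files = [(Path(p).name, Path(p).read_bytes()) for p in paths]
    body, content_type = build_multipart(files, field_name)
    req = request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
    with request.urlopen(req) as resp:
        return resp.read()
```

With the server running, a merge request would look like `post_pdfs("http://localhost:5000/api/pdf/merge", ["file1.pdf", "file2.pdf"], field_name="files")`.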
- POST /api/pdf/convert-to-word - Convert to Word (DOCX)
- POST /api/pdf/convert-to-excel - Convert to Excel (XLSX)
- POST /api/pdf/convert-to-pptx - Convert to PowerPoint
- POST /api/pdf/convert-to-html - Convert to HTML
- POST /api/pdf/convert-to-text - Extract text

- POST /api/pdf/merge - Merge multiple PDFs
- POST /api/pdf/split - Split PDF into pages
- POST /api/pdf/compress - Compress PDF file
- POST /api/pdf/rotate - Rotate pages
- GET /api/pdf/extract-images - Extract images

- POST /api/pdf/protect - Add password protection
- POST /api/pdf/encrypt - Encrypt file
- POST /api/pdf/watermark - Add watermark

- GET /tasks/{task_id} - Get task status
- GET /tasks/{task_id}/result - Get task result
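Because conversions run as background tasks, a client usually has to poll the task endpoint until the work finishes. The sketch below shows one way to do that. It relies on the `status`, `result`, `message`, and `error_code` fields from the response format documented in this README, and it assumes that any status other than `success` or `error` means the task is still running. `wait_for_task`, `TaskFailed`, and the `fetch` callable are hypothetical names; `fetch` stands for whatever function you use to call GET /tasks/{task_id} and decode the JSON.

```python
import time


class TaskFailed(Exception):
    """Raised when the API reports an error for a task."""

    def __init__(self, message, error_code=None):
        super().__init__(message)
        self.error_code = error_code


def wait_for_task(fetch, task_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll `fetch(task_id)` until the task succeeds, fails, or times out.

    `fetch` must return the decoded JSON payload as a dict.
    Returns the "result" object of a successful task.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(task_id)
        status = payload.get("status")
        if status == "success":
            return payload.get("result", {})
        if status == "error":
            raise TaskFailed(
                payload.get("message", "unknown error"),
                payload.get("error_code"),
            )
        # Assumption: every other status means "still in progress".
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(interval)
```

Passing `fetch` and `sleep` as parameters keeps the polling logic independent of any HTTP library and makes it easy to exercise without a running server.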
Success Response:
{
  "status": "success",
  "message": "Operation completed",
  "task_id": "conv_abc123def456",
  "result": {
    "output_file": "output.docx",
    "file_size": 1024000
  }
}

Error Response:
{
  "status": "error",
  "message": "Invalid PDF file",
  "error_code": "INVALID_PDF"
}

Create .env file from .env.example:
cp .env.example .env

Flask Settings:
SECRET_KEY=your-secret-key-here
FLASK_ENV=development
FLASK_DEBUG=1
Database:
DATABASE_URL=mysql+pymysql://root:password@localhost:3307/pdf_tool_project
File Storage:
UPLOAD_FOLDER=uploads
PROCESSED_FOLDER=processed
MAX_CONTENT_LENGTH=1073741824 # 1GB
Redis (for background tasks):
REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/0
Email (for notifications):
MAIL_SERVER=smtp.gmail.com
MAIL_PORT=587
MAIL_USERNAME=your-email@gmail.com
MAIL_PASSWORD=your-password
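As an illustration of how these variables might be consumed, here is a minimal sketch that reads them into a typed settings object. It is not the project's actual config.py; the `Settings` class and `load_settings` function are hypothetical, and the sketch assumes the values have already been loaded into the process environment (for example by a loader such as python-dotenv) with any inline comments stripped.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    secret_key: str
    database_url: str
    upload_folder: str
    processed_folder: str
    max_content_length: int
    debug: bool


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    SECRET_KEY and DATABASE_URL are required (KeyError if missing);
    the remaining values fall back to the defaults shown in this README.
    """
    env = os.environ if env is None else env
    return Settings(
        secret_key=env["SECRET_KEY"],
        database_url=env["DATABASE_URL"],
        upload_folder=env.get("UPLOAD_FOLDER", "uploads"),
        processed_folder=env.get("PROCESSED_FOLDER", "processed"),
        max_content_length=int(env.get("MAX_CONTENT_LENGTH", 1024 ** 3)),
        debug=env.get("FLASK_DEBUG", "0") == "1",
    )
```

Failing fast on a missing SECRET_KEY or DATABASE_URL surfaces configuration mistakes at startup rather than at the first request.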
The .env file contains your personal credentials and secrets. Always:
✅ DO:
- Keep .env only on your computer
- Use .env.example as a template
- Change default passwords
- Use a strong SECRET_KEY in production
❌ DON'T:
- Commit .env to Git/GitHub
- Share the .env file
- Use weak passwords
- Leave debug mode on in production
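One simple way to obtain a strong SECRET_KEY is Python's standard `secrets` module. The helper name `generate_secret_key` below is only illustrative; 32 random bytes (64 hex characters) is a common choice, not a requirement of this project.

```python
import secrets


def generate_secret_key(num_bytes=32):
    """Return a cryptographically strong random key as a hex string."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Paste the printed line into your .env file.
    print(f"SECRET_KEY={generate_secret_key()}")
```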
The application uses industry-standard encryption:
- User passwords: bcrypt hashing
- PDF passwords: AES-256 encryption
- API tokens: Secure random generation
- Uploaded files stored securely
- Automatic cleanup of old files
- Virus/malware scanning (if enabled)
- Access control by user
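To make "automatic cleanup of old files" concrete, the sketch below deletes files in a folder once they exceed a maximum age. This is an assumption about how such a cleanup could work, not the project's actual implementation; `remove_old_files` is a hypothetical name, and the age threshold is up to you.

```python
import time
from pathlib import Path


def remove_old_files(folder, max_age_seconds, now=None):
    """Delete regular files in `folder` last modified more than
    `max_age_seconds` ago. Returns the sorted names of removed files.
    Subdirectories are left untouched.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(folder).iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

For example, `remove_old_files("uploads", 24 * 3600)` would remove uploads older than one day; a scheduled job could call it periodically.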
Error: No module named 'flask'
Solution:
# Activate virtual environment
source pdf_tool_env/bin/activate # On Windows: pdf_tool_env\Scripts\activate
# Install requirements
pip install -r requirements.txt

Error: Can't connect to MySQL server
Solution:
- Check that MySQL is running
- Verify the connection settings in the .env file
- Check username/password
- Verify the port (MySQL's default is 3306; the example DATABASE_URL uses 3307)
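The checks above can be partly automated. This sketch extracts the host and port from a DATABASE_URL and tests whether a TCP connection can be opened. It uses only the standard library and does not verify credentials; a successful connection only shows that something is listening on that port. The function names are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def mysql_endpoint(database_url, default_port=3306):
    """Return (host, port) from a URL such as mysql+pymysql://user:pw@host:port/db."""
    parts = urlsplit(database_url)
    return parts.hostname, parts.port or default_port


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Typical use: `host, port = mysql_endpoint(os.environ["DATABASE_URL"])` followed by `can_connect(host, port)`. If this returns False, the server is not running or the host/port in .env is wrong.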
Error: Address already in use
Solution:
Change port in run.py:
if __name__ == '__main__':
    app.run(debug=True, port=5001)

Error: 413 Request Entity Too Large
Solution:
Increase limit in .env:
MAX_CONTENT_LENGTH=2147483648 # 2GB
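MAX_CONTENT_LENGTH is given in bytes, and the values in this README use binary gigabytes (1 GB = 1024³ bytes). A small helper, hypothetical and shown only to make the arithmetic explicit, can compute the number for any limit:

```python
def gib_to_bytes(gib):
    """Convert binary gigabytes (GiB) to a whole number of bytes."""
    return int(gib * 1024 ** 3)


if __name__ == "__main__":
    for size in (1, 2, 0.5):
        print(f"{size} GB -> MAX_CONTENT_LENGTH={gib_to_bytes(size)}")
```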
Error: Conversion process failed
Solution:
- Check file is valid PDF
- Check disk space
- Check file permissions
- View logs for details
# Check Flask logs
tail -f logs/app.log
# Check Celery logs
tail -f logs/celery.log

We welcome contributions! Here's how to help:
- Fork the repository
- Create a new branch: git checkout -b feature/your-feature
- Make your changes
- Test thoroughly
- Commit: git commit -m "Description of changes"
- Push: git push origin feature/your-feature
- Create a Pull Request on GitHub
- Follow PEP 8 style guide
- Add comments to complex code
- Test your changes
- Don't commit the .env file
- Don't commit credentials
This project is licensed under the MIT License. See LICENSE file for details.
- Check Documentation
  - Read README_SETUP.md for setup help
  - Check API documentation at the /docs endpoint
  - Review code comments in source files
- Search Issues
  - Visit GitHub Issues page
  - Search for your problem
  - Read solutions from others
- Create an Issue
  - Click "New Issue"
  - Describe your problem clearly
  - Include error messages
  - Provide steps to reproduce
- Contact
  - Email: support@example.com
  - Issues: GitHub Issues
  - Discussions: GitHub Discussions
- Cloud storage integration (Google Drive, Dropbox)
- Advanced OCR capabilities
- AI-powered document analysis
- Batch processing UI improvements
- Mobile app
- Docker support
- CI/CD pipeline
- Better error messages
- Performance optimization
- More file format support
- Enhanced security features
- Language: Python 3.10+
- Database: MySQL
- Web Framework: Flask
- Frontend: HTML/CSS/JavaScript
- License: MIT
- Status: Active Development
Thanks to all contributors and users who help improve this project!
Made with ❤️ for PDF lovers worldwide
Last Updated: October 25, 2025