9046balaji/Pdf-Tools

📄 PDF Tool - All-in-One PDF Solution

A complete, user-friendly application to work with PDF files. Convert, edit, merge, split, compress, and protect your PDF documents easily!

✨ Features

PDF Conversion

  • ✅ Convert PDF to Word (DOCX)
  • ✅ Convert PDF to Excel (XLSX)
  • ✅ Convert PDF to PowerPoint (PPTX)
  • ✅ Convert PDF to HTML
  • ✅ Convert PDF to Images (PNG, JPG)
  • ✅ Convert PDF to Text

PDF Editing & Processing

  • ✅ Merge multiple PDFs into one
  • ✅ Split PDF into separate pages
  • ✅ Compress PDF (reduce file size)
  • ✅ Extract text and images
  • ✅ Rotate pages
  • ✅ Extract specific pages

PDF Security

  • ✅ Add password protection
  • ✅ Encrypt PDF files
  • ✅ Add watermarks
  • ✅ Add digital signatures
  • ✅ Remove passwords (if authorized)

Additional Features

  • ✅ Batch processing (process multiple files)
  • ✅ REST API for developers
  • ✅ Web interface (user-friendly UI)
  • ✅ Background task processing
  • ✅ File upload/download management

🚀 Quick Start (5 Minutes)

Prerequisites

  • Python 3.10 or higher
  • MySQL Server
  • Redis (for Celery background tasks)
  • Node.js (optional, for frontend)

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/PDF-Tool.git
    cd PDF-Tool
  2. Create virtual environment:

    python -m venv pdf_tool_env
    source pdf_tool_env/bin/activate  # On Windows: pdf_tool_env\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Setup configuration:

    cp .env.example .env
    # Edit .env with your settings (MySQL connection, etc.)
  5. Initialize database:

    python
    >>> from app import app, db
    >>> with app.app_context():
    ...     db.create_all()
    ...
    >>> exit()
  6. Run the application:

    python run.py

    Visit: http://localhost:5000


📚 Installation Guide

Detailed Setup Instructions

See README_SETUP.md for a complete step-by-step installation guide with detailed explanations.

System Requirements

Component    Minimum              Recommended
Python       3.10                 3.11+
MySQL        5.7                  8.0+
RAM          2GB                  4GB+
Disk Space   500MB                2GB+
OS           Windows/Mac/Linux    Windows/Mac/Linux

Installing from Requirements

The project includes all necessary packages:

pip install -r requirements.txt

Main packages:

  • Flask - Web framework
  • SQLAlchemy - Database ORM
  • PyMuPDF - PDF processing
  • pypdf - PDF manipulation
  • Celery - Background tasks
  • Redis - Task queue

📁 Project Structure

PDF-Tool/
│
├── app.py                          # Main Flask application
├── api.py                          # REST API endpoints
├── config.py                       # Configuration settings
├── run.py                          # Entry point to start app
├── requirements.txt                # Python dependencies
│
├── common/                         # Shared utilities
│   ├── blob_storage.py            # Database file storage
│   ├── conversion_qa.py           # Quality assurance
│   ├── dependency_checker.py      # Check dependencies
│   ├── error_recovery.py          # Error handling
│   ├── exceptions.py              # Custom exceptions
│   ├── file_processing.py         # File operations
│   ├── file_validation.py         # Validation logic
│   ├── health_check.py            # System health
│   ├── model_validation.py        # Database validation
│   ├── progress.py                # Progress tracking
│   ├── subprocess_utils.py        # Subprocess handling
│   └── upload_handler.py          # Upload management
│
├── Feature/                        # Application features
│   ├── admin_features.py          # Admin operations
│   ├── authentication_features.py # User auth
│   ├── conversion_features.py     # Conversion logic
│   ├── file_management_features.py
│   ├── dashboard_features.py      # Dashboard UI
│   ├── error_handling.py          # Error handlers
│   └── feature_manager.py         # Feature management
│
├── pdf_modules/                    # PDF operations
│   ├── pdf_base.py                # Base PDF class
│   ├── pdf_convert.py             # Conversion operations
│   ├── pdf_merge_split.py         # Merge and split
│   ├── pdf_compress.py            # Compression
│   ├── pdf_security.py            # Password protection
│   ├── pdf_validation.py          # PDF validation
│   ├── pdf_transform.py           # Transform operations
│   ├── pdf_edit.py                # Edit operations
│   ├── pdf_repair.py              # Repair PDFs
│   └── ... (more modules)
│
├── database/                       # Database models
│   ├── models_user.py             # User model
│   ├── models_file.py             # File model
│   ├── models_tracking.py         # Tracking model
│   ├── db.py                      # Database setup
│   └── pdf_tool_project.db        # Database file
│
├── static/                         # Frontend assets
│   ├── css/                       # Stylesheets
│   ├── js/                        # JavaScript files
│   ├── images/                    # Images
│   └── index.html                 # Main HTML
│
├── templates/                      # HTML templates
│   ├── base.html                  # Base template
│   ├── index.html                 # Home page
│   ├── dashboard.html             # User dashboard
│   └── ...
│
├── uploads/                        # ⚠️ User uploaded files (not committed)
├── processed/                      # ⚠️ Output files (not committed)
├── .env.example                   # Example config (safe to commit)
├── .env                           # ⚠️ Your config (NOT committed)
├── .gitignore                     # Git ignore rules
├── README.md                      # This file
├── README_SETUP.md                # Detailed setup guide
└── LICENSE                        # MIT License

💻 Usage

Web Interface

  1. Open http://localhost:5000 in your browser
  2. Choose your PDF operation from the menu
  3. Upload your PDF file
  4. Configure settings (if needed)
  5. Click "Process" button
  6. Download the result

Command Line (Python Script)

from pdf_modules.pdf_convert import PDFConverter

# Convert PDF to Word
converter = PDFConverter()
converter.pdf_to_word(
    input_path="document.pdf",
    output_path="document.docx"
)
print("Conversion complete!")
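For batch processing from a script, the same call can be applied to every PDF in a folder. The following is a minimal sketch, not the project's own batch implementation: batch_convert is a hypothetical helper, and it assumes the converter method accepts input_path and output_path keyword arguments as in the example above.

```python
from pathlib import Path
from typing import Callable, List


def batch_convert(
    input_dir: str,
    output_dir: str,
    convert: Callable[..., object],
    suffix: str = ".docx",
) -> List[Path]:
    """Run `convert` on every *.pdf file in input_dir; return the output paths."""
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    outputs = []
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        target = out_root / (pdf.stem + suffix)
        # Keyword names mirror the pdf_to_word call shown above.
        convert(input_path=str(pdf), output_path=str(target))
        outputs.append(target)
    return outputs
```

With the project installed, a call such as batch_convert("incoming", "converted", PDFConverter().pdf_to_word) would be the intended usage; the folder names here are placeholders.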

REST API (For Developers)

Convert PDF to Word:

curl -X POST http://localhost:5000/api/pdf/convert-to-word \
  -F "file=@input.pdf"

Merge PDFs:

curl -X POST http://localhost:5000/api/pdf/merge \
  -F "files=@file1.pdf" \
  -F "files=@file2.pdf"

See API Documentation for more endpoints.
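The merge call can also be made from Python. This sketch uses only the standard library and builds the multipart body by hand; build_multipart and merge_request are hypothetical helper names, and the "files" field name is taken from the curl example above. The request is built but not sent, so it can be inspected first.

```python
import uuid
import urllib.request
from typing import Iterable, Tuple


def build_multipart(field: str, files: Iterable[Tuple[str, bytes]]) -> Tuple[bytes, str]:
    """Encode files as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for filename, data in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    # The closing boundary ends the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def merge_request(base_url: str, files: Iterable[Tuple[str, bytes]]) -> urllib.request.Request:
    """Build (but do not send) a POST to the merge endpoint."""
    body, content_type = build_multipart("files", files)
    return urllib.request.Request(
        f"{base_url}/api/pdf/merge",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it. The Content-Type header has to carry the generated boundary, so it is derived from the encoder output rather than typed by hand.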


🔌 API Documentation

Available Endpoints

Conversion Endpoints

  • POST /api/pdf/convert-to-word - Convert to Word (DOCX)
  • POST /api/pdf/convert-to-excel - Convert to Excel (XLSX)
  • POST /api/pdf/convert-to-pptx - Convert to PowerPoint
  • POST /api/pdf/convert-to-html - Convert to HTML
  • POST /api/pdf/convert-to-text - Extract text

Processing Endpoints

  • POST /api/pdf/merge - Merge multiple PDFs
  • POST /api/pdf/split - Split PDF into pages
  • POST /api/pdf/compress - Compress PDF file
  • POST /api/pdf/rotate - Rotate pages
  • GET /api/pdf/extract-images - Extract images

Security Endpoints

  • POST /api/pdf/protect - Add password protection
  • POST /api/pdf/encrypt - Encrypt file
  • POST /api/pdf/watermark - Add watermark

Status Endpoints

  • GET /tasks/{task_id} - Get task status
  • GET /tasks/{task_id}/result - Get task result
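Because long operations run as background tasks, a client usually polls the status endpoint until the task finishes. The sketch below shows one way to do that. It assumes that "success" and "error" are the final status values (taken from the response format in this README; other values such as "pending" are a guess), and fetch_status is a caller-supplied function that performs the actual GET /tasks/{task_id} request.

```python
import time
from typing import Callable, Dict


def wait_for_task(
    fetch_status: Callable[[str], Dict],
    task_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> Dict:
    """Poll until the task reports a final status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(task_id)
        # Assumption: "success" and "error" are the terminal statuses.
        if payload.get("status") in ("success", "error"):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
        time.sleep(interval)
```

Injecting fetch_status keeps the loop independent of any HTTP library and makes it easy to exercise without a running server.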

Response Format

Success Response:

{
  "status": "success",
  "message": "Operation completed",
  "task_id": "conv_abc123def456",
  "result": {
    "output_file": "output.docx",
    "file_size": 1024000
  }
}

Error Response:

{
  "status": "error",
  "message": "Invalid PDF file",
  "error_code": "INVALID_PDF"
}
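A client can turn these two shapes into a return value or an exception. This is a small sketch based only on the fields shown above; PdfToolError and parse_response are hypothetical names, not part of the project.

```python
import json


class PdfToolError(Exception):
    """Raised when the API reports status "error" (hypothetical helper)."""

    def __init__(self, message: str, error_code: str = ""):
        super().__init__(message)
        self.error_code = error_code


def parse_response(raw: str) -> dict:
    """Return the "result" object on success; raise PdfToolError otherwise."""
    payload = json.loads(raw)
    if payload.get("status") == "success":
        return payload.get("result", {})
    raise PdfToolError(
        payload.get("message", "unknown error"),
        payload.get("error_code", ""),
    )
```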

⚙️ Configuration

Environment Variables

Create a .env file from .env.example:

cp .env.example .env

Key Settings

Flask Settings:

SECRET_KEY=your-secret-key-here
FLASK_ENV=development
FLASK_DEBUG=1

Database:

DATABASE_URL=mysql+pymysql://root:password@localhost:3307/pdf_tool_project

File Storage:

UPLOAD_FOLDER=uploads
PROCESSED_FOLDER=processed
MAX_CONTENT_LENGTH=1073741824  # 1GB

Redis (for background tasks):

REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/0

Email (for notifications):

MAIL_SERVER=smtp.gmail.com
MAIL_PORT=587
MAIL_USERNAME=your-email@gmail.com
MAIL_PASSWORD=your-password
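Environment values always arrive as strings, so numeric and boolean settings need converting when they are read. The sketch below illustrates this for the variables listed above. It is not the project's config.py: load_settings is a hypothetical function, and the fallback values are assumptions.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the settings shown above, with assumed fallbacks."""
    return {
        "SECRET_KEY": env.get("SECRET_KEY", ""),
        "DEBUG": env.get("FLASK_DEBUG", "0") == "1",
        "DATABASE_URL": env.get("DATABASE_URL", ""),
        "UPLOAD_FOLDER": env.get("UPLOAD_FOLDER", "uploads"),
        "PROCESSED_FOLDER": env.get("PROCESSED_FOLDER", "processed"),
        # Environment values are strings; the size limit must be an int.
        "MAX_CONTENT_LENGTH": int(env.get("MAX_CONTENT_LENGTH", 1024 ** 3)),
        "MAIL_PORT": int(env.get("MAIL_PORT", 587)),
    }
```

The int() conversion fails if a value still carries a trailing "# 1GB" comment, so this relies on the .env loader stripping inline comments before the values reach the environment.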

🔒 Security

Important Security Notes

⚠️ NEVER commit .env file to Git!

The .env file contains your personal credentials and secrets. Follow these rules:

DO:

  • Keep .env only on your computer
  • Use .env.example as a template
  • Change default passwords
  • Use a strong SECRET_KEY in production

DON'T:

  • Commit .env to Git/GitHub
  • Share .env file
  • Use weak passwords
  • Leave debug mode on in production

Password Protection

The application uses industry-standard cryptography:

  • User passwords: bcrypt hashing
  • PDF passwords: AES-256 encryption
  • API tokens: Secure random generation
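A strong SECRET_KEY and unpredictable API tokens can both come from Python's secrets module. This is a minimal sketch of that approach; the function names are illustrative, and it is not necessarily how the application generates its own tokens.

```python
import secrets


def new_secret_key(num_bytes: int = 32) -> str:
    """Hex string suitable for SECRET_KEY (two hex characters per byte)."""
    return secrets.token_hex(num_bytes)


def new_api_token(num_bytes: int = 32) -> str:
    """URL-safe random token built from num_bytes of randomness."""
    return secrets.token_urlsafe(num_bytes)
```

Paste the output of new_secret_key() into .env once. Generating a new key on every start would invalidate existing sessions.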

File Security

  • Uploaded files stored securely
  • Automatic cleanup of old files
  • Virus/malware scanning (if enabled)
  • Access control by user

🐛 Troubleshooting

Common Issues

Issue: "ModuleNotFoundError"

Error: No module named 'flask'

Solution:

# Activate virtual environment
source pdf_tool_env/bin/activate  # On Windows: pdf_tool_env\Scripts\activate

# Install requirements
pip install -r requirements.txt

Issue: "MySQL Connection Error"

Error: Can't connect to MySQL server

Solution:

  1. Check MySQL is running
  2. Verify connection in .env file
  3. Check username/password
  4. Verify the port (MySQL's default is 3306; the example DATABASE_URL in this README uses 3307)
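The first and last checks can be scripted: read the host and port from DATABASE_URL, then try a plain TCP connection. This is a diagnostic sketch with hypothetical helper names. It only shows whether something is listening, not whether the credentials are valid.

```python
import socket
from typing import Optional, Tuple
from urllib.parse import urlsplit


def db_host_port(database_url: str, default_port: int = 3306) -> Tuple[Optional[str], int]:
    """Extract (host, port) from a SQLAlchemy-style URL."""
    parts = urlsplit(database_url)
    # Fall back to MySQL's default port when the URL omits one.
    return parts.hostname, parts.port or default_port


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```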

Issue: "Port 5000 already in use"

Error: Address already in use

Solution: Change port in run.py:

if __name__ == '__main__':
    app.run(debug=True, port=5001)
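Instead of hard-coding a second port, a start script could probe for a free one. This is an optional sketch, not part of run.py; port_is_free and first_free_port are hypothetical helpers.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if binding to host:port succeeds right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False


def first_free_port(start: int = 5000, attempts: int = 10) -> int:
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")
```

The result could then be passed as app.run(port=first_free_port()). Another process can still claim the port between the check and the start, so treat this as a convenience rather than a guarantee.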

Issue: "File too large"

Error: 413 Request Entity Too Large

Solution: Increase limit in .env:

MAX_CONTENT_LENGTH=2147483648  # 2GB
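The limit is given in bytes, using binary gigabytes (1 GB here means 1024³ bytes). A short helper makes the arithmetic explicit; gib_to_bytes is an illustrative name.

```python
def gib_to_bytes(gib: float) -> int:
    """Convert binary gigabytes (GiB) to bytes."""
    return int(gib * 1024 ** 3)


print(gib_to_bytes(1))  # 1073741824
print(gib_to_bytes(2))  # 2147483648
```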

Issue: "PDF Conversion Failed"

Error: Conversion process failed

Solution:

  1. Check that the file is a valid PDF
  2. Check disk space
  3. Check file permissions
  4. View logs for details
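A quick first check for step 1 is to look for the PDF header marker near the start of the file. This sketch is a sanity test only and is not the project's pdf_validation module. A file that passes can still be corrupt further in.

```python
def looks_like_pdf(path: str) -> bool:
    """True if the PDF header marker appears in the first 1024 bytes."""
    with open(path, "rb") as handle:
        head = handle.read(1024)
    # Some producers emit a few bytes before the header, so search a
    # small window instead of requiring the marker at offset 0.
    return b"%PDF-" in head
```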

Viewing Logs

# Check Flask logs
tail -f logs/app.log

# Check Celery logs
tail -f logs/celery.log

🤝 Contributing

We welcome contributions! Here's how to help:

Before Starting

  1. Fork the repository
  2. Create a new branch: git checkout -b feature/your-feature
  3. Make your changes
  4. Test thoroughly

Submitting Changes

  1. Commit: git commit -m "Description of changes"
  2. Push: git push origin feature/your-feature
  3. Create Pull Request on GitHub

Guidelines

  • Follow PEP 8 style guide
  • Add comments to complex code
  • Test your changes
  • Don't commit .env file
  • Don't commit credentials

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.


📧 Support & Contact

Getting Help

  1. Check Documentation

    • Read README_SETUP.md for setup help
    • Check API documentation at /docs endpoint
    • Review code comments in source files
  2. Search Issues

    • Visit GitHub Issues page
    • Search for your problem
    • Read solutions from others
  3. Create an Issue

    • Click "New Issue"
    • Describe your problem clearly
    • Include error messages
    • Provide steps to reproduce
  4. Contact


🎯 Roadmap

Upcoming Features

  • Cloud storage integration (Google Drive, Dropbox)
  • Advanced OCR capabilities
  • AI-powered document analysis
  • Batch processing UI improvements
  • Mobile app
  • Docker support
  • CI/CD pipeline

Planned Improvements

  • Better error messages
  • Performance optimization
  • More file format support
  • Enhanced security features

📊 Project Stats

  • Language: Python 3.10+
  • Database: MySQL
  • Web Framework: Flask
  • Frontend: HTML/CSS/JavaScript
  • License: MIT
  • Status: Active Development

🎉 Acknowledgments

Thanks to all contributors and users who help improve this project!


Made with ❤️ for PDF lovers worldwide

Last Updated: October 25, 2025

