wafy80/PdfMerger

PdfMerger

PdfMerger is a PDF manipulation tool available as both a command-line interface (inspired by pdftk) and a web application. Written in Python, it merges, splits, and encrypts PDFs and manages forms, attachments, and metadata.

Features

  • Merge PDFs - Combine multiple PDF files with custom page ranges
  • Split PDFs - Extract individual pages or burst into single-page files
  • Encryption - Password-protect PDFs with 128-bit encryption and granular permissions
  • Form Handling - Fill PDF forms using FDF data with optional flattening
  • Attachments - Attach files to PDFs or extract embedded attachments
  • Metadata - Read, export, and update PDF metadata
  • Background - Apply background watermarks to PDF pages
  • Repair - Attempt to recover corrupted PDF files

Requirements

  • Python 3.8 or higher
  • Dependencies (see requirements.txt):
    • pypdf[crypto]>=3.17.1
    • PyMuPDF>=1.23.8
    • typing-extensions>=4.9.0
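
A quick way to confirm that an environment meets these requirements is to query the installed package metadata. The sketch below is not part of PdfMerger: the helper name `missing_packages` is hypothetical, and it only checks that each distribution is installed, not that its version meets the stated minimum.

```python
import sys
from importlib import metadata

# Distribution names taken from the requirements list above.
REQUIRED = ["pypdf", "PyMuPDF", "typing-extensions"]


def missing_packages(names):
    """Return the distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python 3.8 or higher is required")
    for name in missing_packages(REQUIRED):
        print(f"Missing dependency: {name}")
```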

Installation

  1. Clone or download the repository:

    git clone <repository-url>
    cd PdfMerger
  2. Install dependencies:

    pip install -r requirements.txt

Web Interface

PdfMerger includes a Flask-based web interface for easy PDF merging and splitting without using the command line.

Starting the Web Server

python app.py

The server will start on http://localhost:5000 (or http://0.0.0.0:5000 for network access).

Features

  • Merge PDFs - Upload multiple PDF files and combine them into a single document
  • Split PDF - Upload a single PDF and split it into individual pages (downloaded as ZIP)
  • Drag & Drop - Intuitive drag-and-drop file upload
  • Responsive Design - Bootstrap-styled interface that works on desktop and mobile

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Web interface homepage |
| /merge | POST | Merge multiple PDFs |
| /split | POST | Split a PDF into pages |
| /download/<filename> | GET | Download a processed file |
| /cleanup | POST | Clean up temporary files |

Request Format

Merge:

curl -X POST -F "files=@file1.pdf" -F "files=@file2.pdf" http://localhost:5000/merge

Split:

curl -X POST -F "file=@document.pdf" http://localhost:5000/split
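
The same requests can be sent from Python. The following is a minimal sketch using only the standard library; it builds a multipart/form-data body with the field names shown in the curl examples (`files` for /merge, `file` for /split). The helper names `encode_multipart` and `post_pdfs` are hypothetical, and because the response format is not documented here, the raw response body is returned unchanged.

```python
import uuid
import urllib.request
from pathlib import Path


def encode_multipart(field, files):
    """Encode (filename, bytes) pairs as multipart/form-data under one field name."""
    boundary = uuid.uuid4().hex
    parts = []
    for filename, data in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    # Closing boundary marks the end of the form data.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_pdfs(url, field, paths):
    """POST the given PDF files and return the raw response body."""
    files = [(Path(p).name, Path(p).read_bytes()) for p in paths]
    body, content_type = encode_multipart(field, files)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, `post_pdfs("http://localhost:5000/merge", "files", ["file1.pdf", "file2.pdf"])` mirrors the merge request, and `post_pdfs("http://localhost:5000/split", "file", ["document.pdf"])` mirrors the split request.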

Configuration

| Variable | Default | Description |
|---|---|---|
| MAX_CONTENT_LENGTH | 50MB | Maximum upload size |
| UPLOAD_FOLDER | temp dir | Temporary upload location |
| OUTPUT_FOLDER | temp dir | Output file location |
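
Flask reads `MAX_CONTENT_LENGTH` as a number of bytes and rejects larger requests with a 413 response. The sketch below shows one way these defaults could be expressed; it is not taken from app.py, the function name `default_config` and the folder prefixes are hypothetical, and treating "50MB" as 50 × 1024 × 1024 bytes is an assumption.

```python
import tempfile

MEGABYTE = 1024 * 1024  # assumption: "MB" is read as a binary megabyte


def default_config():
    """Build a settings dict mirroring the configuration table (illustrative only)."""
    return {
        "MAX_CONTENT_LENGTH": 50 * MEGABYTE,  # Flask expects this value in bytes
        "UPLOAD_FOLDER": tempfile.mkdtemp(prefix="pdfmerger_upload_"),
        "OUTPUT_FOLDER": tempfile.mkdtemp(prefix="pdfmerger_output_"),
    }
```

In a Flask application, a dict like this would typically be applied with `app.config.update(default_config())`.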

Command-Line Usage

General Syntax

python pdfmerger.py <input> [<operation>] [output <file>] [options]

Main Operations

1. Merge PDFs (cat)

Combine multiple PDFs into a single file:

# Simple merge
python pdfmerger.py file1.pdf file2.pdf cat output merged.pdf

# With page ranges
python pdfmerger.py A=doc1.pdf B=doc2.pdf cat A1-3 B5-10 output result.pdf

# Pages to end
python pdfmerger.py A=doc.pdf cat A5-end output extracted.pdf
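
To make the range syntax concrete, here is a small sketch of how tokens such as `A1-3` or `A5-end` could be expanded into page numbers. It is an illustration, not PdfMerger's actual parser: the name `parse_range` is hypothetical, handles are assumed to be upper-case letters, and only ascending ranges are covered.

```python
import re

# Handle (upper-case letters), a start page, and an optional "-end page".
RANGE_RE = re.compile(r"^([A-Z]+)(\d+|end)(?:-(\d+|end))?$")


def parse_range(token, page_counts):
    """Expand a token such as 'A1-3' into (handle, [1-based page numbers])."""
    match = RANGE_RE.match(token)
    if match is None:
        raise ValueError(f"Unrecognised page range: {token!r}")
    handle, first, last = match.groups()
    total = page_counts[handle]

    def resolve(value):
        # The keyword "end" stands for the last page of the document.
        return total if value == "end" else int(value)

    start = resolve(first)
    stop = resolve(last) if last is not None else start
    if not (1 <= start <= stop <= total):
        raise ValueError(f"Range {token!r} is outside 1..{total}")
    return handle, list(range(start, stop + 1))


print(parse_range("A5-end", {"A": 7}))
# → ('A', [5, 6, 7])
```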

2. Split PDFs (burst)

Split a PDF into individual pages:

python pdfmerger.py document.pdf burst output page_%04d.pdf

The burst operation also creates a doc_data.txt file containing the document's metadata.
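
The output name is a printf-style pattern, so `%04d` is replaced by a zero-padded page number. The sketch below shows how such a pattern expands; the helper `burst_filenames` is hypothetical, and numbering from 1 (as pdftk does) is an assumption about PdfMerger's behaviour.

```python
def burst_filenames(pattern, page_count):
    """Expand a printf-style pattern once per page, numbering from 1."""
    return [pattern % page for page in range(1, page_count + 1)]


print(burst_filenames("page_%04d.pdf", 3))
# → ['page_0001.pdf', 'page_0002.pdf', 'page_0003.pdf']
```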

3. Extract Metadata (dump_data)

python pdfmerger.py input.pdf dump_data output metadata.txt

4. Extract Form Field Information (dump_data_fields)

python pdfmerger.py form.pdf dump_data_fields output fields.txt

5. Apply Background (background)

python pdfmerger.py input.pdf background watermark.pdf output result.pdf

6. Fill Form (fill_form)

python pdfmerger.py form.pdf fill_form data.fdf output filled.pdf flatten

The flatten option flattens form fields after filling.

7. Attach Files (attach_files)

# Attach at document level
python pdfmerger.py input.pdf attach_files attachment1.txt attachment2.pdf output output.pdf

# Attach to a specific page
python pdfmerger.py input.pdf attach_files file.txt to_page 5 output output.pdf

8. Extract Attachments (unpack_files)

python pdfmerger.py input.pdf unpack_files output ./attachments_folder/

9. Update Metadata (update_info)

python pdfmerger.py input.pdf update_info info.json output output.pdf

info.json format:

{
  "/Title": "Document Title",
  "/Author": "Author Name",
  "/Subject": "Document Subject",
  "/Keywords": "keyword1, keyword2",
  "/Creator": "Creator Name",
  "/Producer": "Producer Name"
}
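
An info.json file in this shape can be generated rather than written by hand. The following sketch uses hypothetical helper names (`build_info`, `dump_info`) and simply prefixes each key with `/` as in the format above; `ensure_ascii=False` keeps non-ASCII characters readable in the file.

```python
import json


def build_info(**fields):
    """Map keyword arguments to PDF info keys, e.g. Title -> '/Title'."""
    return {f"/{key}": value for key, value in fields.items()}


def dump_info(info):
    """Serialise the info dict as indented JSON text."""
    return json.dumps(info, indent=2, ensure_ascii=False)


print(dump_info(build_info(Title="Document Title", Author="Author Name")))
```

Writing the result to disk, for example with `pathlib.Path("info.json").write_text(..., encoding="utf-8")`, gives a file that can be passed to update_info.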

Encryption

Password-Protect a PDF

python pdfmerger.py input.pdf output protected.pdf encrypt_128bit owner_pw owner_password user_pw user_password

Specify Permissions

python pdfmerger.py input.pdf output protected.pdf encrypt_128bit allow Printing CopyContents

Available permissions:

  • Printing - Print the document
  • DegradedPrinting - Print at degraded quality
  • ModifyContents - Modify document contents
  • Assembly - Assemble the document
  • CopyContents - Copy contents
  • ScreenReaders - Screen reader access
  • ModifyAnnotations - Modify annotations
  • FillIn - Fill form fields
  • AllFeatures - All features enabled
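
In the PDF format, permissions are stored as individual bits of an integer in the encryption dictionary. The sketch below shows how the names above could map onto those bits. The bit positions follow the PDF specification's standard security handler, and the name-to-bit mapping follows pdftk's documented meanings; whether PdfMerger uses exactly this mapping internally is an assumption, and `PERMISSION_BITS` and `permission_flags` are hypothetical names.

```python
# Bit n of the permissions integer has the value 1 << (n - 1).
PERMISSION_BITS = {
    "DegradedPrinting": 1 << 2,        # bit 3: print (low quality only)
    "Printing": (1 << 2) | (1 << 11),  # bits 3 and 12: high-quality print
    "ModifyContents": 1 << 3,          # bit 4
    "CopyContents": 1 << 4,            # bit 5
    "ModifyAnnotations": 1 << 5,       # bit 6
    "FillIn": 1 << 8,                  # bit 9
    "ScreenReaders": 1 << 9,           # bit 10
    "Assembly": 1 << 10,               # bit 11
}
# AllFeatures is the union of every permission above.
PERMISSION_BITS["AllFeatures"] = sum(set(
    bit for value in PERMISSION_BITS.values()
    for bit in (1 << i for i in range(12)) if value & bit
))


def permission_flags(names):
    """Combine permission names into a single integer of OR-ed bits."""
    flags = 0
    for name in names:
        flags |= PERMISSION_BITS[name]
    return flags


print(permission_flags(["Printing", "CopyContents"]))
# → 2068
```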

Other Options

| Option | Description |
|---|---|
| flatten | Flatten form fields |
| compress | Compress streams (default) |
| uncompress | Decompress streams |
| verbose | Show verbose output |
| repair | Attempt to repair corrupted PDFs |
| do_ask | Enable interactive input (default) |
| dont_ask | Disable interactive input |

Advanced Examples

Merge with Encryption

python pdfmerger.py A=chapter1.pdf B=chapter2.pdf cat output book.pdf encrypt_128bit owner_pw admin user_pw reader allow Printing

Repair a Corrupted PDF

python pdfmerger.py corrupted.pdf output repaired.pdf repair

Filter Mode (No Operation)

Apply only output options to a single PDF:

python pdfmerger.py input.pdf output output.pdf flatten compress

Exit Codes

| Code | Description |
|---|---|
| 0 | Success |
| 1 | Error (missing input, wrong password, etc.) |
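
When calling the tool from a script, the exit code is the simplest success signal. Here is a minimal sketch, assuming only the 0/1 convention in the table; the wrapper name `run_cli` is hypothetical, and it prints whatever the command wrote to stderr without assuming a particular message format.

```python
import subprocess
import sys


def run_cli(command):
    """Run a command and report success based on its exit code (0 = success)."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr.strip() or "command failed", file=sys.stderr)
    return result.returncode == 0
```

For example, `run_cli([sys.executable, "pdfmerger.py", "file1.pdf", "file2.pdf", "cat", "output", "merged.pdf"])` would return True only if the merge exited with code 0.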

Web Interface Dependencies

The web interface requires additional packages:

  • flask
  • flask-cors
  • werkzeug

Install them with pip; werkzeug is installed automatically as a dependency of Flask:

pip install flask flask-cors

Known Limitations

  • The fill_form operation requires FDF files in simplified format
  • background uses PyMuPDF and may not preserve all interactive features
  • Some heavily corrupted PDFs may not be repairable

Differences from Original pdftk

  • Implemented in pure Python (with PyMuPDF support for advanced operations)
  • Some FDF operations are simplified
  • Native Unicode support in metadata
  • Cross-platform compatibility

License

Free software with no warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Version

2.0.0 - Copyright (C) 2025

Contributing

Contributions are welcome! To report bugs or request features, please open an issue.


Note: This project is not affiliated with or endorsed by the original pdftk authors.
