PdfMerger

PdfMerger is a PDF manipulation tool available both as a command-line interface (inspired by pdftk) and as a web application. Written in Python, it provides features for merging, splitting, and encrypting PDFs and for managing forms and attachments.

Features

- Merge PDFs - Combine multiple PDF files with custom page ranges
- Split PDFs - Extract individual pages or burst into single-page files
- Encryption - Password-protect PDFs with 128-bit encryption and granular permissions
- Form Handling - Fill PDF forms using FDF data with optional flattening
- Attachments - Attach files to PDFs or extract embedded attachments
- Metadata - Read, export, and update PDF metadata
- Background - Apply background watermarks to PDF pages
- Repair - Attempt to recover corrupted PDF files

Requirements

Python 3.8 or higher
Dependencies (see requirements.txt):
- pypdf[crypto]>=3.17.1
- PyMuPDF>=1.23.8
- typing-extensions>=4.9.0

Installation

Clone or download the repository:
```
git clone <repository-url>
cd PdfMerger
```
Install dependencies:
```
pip install -r requirements.txt
```

Web Interface

PdfMerger includes a Flask-based web interface for easy PDF merging and splitting without using the command line.

Starting the Web Server

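```
python app.py
```

The server starts on port 5000. Open http://localhost:5000 in a browser on the same machine; because the app binds to 0.0.0.0 (all network interfaces), other machines on the network can reach it at http://<server-ip>:5000.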

Features

- Merge PDFs - Upload multiple PDF files and combine them into a single document
- Split PDF - Upload a single PDF and split it into individual pages (downloaded as a ZIP archive)
- Drag & Drop - Intuitive drag-and-drop file upload
- Responsive Design - Bootstrap-styled interface that works on desktop and mobile

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Web interface homepage |
| `/merge` | POST | Merge multiple PDFs |
| `/split` | POST | Split a PDF into pages |
| `/download/<filename>` | GET | Download a processed file |
| `/cleanup` | POST | Clean up temporary files |

Request Format

Merge:

```
curl -X POST -F "files=@file1.pdf" -F "files=@file2.pdf" http://localhost:5000/merge
```

Split:

```
curl -X POST -F "file=@document.pdf" http://localhost:5000/split
```
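For scripted use without curl, the same requests can be sent from Python with only the standard library. This is a minimal sketch: `build_multipart` and `post_files` are hypothetical helper names, and because the response format of `/merge` and `/split` is not documented here, `post_files` simply returns the raw response body. The field names (`files` for `/merge`, `file` for `/split`) follow the curl examples above.

```python
import mimetypes
import os
import urllib.request
import uuid
from typing import List, Tuple


def build_multipart(field: str, files: List[Tuple[str, bytes]]) -> Tuple[bytes, str]:
    """Encode (filename, content) pairs as multipart/form-data under one field name."""
    boundary = uuid.uuid4().hex
    parts = []
    for filename, content in files:
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_files(url: str, field: str, paths: List[str]) -> bytes:
    """POST local files to a PdfMerger endpoint and return the raw response body."""
    files = []
    for path in paths:
        with open(path, "rb") as fh:
            files.append((os.path.basename(path), fh.read()))
    body, content_type = build_multipart(field, files)
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, `post_files("http://localhost:5000/merge", "files", ["file1.pdf", "file2.pdf"])` sends the same request as the merge curl command.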

Configuration

| Variable | Default | Description |
|---|---|---|
| `MAX_CONTENT_LENGTH` | 50 MB | Maximum upload size |
| `UPLOAD_FOLDER` | Temporary directory | Temporary upload location |
| `OUTPUT_FOLDER` | Temporary directory | Output file location |

Command-Line Usage

General Syntax

```
python pdfmerger.py <input> [<operation>] [output <file>] [options]
```
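The general syntax breaks down into four parts: inputs (optionally with handles such as `A=doc1.pdf`), an optional operation with its arguments, `output <file>`, and trailing options. The sketch below shows one way to split a command line along those lines; it is not the parsing logic of pdfmerger.py. `split_command` and `Command` are hypothetical names, the operation set is taken from the operations listed in this README, and a file literally named `output` would confuse it.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

OPERATIONS = {
    "cat", "burst", "dump_data", "dump_data_fields", "background",
    "fill_form", "attach_files", "unpack_files", "update_info",
}
HANDLE_RE = re.compile(r"^([A-Z]+)=(.+)$")  # e.g. "A=doc1.pdf"


@dataclass
class Command:
    inputs: List[Tuple[Optional[str], str]] = field(default_factory=list)  # (handle, path)
    operation: Optional[str] = None  # None means filter mode
    operation_args: List[str] = field(default_factory=list)
    output: Optional[str] = None
    options: List[str] = field(default_factory=list)


def split_command(args: List[str]) -> Command:
    cmd = Command()
    i = 0
    # 1. Inputs: everything before the operation or the "output" keyword
    while i < len(args) and args[i] not in OPERATIONS and args[i] != "output":
        match = HANDLE_RE.match(args[i])
        cmd.inputs.append((match.group(1), match.group(2)) if match else (None, args[i]))
        i += 1
    # 2. Optional operation followed by its own arguments
    if i < len(args) and args[i] in OPERATIONS:
        cmd.operation = args[i]
        i += 1
        while i < len(args) and args[i] != "output":
            cmd.operation_args.append(args[i])
            i += 1
    # 3. "output <file>"
    if i < len(args) and args[i] == "output":
        if i + 1 >= len(args):
            raise ValueError("missing file name after 'output'")
        cmd.output = args[i + 1]
        i += 2
    # 4. Everything left is an output option (flatten, encrypt_128bit, ...)
    cmd.options = args[i:]
    return cmd
```

With no operation word present, `operation` stays `None`, which corresponds to filter mode.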

Main Operations

1. Merge PDFs (`cat`)

Combine multiple PDFs into a single file:

```
# Simple merge
python pdfmerger.py file1.pdf file2.pdf cat output merged.pdf

# With page ranges
python pdfmerger.py A=doc1.pdf B=doc2.pdf cat A1-3 B5-10 output result.pdf

# Pages to end
python pdfmerger.py A=doc.pdf cat A5-end output extracted.pdf
```
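The page-range tokens above (`A1-3`, `B5-10`, `A5-end`) can be resolved to zero-based page indices and then copied with pypdf. This is a minimal sketch assuming pdftk-style semantics (1-based pages, `end` for the last page, a bare handle for all pages, and descending ranges such as `Aend-1` for reversed order); whether PdfMerger accepts exactly these forms is an assumption, and `parse_range` and `merge` are hypothetical names. pypdf is imported inside `merge`, so the parser runs without it.

```python
import re
from typing import Dict, List, Tuple

# handle letters, then optionally "<start>" or "<start>-<stop>"; each may be "end"
RANGE_RE = re.compile(r"^([A-Z]+)(?:(\d+|end)(?:-(\d+|end))?)?$")


def parse_range(token: str, page_counts: Dict[str, int]) -> Tuple[str, List[int]]:
    """Resolve a token such as 'A1-3' to (handle, zero-based page indices)."""
    match = RANGE_RE.match(token)
    if match is None:
        raise ValueError(f"invalid page range: {token!r}")
    handle, start, stop = match.groups()
    if handle not in page_counts:
        raise KeyError(f"unknown handle: {handle}")
    count = page_counts[handle]

    def to_page(value: str) -> int:
        page = count if value == "end" else int(value)
        if not 1 <= page <= count:
            raise ValueError(f"page {page} out of range for {handle} ({count} pages)")
        return page

    if start is None:  # bare handle: all pages
        first, last = 1, count
    else:
        first = to_page(start)
        last = to_page(stop) if stop is not None else first
    step = 1 if last >= first else -1  # a descending range reverses page order
    return handle, [page - 1 for page in range(first, last + step, step)]


def merge(inputs: Dict[str, str], tokens: List[str], output_path: str) -> None:
    """Copy the selected pages of each handle's PDF, in token order, into one file."""
    from pypdf import PdfReader, PdfWriter  # requires pypdf

    readers = {handle: PdfReader(path) for handle, path in inputs.items()}
    counts = {handle: len(reader.pages) for handle, reader in readers.items()}
    writer = PdfWriter()
    for token in tokens:
        handle, indices = parse_range(token, counts)
        for index in indices:
            writer.add_page(readers[handle].pages[index])
    writer.write(output_path)
```

`merge({"A": "doc1.pdf", "B": "doc2.pdf"}, ["A1-3", "B5-10"], "result.pdf")` mirrors the page-range example above.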

2. Split PDFs (`burst`)

Split a PDF into individual pages:

```
python pdfmerger.py document.pdf burst output page_%04d.pdf
```

In addition to the page files, `burst` creates a doc_data.txt file containing the document metadata.
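Bursting amounts to writing each page into its own one-page document, with each file name produced by applying the printf-style pattern to the page number. A sketch assuming 1-based numbering as in pdftk; `burst_names` and `burst` are hypothetical names, and this sketch does not write doc_data.txt.

```python
from typing import List


def burst_names(pattern: str, page_count: int) -> List[str]:
    """Expand a printf-style pattern such as 'page_%04d.pdf' for pages 1..page_count."""
    return [pattern % number for number in range(1, page_count + 1)]


def burst(input_path: str, pattern: str = "page_%04d.pdf") -> List[str]:
    """Write one single-page PDF per input page; return the file names written."""
    from pypdf import PdfReader, PdfWriter  # requires pypdf

    reader = PdfReader(input_path)
    names = burst_names(pattern, len(reader.pages))
    for name, page in zip(names, reader.pages):
        writer = PdfWriter()
        writer.add_page(page)
        writer.write(name)
    return names
```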

3. Extract Metadata (`dump_data`)

```
python pdfmerger.py input.pdf dump_data output metadata.txt
```
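pdftk's `dump_data` report lists each Info dictionary entry as an InfoBegin/InfoKey/InfoValue triple and also reports fields such as NumberOfPages. Whether PdfMerger's output matches that layout exactly is an assumption; the sketch below covers only the Info entries and the page count, read through pypdf's `PdfReader.metadata`. `format_dump_data` and `dump_data` are hypothetical names.

```python
from typing import Dict


def format_dump_data(info: Dict[str, str], num_pages: int) -> str:
    """Render metadata in the InfoBegin/InfoKey/InfoValue layout used by pdftk."""
    lines = []
    for key, value in info.items():
        lines += ["InfoBegin", f"InfoKey: {key.lstrip('/')}", f"InfoValue: {value}"]
    lines.append(f"NumberOfPages: {num_pages}")
    return "\n".join(lines) + "\n"


def dump_data(input_path: str, output_path: str) -> None:
    from pypdf import PdfReader  # requires pypdf

    reader = PdfReader(input_path)
    meta = reader.metadata or {}
    # indexing (rather than .items()) resolves indirect objects to their values
    info = {str(key): str(meta[key]) for key in meta}
    with open(output_path, "w", encoding="utf-8") as fh:  # UTF-8, so Unicode survives
        fh.write(format_dump_data(info, len(reader.pages)))
```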

4. Extract Form Field Information (`dump_data_fields`)

```
python pdfmerger.py form.pdf dump_data_fields output fields.txt
```
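pdftk prints each form field as a block introduced by `---`, with lines such as FieldType, FieldName, and FieldValue. The sketch below produces that layout from pypdf's `PdfReader.get_fields()`, which maps field names to field dictionaries. The type labels (Text, Button, Choice, Signature) follow pdftk; PdfMerger's actual output may differ, `format_fields` and `dump_data_fields` are hypothetical names, and entries such as FieldFlags are omitted.

```python
from typing import Any, Dict

# PDF field type (/FT) -> label used in pdftk-style reports
FIELD_TYPES = {"/Tx": "Text", "/Btn": "Button", "/Ch": "Choice", "/Sig": "Signature"}


def format_fields(fields: Dict[str, Dict[str, Any]]) -> str:
    blocks = []
    for name, field in fields.items():
        lines = [
            "---",
            f"FieldType: {FIELD_TYPES.get(str(field.get('/FT')), 'Unknown')}",
            f"FieldName: {name}",
        ]
        if field.get("/V") is not None:  # only filled fields have a value
            lines.append(f"FieldValue: {field['/V']}")
        blocks.append("\n".join(lines))
    return "\n".join(blocks) + "\n" if blocks else ""


def dump_data_fields(input_path: str, output_path: str) -> None:
    from pypdf import PdfReader  # requires pypdf

    fields = PdfReader(input_path).get_fields() or {}  # None when there is no form
    with open(output_path, "w", encoding="utf-8") as fh:
        fh.write(format_fields(fields))
```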

5. Apply Background (`background`)

```
python pdfmerger.py input.pdf background watermark.pdf output result.pdf
```
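The `background` operation relies on PyMuPDF. One way to do this in PyMuPDF is `Page.show_pdf_page(..., overlay=False)`, which draws a page of another PDF underneath the existing content. A sketch assuming the first page of watermark.pdf is used for every page, as in pdftk's `background`; `apply_background` is a hypothetical name and PdfMerger's implementation may differ.

```python
def apply_background(input_path: str, background_path: str, output_path: str) -> None:
    """Place page 1 of background_path underneath every page of input_path."""
    import fitz  # PyMuPDF (newer releases also expose it as "pymupdf")

    with fitz.open(input_path) as doc, fitz.open(background_path) as background:
        for page in doc:
            # overlay=False inserts the source page behind the existing content;
            # keep_proportion (default True) preserves the watermark's aspect ratio
            page.show_pdf_page(page.rect, background, 0, overlay=False)
        doc.save(output_path)
```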

6. Fill Form (`fill_form`)

```
python pdfmerger.py form.pdf fill_form data.fdf output filled.pdf flatten
```

The optional `flatten` keyword flattens the form after filling, turning the field values into regular page content that can no longer be edited.
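The exact simplified FDF format that `fill_form` accepts is not specified here. As an illustration only, the sketch below assumes fields are flat dictionaries with a literal-string `/T` (field name) and either a literal-string or name `/V` (value); escaped parentheses, hex strings, UTF-16 text, and nested `/Kids` are not handled. `parse_simple_fdf` is a hypothetical name. The resulting dict has the name-to-value shape that pypdf's `PdfWriter.update_page_form_field_values(page, fields)` takes.

```python
import re
from typing import Dict

DICT_RE = re.compile(rb"<<([^<>]*)>>")  # innermost dictionaries only
NAME_RE = re.compile(rb"/T\s*\(([^()]*)\)")
VALUE_RE = re.compile(rb"/V\s*(?:\(([^()]*)\)|(/[A-Za-z0-9_.#-]+))")


def parse_simple_fdf(data: bytes) -> Dict[str, str]:
    """Map field names to values for FDF dictionaries of the form << /T (...) /V ... >>."""
    fields = {}
    for body in DICT_RE.findall(data):
        name = NAME_RE.search(body)
        value = VALUE_RE.search(body)
        if name is None or value is None:
            continue  # not a field dictionary, e.g. the trailer
        # literal string (group 1) or name object such as /Yes (group 2)
        raw = value.group(1) if value.group(1) is not None else value.group(2)
        fields[name.group(1).decode("latin-1")] = raw.decode("latin-1")
    return fields
```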

7. Attach Files (`attach_files`)

```
# Attach at document level
python pdfmerger.py input.pdf attach_files attachment1.txt attachment2.pdf output output.pdf

# Attach to a specific page
python pdfmerger.py input.pdf attach_files file.txt to_page 5 output output.pdf
```

8. Extract Attachments (`unpack_files`)

```
python pdfmerger.py input.pdf unpack_files output ./attachments_folder/
```
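Document-level attachments (`attach_files` without `to_page`) and extraction (`unpack_files`) map onto pypdf's `PdfWriter.add_attachment(filename, data)` and `PdfReader.attachments`, a mapping from each embedded file name to a list of its contents. A sketch with hypothetical function names; page-level attachments need file-attachment annotations and are not covered. Because embedded names come from the PDF itself, the hypothetical `safe_target` helper keeps only the base name so an entry such as `../../x` cannot escape the output folder.

```python
import os
from typing import List


def safe_target(folder: str, name: str, index: int = 0, total: int = 1) -> str:
    """Build an output path for an embedded file without allowing path traversal."""
    base = os.path.basename(name.replace("\\", "/"))
    if base in ("", ".", ".."):
        base = "attachment"
    if total > 1:  # several embedded files share this name: number them
        stem, ext = os.path.splitext(base)
        base = f"{stem}_{index}{ext}"
    return os.path.join(folder, base)


def attach_files(input_path: str, paths: List[str], output_path: str) -> None:
    from pypdf import PdfReader, PdfWriter  # requires pypdf

    reader = PdfReader(input_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    for path in paths:
        with open(path, "rb") as fh:
            writer.add_attachment(os.path.basename(path), fh.read())
    writer.write(output_path)


def unpack_files(input_path: str, folder: str) -> List[str]:
    from pypdf import PdfReader  # requires pypdf

    os.makedirs(folder, exist_ok=True)
    written = []
    for name, contents in PdfReader(input_path).attachments.items():
        for index, data in enumerate(contents):
            target = safe_target(folder, name, index, len(contents))
            with open(target, "wb") as fh:
                fh.write(data)
            written.append(target)
    return written
```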

9. Update Metadata (`update_info`)

```
python pdfmerger.py input.pdf update_info info.json output output.pdf
```

Expected format of info.json:

```
{
  "/Title": "Document Title",
  "/Author": "Author Name",
  "/Subject": "Document Subject",
  "/Keywords": "keyword1, keyword2",
  "/Creator": "Creator Name",
  "/Producer": "Producer Name"
}
```
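One way to implement `update_info` with pypdf is to carry over the existing Info entries, overlay the ones from info.json, and pass the result to `PdfWriter.add_metadata`. A sketch with hypothetical `merge_info` and `update_info` names; whether PdfMerger keeps entries that info.json does not mention, or validates keys this way, is an assumption.

```python
import json
from typing import Dict


def merge_info(existing: Dict[str, str], updates: Dict[str, object]) -> Dict[str, str]:
    """Overlay info.json entries on existing metadata, checking the key format."""
    merged = dict(existing)
    for key, value in updates.items():
        if not key.startswith("/"):
            raise ValueError(f"metadata keys must start with '/': {key!r}")
        if not isinstance(value, str):
            raise TypeError(f"metadata value for {key} must be a string")
        merged[key] = value
    return merged


def update_info(input_path: str, info_path: str, output_path: str) -> None:
    from pypdf import PdfReader, PdfWriter  # requires pypdf

    with open(info_path, encoding="utf-8") as fh:
        updates = json.load(fh)
    reader = PdfReader(input_path)
    meta = reader.metadata or {}
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    existing = {str(key): str(meta[key]) for key in meta}  # resolves indirect values
    writer.add_metadata(merge_info(existing, updates))
    writer.write(output_path)
```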

Encryption

Password-Protect a PDF

```
python pdfmerger.py input.pdf output protected.pdf encrypt_128bit owner_pw owner_password user_pw user_password
```

Specify Permissions

```
python pdfmerger.py input.pdf output protected.pdf encrypt_128bit allow Printing CopyContents
```

Available permissions:

- Printing - Print the document
- DegradedPrinting - Print at degraded (lower) quality only
- ModifyContents - Modify document contents
- Assembly - Assemble the document (insert, rotate, or delete pages)
- CopyContents - Copy text and graphics
- ScreenReaders - Extract content for screen readers and other accessibility tools
- ModifyAnnotations - Modify annotations
- FillIn - Fill in form fields
- AllFeatures - Enable all of the above
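In the PDF file format, these permissions end up as bit flags in the `/P` entry of the encryption dictionary (ISO 32000-1, Table 22). The sketch below maps the names above to those bits following pdftk's documented meanings, including its rules that ModifyContents also allows Assembly, CopyContents also allows ScreenReaders, and ModifyAnnotations also allows FillIn; the mapping PdfMerger uses internally, and `permission_value` itself, are assumptions. Reserved bits 7-8 and 13-32 are set and bits 1-2 left clear, as the specification requires.

```python
from typing import Iterable


def bit(position: int) -> int:
    """Flag value for a 1-based bit position, as numbered in the PDF specification."""
    return 1 << (position - 1)


# Hypothetical mapping from pdftk-style names to /P bits (ISO 32000-1, Table 22)
PERMISSION_BITS = {
    "Printing": bit(3) | bit(12),          # print, including high quality
    "DegradedPrinting": bit(3),            # print, lower quality only
    "ModifyContents": bit(4) | bit(11),    # pdftk: also allows Assembly
    "Assembly": bit(11),
    "CopyContents": bit(5) | bit(10),      # pdftk: also allows ScreenReaders
    "ScreenReaders": bit(10),
    "ModifyAnnotations": bit(6) | bit(9),  # pdftk: also allows FillIn
    "FillIn": bit(9),
}
ALL_FEATURES = 0
for _flags in PERMISSION_BITS.values():
    ALL_FEATURES |= _flags
PERMISSION_BITS["AllFeatures"] = ALL_FEATURES

RESERVED_ONES = 0xFFFFF0C0  # bits 7-8 and 13-32 must be 1; bits 1-2 stay 0


def permission_value(names: Iterable[str]) -> int:
    """Return the /P entry (a signed 32-bit integer) for the given permission names."""
    flags = RESERVED_ONES
    for name in names:
        if name not in PERMISSION_BITS:
            raise ValueError(f"unknown permission: {name}")
        flags |= PERMISSION_BITS[name]
    # /P is stored as a signed 32-bit integer, so fold values with bit 32 set
    return flags - (1 << 32) if flags & (1 << 31) else flags
```

For instance, `permission_value(["Printing", "CopyContents"])` corresponds to `allow Printing CopyContents`, and granting everything yields the familiar `/P -4`.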

Other Options

| Option | Description |
|---|---|
| `flatten` | Flatten form fields |
| `compress` | Compress streams (default) |
| `uncompress` | Decompress streams |
| `verbose` | Show verbose output |
| `repair` | Attempt to repair corrupted PDFs |
| `do_ask` | Enable interactive input (default) |
| `dont_ask` | Disable interactive input |

Advanced Examples

Merge with Encryption

```
python pdfmerger.py A=chapter1.pdf B=chapter2.pdf cat output book.pdf encrypt_128bit owner_pw admin user_pw reader allow Printing
```

Repair a Corrupted PDF

```
python pdfmerger.py corrupted.pdf output repaired.pdf repair
```
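How the `repair` option works internally is not documented here. One common approach with PyMuPDF, shown as a sketch under that assumption, is to let MuPDF rebuild a damaged cross-reference table while opening the file and then write a clean copy; `garbage=4` removes unused objects and merges duplicates, and `deflate=True` compresses streams. `repair_pdf` is a hypothetical name.

```python
def repair_pdf(input_path: str, output_path: str) -> bool:
    """Re-save a damaged PDF through MuPDF; return True if MuPDF had to repair it."""
    import fitz  # PyMuPDF

    with fitz.open(input_path) as doc:  # MuPDF attempts xref reconstruction on open
        repaired = doc.is_repaired
        # garbage=4: drop unused objects and merge duplicates; deflate: compress streams
        doc.save(output_path, garbage=4, deflate=True)
    return repaired
```

Files too damaged for MuPDF to open will still raise an exception, in line with the limitation noted below.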

Filter Mode (No Operation)

Apply only output options to a single PDF:

```
python pdfmerger.py input.pdf output output.pdf flatten compress
```

Exit Codes

| Code | Description |
|---|---|
| `0` | Success |
| `1` | Error (missing input, wrong password, etc.) |

Web Interface Dependencies

The web interface requires additional packages:

- flask
- flask-cors
- werkzeug

Install them with:

```
pip install flask flask-cors
```

Werkzeug is installed automatically as a dependency of Flask.

Known Limitations

- The `fill_form` operation requires FDF files in a simplified format
- The `background` operation uses PyMuPDF and may not preserve all interactive features
- Some heavily corrupted PDFs may not be repairable

Differences from Original pdftk

- Implemented in Python on top of pypdf, with PyMuPDF (bindings to the MuPDF C library) for advanced operations
- Some FDF operations are simplified
- Native Unicode support in metadata
- Cross-platform compatibility

License

Free software, distributed without any warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the LICENSE file for the full terms.

Version

Contributing

Contributions are welcome! To report bugs or request features, please open an issue.

Note: This project is not affiliated with or endorsed by the original pdftk authors.
