A Python script that processes multiple PDF files in a specified directory, extracts the first page of each PDF, and converts it to an image (PNG format). The resulting images are saved in a designated output directory.

joegakah/extractpdfimage

Batch PDF-to-Image Conversion Script

This Python script processes multiple PDF files in a specified directory, extracts the first page of each PDF, and converts it to an image (PNG format). The resulting images are saved in a designated output directory.

Features

  • Batch Processing: Automatically processes all PDF files in a given folder.
  • First Page Extraction: Extracts and converts only the first page of each PDF to an image.
  • Customizable Paths: Easily specify input and output directories.

Prerequisites

Python Packages

Ensure you have the following Python packages installed:

  • pdf2image
  • Pillow

You can install them using pip:

pip install pdf2image pillow

Poppler

The pdf2image library requires poppler to be installed on your system.

  • Ubuntu/Debian:
    sudo apt-get install poppler-utils
  • macOS (using Homebrew):
    brew install poppler

Usage

  1. Clone or Download the Script:

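    Save the script locally. The command in the next step assumes the file is named extract.py; if you saved it under a different name, such as batch_pdf_to_image.py, use that name instead.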

  2. Run the Script:

    Execute the script using Python:

    python extract.py

    When prompted, enter the path to the directory containing the PDFs and the path to the output directory. The script then processes each PDF in the input directory, extracts its first page, and saves it as a PNG image in the output directory.
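
For readers who want to see the core of this workflow, here is a minimal sketch of the conversion loop. It is not the repository's actual code: the function name `convert_first_pages` and its return value are assumptions made for illustration. It relies on pdf2image's `convert_from_path` with `first_page` and `last_page` set to 1 so that only the first page is rendered.

```python
from pathlib import Path


def convert_first_pages(pdf_dir, out_dir):
    """Save page 1 of every PDF in pdf_dir as <name>_page1.png in out_dir."""
    # Imported inside the function so that a missing pdf2image install
    # surfaces when a conversion is attempted, not when the file is loaded.
    from pdf2image import convert_from_path

    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        # first_page/last_page restrict rendering to page 1 only.
        pages = convert_from_path(str(pdf_path), first_page=1, last_page=1)
        if not pages:
            continue  # nothing was rendered for this file
        image_path = out_dir / f"{pdf_path.stem}_page1.png"
        pages[0].save(image_path, "PNG")
        print(f"Saved {image_path}")
        saved.append(image_path)
    return saved
```

Calling `convert_first_pages("pdfs", "images")` would, under these assumptions, write one PNG per PDF into `images` and return the list of saved paths. Note that `glob("*.pdf")` may skip files with an uppercase `.PDF` extension on case-sensitive file systems.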

Output

  • The images will be saved with the same name as the PDF files, with the addition of _page1.png at the end. For example, if the PDF is named example.pdf, the image will be saved as example_page1.png.
  • The script will print the path to each saved image once it has been processed.
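
The naming rule above can be sketched as a small helper. The function name `output_path_for` is hypothetical and only illustrates the rule; the actual script may build the name differently.

```python
from pathlib import Path


def output_path_for(pdf_path, out_dir):
    """Return the PNG path for a PDF: its base name plus '_page1.png'."""
    # Path.stem drops only the final extension, so dots earlier
    # in the file name are preserved.
    return Path(out_dir) / f"{Path(pdf_path).stem}_page1.png"
```

With this helper, `example.pdf` maps to `example_page1.png`, and a name such as `report.v2.pdf` maps to `report.v2_page1.png`.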

Troubleshooting

  • Missing PDF Files: Ensure that the input directory path (pdf_dir) is correct and that the directory contains valid PDF files.
  • Poppler Not Found: If you encounter issues related to poppler, ensure it is installed and accessible in your system's PATH.
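
Both checks above can be run before a conversion. The sketch below is an illustration, not part of the script: the function name `diagnose` and its messages are assumptions. It looks for `pdftoppm`, one of the Poppler command-line tools that pdf2image invokes, on the system PATH.

```python
import shutil
from pathlib import Path


def diagnose(pdf_dir):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    pdf_dir = Path(pdf_dir)

    if not pdf_dir.is_dir():
        problems.append(f"{pdf_dir} is not a directory")
    elif not any(p.suffix.lower() == ".pdf" for p in pdf_dir.iterdir()):
        problems.append(f"no .pdf files found in {pdf_dir}")

    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("pdftoppm") is None:
        problems.append("pdftoppm (Poppler) not found on PATH")

    return problems
```

If Poppler is installed in a location that is not on PATH, pdf2image's `convert_from_path` also accepts a `poppler_path` argument pointing at the directory that holds the Poppler binaries.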

License

This script is provided under the MIT License. Feel free to use, modify, and distribute it as needed.


Author: Joseph Gakah
Date: 15 August 2024
