brianluft/pdf2md

pdf2md

A Windows CLI application that converts PDF documents to Markdown using AI-powered transcription.

pdf2md renders each page of a PDF as an image, sends each image to OpenAI's GPT-5 vision API for transcription, and combines the results into a single Markdown file.

Requirements

  • Windows 10 or later (64-bit)
  • Internet connection (for OpenAI API calls)
  • Valid OpenAI API key with GPT-5 access

Installation

Download pdf2md.exe from the GitHub Releases page. No installation is required; the executable is self-contained.

Place the executable in a directory on your PATH, or run it directly from any location.

Usage

pdf2md [options] <input.pdf> [output.md]

Arguments

| Argument | Description |
|----------|-------------|
| input.pdf | Path to the PDF file to convert (required) |
| output.md | Path for the output Markdown file (optional, defaults to <input>.md) |
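
The default output path can be sketched in Python as follows. This is an illustration only, not pdf2md's own code: the function name `default_output_path` is hypothetical, and replacing the `.pdf` suffix (rather than appending `.md` to it) is an assumption based on the "report.pdf to report.md" example under Examples.

```python
from pathlib import Path
from typing import Optional


def default_output_path(input_pdf: str, output_md: Optional[str] = None) -> Path:
    """Return the explicit output path, or derive one from the input path."""
    if output_md is not None:
        return Path(output_md)
    # Assumption: the ".pdf" suffix is replaced, so report.pdf maps to report.md.
    return Path(input_pdf).with_suffix(".md")
```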

Options

| Option | Default | Description |
|--------|---------|-------------|
| --temp-dir <path> | . (current directory) | Directory for temporary page images |
| --workers <n> | 1 | Number of parallel transcription workers |

Examples

# Convert report.pdf to report.md
pdf2md report.pdf

# Specify output file location
pdf2md report.pdf notes/output.md

# Transcribe 4 pages in parallel for faster processing
pdf2md --workers 4 large-doc.pdf

# Use a specific directory for temporary files
pdf2md --temp-dir /tmp report.pdf
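
For scripted or batch use, the same invocations can be assembled from Python. The sketch below uses only the arguments and options documented above; the helper names `build_command` and `convert` are hypothetical, and it assumes `pdf2md` is on your PATH. The meaning of the process exit code is not documented here, so the sketch simply returns it.

```python
import subprocess
from typing import List, Optional


def build_command(
    input_pdf: str,
    output_md: Optional[str] = None,
    workers: int = 1,
    temp_dir: Optional[str] = None,
) -> List[str]:
    """Build an argument list in the form: pdf2md [options] <input.pdf> [output.md]."""
    cmd = ["pdf2md", "--workers", str(workers)]
    if temp_dir is not None:
        cmd += ["--temp-dir", temp_dir]
    cmd.append(input_pdf)
    if output_md is not None:
        cmd.append(output_md)
    return cmd


def convert(input_pdf: str, **options) -> int:
    """Run pdf2md and return its exit code (assumes pdf2md is on PATH)."""
    return subprocess.run(build_command(input_pdf, **options)).returncode
```

Calling `convert("large-doc.pdf", workers=4)` runs the equivalent of the third example above.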

Configuration

OpenAI API Key

Set the OPENAI_API_KEY environment variable before running pdf2md:

# Windows Command Prompt
set OPENAI_API_KEY=sk-...

# Windows PowerShell
$env:OPENAI_API_KEY = "sk-..."

# Git Bash / WSL
export OPENAI_API_KEY=sk-...

The API key is not accepted as a command-line argument to avoid accidental exposure in shell history.
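
If you wrap pdf2md in your own tooling, you may want to check for the key before starting a long conversion. A minimal Python sketch, assuming a hypothetical helper named `read_api_key`; the error text mirrors the message listed under Error Messages, but this is not pdf2md's own code:

```python
import os
from typing import Mapping


def read_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from the environment, or fail early."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Same wording as the documented error; raising here is this sketch's choice.
        raise RuntimeError("OPENAI_API_KEY environment variable not set")
    return key
```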

How It Works

  1. Rendering: Each page of the PDF is rendered to a JPEG image using PDFium at 150 DPI.
  2. Transcription: Each page image is sent to OpenAI's GPT-5 vision API with a prompt to transcribe the content to Markdown.
  3. Combination: The Markdown outputs from all pages are joined with horizontal rule separators (---) into a single file.
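
Two of these steps reduce to simple arithmetic and string handling, sketched below in Python. PDF page sizes are measured in points (72 per inch), so a US Letter page of 612 × 792 points renders to 1275 × 1650 pixels at 150 DPI. The function names are hypothetical, and the blank lines placed around the `---` separator are an assumption; the README specifies only the separator itself.

```python
from typing import Sequence


def pixels_at_dpi(points: float, dpi: int = 150) -> int:
    """Convert a PDF dimension in points (1/72 inch) to pixels at the given DPI."""
    return round(points * dpi / 72)


def combine_pages(pages: Sequence[str]) -> str:
    """Join per-page Markdown with horizontal rule separators."""
    # Assumption: a blank line on each side of "---" so it is read as a rule,
    # not as a heading underline for the preceding text.
    return "\n\n---\n\n".join(page.strip() for page in pages) + "\n"
```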

Temporary image files are created during processing and automatically cleaned up when the conversion completes or if an error occurs.
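
Cleanup on both success and failure is the kind of guarantee a `try`/`finally` block gives. The Python sketch below illustrates the pattern only; the helper name `with_temp_pages`, the file naming scheme, and the use of empty placeholder files in place of rendered images are all assumptions, not a description of how pdf2md is implemented.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_temp_pages(temp_dir: str, page_count: int, work: Callable[[List[Path]], T]) -> T:
    """Create one temporary file per page, run work, and always delete the files."""
    paths: List[Path] = []
    try:
        for number in range(1, page_count + 1):
            # Placeholder files stand in for the rendered page images.
            fd, name = tempfile.mkstemp(prefix=f"page-{number:04d}-", suffix=".jpg", dir=temp_dir)
            os.close(fd)
            paths.append(Path(name))
        return work(paths)
    finally:
        # Runs whether work() returned normally or raised.
        for path in paths:
            path.unlink(missing_ok=True)
```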

Parallel Processing

Use the --workers option to transcribe multiple pages simultaneously. This can significantly speed up conversion of large documents. Each worker processes one page at a time, and results are combined in the correct page order regardless of completion order.

Note: Higher worker counts increase API request concurrency. Ensure your OpenAI account rate limits can accommodate the number of workers you specify.
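
The ordering guarantee can be illustrated with a small Python sketch. It assumes a thread pool, which is one common way to get this behavior and not necessarily how pdf2md does it; `transcribe_all` and `fake_transcribe` are hypothetical names, and the fake function stands in for the real API call.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def transcribe_all(
    pages: Sequence[str],
    transcribe: Callable[[str], str],
    workers: int = 1,
) -> List[str]:
    """Transcribe pages concurrently and return results in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, even if later pages finish first.
        return list(pool.map(transcribe, pages))


def fake_transcribe(page: str) -> str:
    """Stand-in for an API call; finishes after a random short delay."""
    time.sleep(random.uniform(0.0, 0.02))
    return f"# {page}"
```

With `workers=4`, up to four calls to `transcribe` are in flight at once, which is why the worker count translates directly into API request concurrency.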

Error Messages

| Error | Meaning |
|-------|---------|
| File not found: <path> | The input PDF file does not exist |
| File does not appear to be a PDF: <path> | The input file does not have a .pdf extension (warning only) |
| Cannot write to: <path> | The output path is not writable |
| Temp directory not found: <path> | The specified temp directory does not exist |
| Cannot write to temp directory: <path> | The temp directory is not writable |
| OPENAI_API_KEY environment variable not set | The API key is missing |
| OpenAI authentication failed | The API key is invalid |
| Rate limited | Too many API requests; the application will retry automatically |
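
This README does not describe how the automatic retry after a rate limit is scheduled. A common approach is exponential backoff, sketched below in Python under that assumption; the `RateLimited` exception, the `call_with_retry` helper, the attempt limit, and the delays are all hypothetical and are not taken from pdf2md.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the API rejects a request for rate reasons."""


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() on RateLimited, doubling the wait after each failed attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(base_delay * 2 ** attempt)  # Waits 1s, 2s, 4s, ... by default.
    raise ValueError("max_attempts must be at least 1")
```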

License

This project is released into the public domain under the Unlicense.
