Implement pdftools: PDF processing and manipulation #17

@vredchenko

Summary

The `pdftools` plugin provides PDF processing capabilities through system tools (poppler-utils, qpdf, ghostscript). Unlike most plugins, which are just command docs, this plugin may include an executable script (like `webcam-automation/webcam.ts`) that wraps multiple CLI tools in a unified interface. This is valuable because PDF manipulation requires knowing which tool does what (`pdftotext` for extraction, `qpdf` for splitting/merging, `gs` for compression), and the plugin abstracts that away.

Original Intent

Plugin to parse, read and manipulate PDF files.

Commands

`/pdftools:extract`

Purpose: Extract text, images, or metadata from a PDF.

Behavior:

  1. Ask user for:

    • PDF file path
    • Extraction type: text, images, metadata, all
    • Page range (optional): "all", "1-5", "3"
  2. Check prerequisites:

    • `pdftotext` (from poppler-utils) — for text extraction
    • `pdfimages` (from poppler-utils) — for image extraction
    • `pdfinfo` (from poppler-utils) — for metadata
  3. Extract based on type:

    Text:
    ```bash
    pdftotext -layout input.pdf - # To stdout
    pdftotext -f 1 -l 5 input.pdf out.txt # Pages 1-5 to file
    ```

    • Options: `-layout` (preserve layout) vs `-raw` (reading order)
    • For tables: `pdftotext -layout -fixed 3` helps preserve columns (`-fixed` takes the assumed character width in points and forces physical layout mode)

    Images:
    ```bash
    pdfimages -all input.pdf output-prefix
    # Produces: output-prefix-000.png, output-prefix-001.jpg, etc.
    ```

    • Report: number of images extracted, formats, dimensions

    Metadata:
    ```bash
    pdfinfo input.pdf
    ```

    • Output: title, author, subject, keywords, page count, page size, PDF version, encrypted (yes/no)
  4. Present results:

    • Text → display first 100 lines, offer to save full output
    • Images → list extracted files with dimensions
    • Metadata → formatted table
  5. Output saved to `-extracted/` directory

Edge cases:

  • Scanned PDF (images, no text layer) → detect and suggest OCR: `ocrmypdf input.pdf output.pdf`
  • Password-protected PDF → ask for password, use `qpdf --password= --decrypt`
  • Very large PDF → warn about time, suggest page range
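
The page-range input and the `pdftotext` flags above can be tied together with a small argument builder. This is a minimal sketch, not existing plugin code: `PageRange`, `parsePageRange`, and `buildPdftotextArgs` are hypothetical names, and only the `-layout`, `-raw`, `-f`, and `-l` flags already shown in this section are used.

```typescript
// Hypothetical helpers for /pdftools:extract; names are illustrative only.
interface PageRange {
  first?: number; // undefined means "start at the first page"
  last?: number; // undefined means "run to the last page"
}

// Accepts the three forms listed above: "all", "3", or "1-5" (inclusive).
function parsePageRange(spec: string): PageRange {
  const s = spec.trim().toLowerCase();
  if (s === "" || s === "all") return {};
  const m = /^(\d+)(?:-(\d+))?$/.exec(s);
  if (!m) throw new Error("Invalid page range: " + spec);
  const first = Number(m[1]);
  const last = m[2] !== undefined ? Number(m[2]) : first;
  if (first < 1 || last < first) throw new Error("Invalid page range: " + spec);
  return { first, last };
}

// Builds the pdftotext argument vector; "-" as output means stdout.
function buildPdftotextArgs(
  input: string,
  spec: string,
  output: string = "-",
  mode: "layout" | "raw" = "layout"
): string[] {
  const range = parsePageRange(spec);
  const args: string[] = ["-" + mode];
  if (range.first !== undefined) args.push("-f", String(range.first));
  if (range.last !== undefined) args.push("-l", String(range.last));
  args.push(input, output);
  return args;
}
```

Passing the result as an argument array (rather than a concatenated shell string) avoids quoting problems with file names that contain spaces.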

`/pdftools:analyze`

Purpose: Analyze a PDF's structure, quality, and content summary.

Behavior:

  1. Read the PDF metadata via `pdfinfo`

  2. Extract text and provide:

    • Page count and total word count
    • Content summary (first ~500 words analyzed by Claude)
    • Structure analysis: headings, sections, tables detected
    • Language detection (from text sample)
  3. Check PDF quality:

    • File size vs page count (flag if unusually large)
    • Image resolution (via `pdfimages -list`)
    • Font embedding (`pdffonts input.pdf`)
    • Structural validity check via `qpdf --check` if `qpdf` is available (qpdf checks file structure only; it does not validate PDF/A conformance)
  4. Output report:
    ```
    PDF Analysis: document.pdf

    Metadata:
    Title: Q4 2024 Report
    Author: Finance Team
    Pages: 42
    Size: 8.3 MB
    Created: 2024-12-15

    Content:
    Words: ~12,400
    Language: English
    Sections: 8 (detected from headings)
    Tables: 3 (detected from layout)
    Images: 15 (avg 300 DPI)

    Quality:
    ✓ All fonts embedded
    ✓ PDF version 1.7
    ⚠ Large file size (8.3 MB for 42 pages — consider compression)
    ⚠ 3 images at 72 DPI (may appear blurry in print)
    ```

Edge cases:

  • Corrupt PDF → detect via `qpdf --check input.pdf` and report
  • Multi-language PDF → report detected languages
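
The metadata and file-size steps above could be sketched as follows. `pdfinfo` prints one `Key: value` pair per line, so a small parser is enough. The function names and the 100 KB-per-page threshold are assumptions made for illustration; the issue does not define what "unusually large" means.

```typescript
// Hypothetical helpers for /pdftools:analyze; names are illustrative only.

// Parses "Key:   value" lines as printed by pdfinfo into a plain lookup object.
// Splitting on the first colon keeps values such as timestamps intact.
function parsePdfInfo(output: string): Record<string, string> {
  const info: Record<string, string> = {};
  for (const line of output.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip blank or malformed lines
    info[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return info;
}

// Assumption: 100 KB per page is an arbitrary example threshold, not a
// value taken from the issue; tune it for the documents you expect.
function isUnusuallyLarge(
  fileBytes: number,
  pages: number,
  maxBytesPerPage: number = 100 * 1024
): boolean {
  return pages > 0 && fileBytes / pages > maxBytesPerPage;
}
```

With the sample report above (8.3 MB over 42 pages, roughly 200 KB per page), this example threshold would raise the file-size warning.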

`/pdftools:merge` (new)

Purpose: Merge multiple PDF files into one.

Behavior:

  1. Ask user for:
    • Input files (list of PDF paths, or glob pattern like `*.pdf`)
    • Output filename
    • Order: alphabetical, as-specified, or let user reorder
  2. Merge using `qpdf`:
    ```bash
    qpdf --empty --pages file1.pdf file2.pdf file3.pdf -- output.pdf
    ```
  3. Support page ranges:
    ```bash
    qpdf --empty --pages file1.pdf 1-3 file2.pdf 5-10 -- output.pdf
    ```
  4. Output: merged file path + page count + file size

Edge cases:

  • Different page sizes → warn but proceed (qpdf handles this)
  • Encrypted PDFs in the mix → decrypt first
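
The two `qpdf` invocations above differ only in whether a page range follows each file, so one builder can cover both. A minimal sketch with hypothetical names (`MergeInput`, `buildMergeArgs`, `sortAlphabetically`); it only assembles the argument list and does not run `qpdf`.

```typescript
// Hypothetical helpers for /pdftools:merge; names are illustrative only.
interface MergeInput {
  path: string;
  range?: string; // optional qpdf page range such as "1-3"
}

// Builds: --empty --pages file1 [range] file2 [range] ... -- output
function buildMergeArgs(inputs: MergeInput[], output: string): string[] {
  if (inputs.length === 0) throw new Error("At least one input PDF is required");
  const args: string[] = ["--empty", "--pages"];
  for (const input of inputs) {
    args.push(input.path);
    if (input.range) args.push(input.range);
  }
  args.push("--", output);
  return args;
}

// Covers the "alphabetical" ordering option; returns a sorted copy and
// leaves the caller's array untouched.
function sortAlphabetically(paths: string[]): string[] {
  return paths.slice().sort((a, b) => a.localeCompare(b));
}
```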

`/pdftools:split` (new)

Purpose: Split a PDF into multiple files.

Behavior:

  1. Ask user for:

    • Input PDF
    • Split mode: by page range, every N pages, into individual pages, by bookmarks
  2. Split using `qpdf`:

    By range:
    ```bash
    qpdf input.pdf --pages input.pdf 1-5 -- part1.pdf
    qpdf input.pdf --pages input.pdf 6-10 -- part2.pdf
    ```

    Every N pages:
    ```bash
    qpdf --split-pages=5 input.pdf part-%d.pdf # Every 5 pages; %d becomes the page range
    ```

    Individual pages:
    ```bash
    qpdf --split-pages input.pdf page-%d.pdf
    ```

  3. Output to `-split/` directory

  4. Report: number of files created + page counts
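
When explicit output names are wanted, the "every N pages" mode can also be expressed as a series of "by range" calls. The sketch below shows the range arithmetic; `chunkRanges`, `buildSplitCommands`, and the `partN.pdf` naming are assumptions for illustration, and the total page count is expected to come from `pdfinfo`.

```typescript
// Hypothetical helpers for /pdftools:split; names are illustrative only.

// Splits pages 1..totalPages into consecutive inclusive ranges of `every` pages.
// The final range may be shorter; a one-page range is written as a single number.
function chunkRanges(totalPages: number, every: number): string[] {
  if (!Number.isInteger(totalPages) || totalPages < 1) throw new Error("totalPages must be a positive integer");
  if (!Number.isInteger(every) || every < 1) throw new Error("every must be a positive integer");
  const ranges: string[] = [];
  for (let start = 1; start <= totalPages; start += every) {
    const end = Math.min(start + every - 1, totalPages);
    ranges.push(start === end ? String(start) : start + "-" + end);
  }
  return ranges;
}

// One qpdf argument list per output file, mirroring the "By range" form above.
function buildSplitCommands(input: string, totalPages: number, every: number, outDir: string): string[][] {
  return chunkRanges(totalPages, every).map((range, i) => [
    input, "--pages", input, range, "--", outDir + "/part" + (i + 1) + ".pdf",
  ]);
}
```

The length of the returned array doubles as the "number of files created" figure for the report step.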

`/pdftools:compress` (new)

Purpose: Reduce PDF file size.

Behavior:

  1. Analyze current file size and content:

    • Check image resolutions and count
    • Check for unnecessary metadata/annotations
  2. Compress using ghostscript:
    ```bash
    gs -sDEVICE=pdfwrite \
    -dCompatibilityLevel=1.5 \
    -dPDFSETTINGS=/ebook \
    -dNOPAUSE -dQUIET -dBATCH \
    -sOutputFile=output.pdf input.pdf
    ```

  3. Quality presets:

    | Preset | DPI | Use Case |
    |---|---|---|
    | `/screen` | 72 | Screen viewing, smallest size |
    | `/ebook` | 150 | General purpose (default) |
    | `/printer` | 300 | High quality print |
    | `/prepress` | 300+ | Print production |
  4. Ask user to choose preset or default to `/ebook`

  5. Report:
    ```
    Compression Results:
    Original: 8.3 MB
    Compressed: 2.1 MB (75% reduction)
    Quality: /ebook (150 DPI)
    ```

Edge cases:

  • File already small → report that compression won't help much
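  • Lossless needed → use `qpdf --object-streams=generate --recompress-flate --compression-level=9` instead (repacks objects and recompresses streams with no quality loss; `--linearize` only reorders the file for fast web viewing and is not a size optimization)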
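
The preset choice and the reduction figure in the report can be sketched as below. The names (`Preset`, `buildGsArgs`, `reductionPercent`) are hypothetical; the ghostscript flags are the ones from the command above, with only the `-dPDFSETTINGS` value varying.

```typescript
// Hypothetical helpers for /pdftools:compress; names are illustrative only.
type Preset = "screen" | "ebook" | "printer" | "prepress";

// DPI labels as listed in the quality presets table above.
const PRESET_DPI: Record<Preset, string> = {
  screen: "72",
  ebook: "150",
  printer: "300",
  prepress: "300+",
};

// Builds the gs argument list from the command shown above; defaults to /ebook.
function buildGsArgs(input: string, output: string, preset: Preset = "ebook"): string[] {
  return [
    "-sDEVICE=pdfwrite",
    "-dCompatibilityLevel=1.5",
    "-dPDFSETTINGS=/" + preset,
    "-dNOPAUSE", "-dQUIET", "-dBATCH",
    "-sOutputFile=" + output,
    input,
  ];
}

// Percentage saved, rounded to a whole number. A negative result means the
// output grew, which is the "file already small" case.
function reductionPercent(originalBytes: number, compressedBytes: number): number {
  if (originalBytes <= 0) throw new Error("originalBytes must be positive");
  return Math.round((1 - compressedBytes / originalBytes) * 100);
}
```

For the sample report, 8.3 MB down to 2.1 MB gives 1 - 2.1/8.3 ≈ 0.747, which rounds to the 75% shown.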

Executable Script (optional)

Consider creating a `pdf.ts` script (similar to `webcam.ts`) that wraps all operations:

```bash
./pdf.ts extract input.pdf --type text --pages 1-5
./pdf.ts analyze input.pdf
./pdf.ts merge *.pdf -o combined.pdf
./pdf.ts split input.pdf --every 5
./pdf.ts compress input.pdf --quality ebook
```

This would provide:

  • Prerequisite checking with helpful install instructions
  • Consistent output formatting
  • Progress reporting for large files
  • Error handling with suggestions

Decision: Include the executable script if the implementation effort is reasonable (estimate ~200-300 lines). Otherwise, the command docs alone are sufficient since they direct Claude to use the right CLI tools.
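
As one illustration of the "prerequisite checking with helpful install instructions" item, the sketch below maps missing tools to the apt packages named in this issue. It is a hypothetical fragment, not the planned `pdf.ts`: `TOOL_PACKAGES` and `installHint` are made-up names, and the list of available tools is passed in rather than probed from the system.

```typescript
// Hypothetical prerequisite helper; names are illustrative only.
// Tool-to-package mapping taken from the Tool Reference in this issue.
const TOOL_PACKAGES: Record<string, string> = {
  pdftotext: "poppler-utils",
  pdfimages: "poppler-utils",
  pdfinfo: "poppler-utils",
  pdffonts: "poppler-utils",
  qpdf: "qpdf",
  gs: "ghostscript",
  ocrmypdf: "ocrmypdf",
};

// Returns an apt install command covering every missing tool, or null when
// nothing is missing. Packages are de-duplicated and keep first-seen order.
function installHint(required: string[], available: string[]): string | null {
  const packages: string[] = [];
  for (const tool of required) {
    if (available.indexOf(tool) !== -1) continue;
    if (!Object.prototype.hasOwnProperty.call(TOOL_PACKAGES, tool)) {
      throw new Error("Unknown tool: " + tool);
    }
    const pkg = TOOL_PACKAGES[tool];
    if (packages.indexOf(pkg) === -1) packages.push(pkg);
  }
  return packages.length === 0 ? null : "sudo apt install " + packages.join(" ");
}
```

Keeping the check as a pure function leaves the actual lookup (for example a `command -v` call per tool) to the caller and makes the hint logic easy to test.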

Hooks

None — this plugin operates through commands only.

File Manifest

| File | Est. Lines | Purpose |
|---|---|---|
| `commands/extract.md` | 80-100 | Extract text/images/metadata |
| `commands/analyze.md` | 70-90 | Analyze PDF structure and quality |
| `commands/merge.md` | 60-80 | Merge multiple PDFs |
| `commands/split.md` | 60-80 | Split PDF into parts |
| `commands/compress.md` | 60-80 | Compress PDF file size |
| `pdf.ts` (optional) | 200-300 | Unified CLI wrapper |
| `README.md` | 150-180 | Full plugin documentation |
| `.claude-plugin/plugin.json` | 15-20 | Plugin manifest |

README Outline

  1. Overview — PDF processing via system tools

  2. Quick Start — Installation + prerequisites + first extraction

  3. Prerequisites
    ```bash
    sudo apt install poppler-utils qpdf ghostscript

    # Optional: OCR support
    sudo apt install ocrmypdf
    ```

  4. Commands — Table with all 5 commands

  5. Tool Reference

    | Tool | Package | Used For |
    |---|---|---|
    | `pdftotext` | poppler-utils | Text extraction |
    | `pdfimages` | poppler-utils | Image extraction |
    | `pdfinfo` | poppler-utils | Metadata |
    | `pdffonts` | poppler-utils | Font analysis |
    | `qpdf` | qpdf | Merge, split, decrypt, linearize |
    | `gs` | ghostscript | Compression, format conversion |
    | `ocrmypdf` | ocrmypdf | OCR for scanned PDFs |
  6. Compression Quality Guide — When to use each preset

  7. Examples — Common workflows (extract table data, merge reports, compress for email)

Prerequisites

```bash
sudo apt install poppler-utils qpdf ghostscript

# Optional for OCR:
sudo apt install ocrmypdf
```

Quality Checklist

  • Each command .md is 60+ lines with concrete steps
  • README is 100+ lines with examples and reference tables
  • Tool reference table maps operations to specific CLI tools
  • Compression presets are documented with use cases
  • Plugin provides clear value (unified interface over 3+ CLI tools)
