Implement pdftools: PDF processing and manipulation (#17)

## Summary

The \`pdftools\` plugin provides PDF processing through system tools (poppler-utils, qpdf, ghostscript). Unlike most plugins, which consist only of command docs, this one may include an executable script (like \`webcam-automation/webcam.ts\`) that wraps several CLI tools in a unified interface. The value is in the abstraction: PDF manipulation requires knowing which tool does what (\`pdftotext\` for extraction, \`qpdf\` for splitting and merging, \`gs\` for compression), and the plugin hides that detail.

## Original Intent

> Plugin to parse, read and manipulate PDF files.

## Commands

### \`/pdftools:extract\`

**Purpose:** Extract text, images, or metadata from a PDF.

**Behavior:**
1. Ask user for:
   - PDF file path
   - Extraction type: text, images, metadata, all
   - Page range (optional): "all", "1-5", "3"
2. Check prerequisites:
   - \`pdftotext\` (from poppler-utils) — for text extraction
   - \`pdfimages\` (from poppler-utils) — for image extraction
   - \`pdfinfo\` (from poppler-utils) — for metadata
3. Extract based on type:
   
   **Text:**
   \`\`\`bash
   pdftotext -layout input.pdf -        # To stdout
   pdftotext -f 1 -l 5 input.pdf out.txt  # Pages 1-5 to file
   \`\`\`
   - Options: \`-layout\` (preserve layout) vs \`-raw\` (reading order)
   - For tables: \`pdftotext -layout -fixed 3\` helps preserve columns
   
   **Images:**
   \`\`\`bash
   pdfimages -all input.pdf output-prefix
   # Produces: output-prefix-000.png, output-prefix-001.jpg, etc.
   \`\`\`
   - Report: number of images extracted, formats, dimensions
   
   **Metadata:**
   \`\`\`bash
   pdfinfo input.pdf
   \`\`\`
   - Output: title, author, subject, keywords, page count, page size, PDF version, encrypted (yes/no)

4. Present results:
   - Text → display first 100 lines, offer to save full output
   - Images → list extracted files with dimensions
   - Metadata → formatted table
5. Output saved to \`<name>-extracted/\` directory
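
The page-range input from step 1 has to be translated into `pdftotext`'s `-f`/`-l` flags. A minimal sketch of that translation follows; `parsePageRange` and `buildPdftotextArgs` are hypothetical helper names, not part of any existing script, and the result is only an argument list that a wrapper would hand to a process spawner.

```typescript
// Hypothetical helpers (names are assumptions, not an existing API).
interface PageRange {
  first: number;
  last: number;
}

// Accepts "all", a single page ("3"), or an inclusive range ("1-5").
// Returns null for "all", meaning no -f/-l flags are needed.
function parsePageRange(spec: string): PageRange | null {
  const trimmed = spec.trim().toLowerCase();
  if (trimmed === "all") return null;
  const match = /^(\d+)(?:-(\d+))?$/.exec(trimmed);
  if (!match) throw new Error(`Invalid page range: "${spec}"`);
  const first = Number(match[1]);
  const last = match[2] !== undefined ? Number(match[2]) : first;
  if (first < 1 || last < first) {
    throw new Error(`Invalid page range: "${spec}"`);
  }
  return { first, last };
}

// Builds the argument list for pdftotext; "-" as output means stdout.
function buildPdftotextArgs(input: string, spec: string, output = "-"): string[] {
  const args = ["-layout"];
  const range = parsePageRange(spec);
  if (range) args.push("-f", String(range.first), "-l", String(range.last));
  args.push(input, output);
  return args;
}
```

Returning an argument array rather than a shell string avoids quoting problems with file names that contain spaces.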

**Edge cases:**
- Scanned PDF (images, no text layer) → detect and suggest OCR: \`ocrmypdf input.pdf output.pdf\`
- Password-protected PDF → ask for password, then decrypt to a temporary copy with \`qpdf --password=<pw> --decrypt input.pdf decrypted.pdf\`
- Very large PDF → warn about time, suggest page range
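
One way to detect the scanned-PDF case is to compare the amount of extracted text against the page count. The sketch below assumes that heuristic; `looksScanned` and the `MIN_CHARS_PER_PAGE` threshold are illustrative assumptions and would need tuning against real documents.

```typescript
// Assumption: fewer than MIN_CHARS_PER_PAGE visible characters per page
// suggests there is no usable text layer. The value is illustrative only.
const MIN_CHARS_PER_PAGE = 20;

// extractedText is the full pdftotext output; pageCount comes from pdfinfo.
function looksScanned(extractedText: string, pageCount: number): boolean {
  if (pageCount <= 0) throw new Error("pageCount must be positive");
  // Strip whitespace, including the form feeds pdftotext emits between pages.
  const visibleChars = extractedText.replace(/\s/g, "").length;
  return visibleChars / pageCount < MIN_CHARS_PER_PAGE;
}
```

When this returns true, the command would suggest `ocrmypdf` instead of presenting near-empty text.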

### \`/pdftools:analyze\`

**Purpose:** Analyze a PDF's structure, quality, and content summary.

**Behavior:**
1. Read the PDF metadata via \`pdfinfo\`
2. Extract text and provide:
   - Page count and total word count
   - Content summary (first ~500 words analyzed by Claude)
   - Structure analysis: headings, sections, tables detected
   - Language detection (from text sample)
3. Check PDF quality:
   - File size vs page count (flag if unusually large)
   - Image resolution (via \`pdfimages -list\`)
   - Font embedding (\`pdffonts input.pdf\`)
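   - Structural validity via \`qpdf --check\` if \`qpdf\` is available (qpdf does not validate PDF/A conformance; that requires a dedicated validator such as veraPDF)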
4. Output report:
   \`\`\`
   PDF Analysis: document.pdf
   
   Metadata:
     Title:    Q4 2024 Report
     Author:   Finance Team
     Pages:    42
     Size:     8.3 MB
     Created:  2024-12-15
   
   Content:
     Words:    ~12,400
     Language: English
     Sections: 8 (detected from headings)
     Tables:   3 (detected from layout)
     Images:   15 (avg 300 DPI)
   
   Quality:
     ✓ All fonts embedded
     ✓ PDF version 1.7
     ⚠ Large file size (8.3 MB for 42 pages — consider compression)
     ⚠ 3 images at 72 DPI (may appear blurry in print)
   \`\`\`
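
The metadata block and the file-size warning in the report above can both be derived from `pdfinfo` output, which is a list of `Key: value` lines. A possible sketch follows; `parsePdfInfo`, `isUnusuallyLarge`, and the 150 KB per page threshold are assumptions made for illustration.

```typescript
// Parses pdfinfo's "Key:   value" lines into a lookup table.
// Splits on the first colon only, since values such as dates contain colons.
function parsePdfInfo(output: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const line of output.split("\n")) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue;
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// Hypothetical threshold: flag files averaging more than 150 KB per page.
const MAX_BYTES_PER_PAGE = 150 * 1024;

function isUnusuallyLarge(info: Record<string, string>): boolean {
  const pages = Number(info["Pages"]);
  // pdfinfo reports the size as e.g. "8703181 bytes"; parseInt stops at the space.
  const bytes = parseInt(info["File size"] ?? "", 10);
  if (isNaN(pages) || pages <= 0 || isNaN(bytes)) return false;
  return bytes / pages > MAX_BYTES_PER_PAGE;
}
```

Missing or malformed fields make `isUnusuallyLarge` return false instead of throwing, so one absent key does not abort the whole report.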

**Edge cases:**
- Corrupt PDF → detect via \`qpdf --check input.pdf\` and report
- Multi-language PDF → report detected languages
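
The image-resolution check in step 3 relies on the table printed by `pdfimages -list`. The sketch below assumes the column layout of recent poppler releases (page, num, type, width, height, color, comp, bpc, enc, interp, object, ID, x-ppi, y-ppi, size, ratio); that layout, the helper names, and the 150 DPI default are assumptions to verify against the installed version.

```typescript
interface ImageInfo {
  page: number;
  width: number;
  height: number;
  xPpi: number;
  yPpi: number;
}

// Parses `pdfimages -list` output. Column positions are an assumption
// based on recent poppler releases; older versions may differ.
function parsePdfImagesList(output: string): ImageInfo[] {
  const images: ImageInfo[] = [];
  for (const line of output.split("\n")) {
    const cols = line.trim().split(/\s+/);
    // Data rows start with a page number; header and separator rows do not.
    if (cols.length < 14 || !/^\d+$/.test(cols[0] ?? "")) continue;
    images.push({
      page: Number(cols[0]),
      width: Number(cols[3]),
      height: Number(cols[4]),
      xPpi: Number(cols[12]),
      yPpi: Number(cols[13]),
    });
  }
  return images;
}

// Returns images whose lower resolution axis falls below minDpi.
function lowDpiImages(images: ImageInfo[], minDpi = 150): ImageInfo[] {
  return images.filter((img) => Math.min(img.xPpi, img.yPpi) < minDpi);
}
```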

### \`/pdftools:merge\` (new)

**Purpose:** Merge multiple PDF files into one.

**Behavior:**
1. Ask user for:
   - Input files (list of PDF paths, or glob pattern like \`*.pdf\`)
   - Output filename
   - Order: alphabetical, as-specified, or let user reorder
2. Merge using \`qpdf\`:
   \`\`\`bash
   qpdf --empty --pages file1.pdf file2.pdf file3.pdf -- output.pdf
   \`\`\`
3. Support page ranges:
   \`\`\`bash
   qpdf --empty --pages file1.pdf 1-3 file2.pdf 5-10 -- output.pdf
   \`\`\`
4. Output: merged file path + page count + file size

**Edge cases:**
- Different page sizes → warn but proceed (qpdf handles this)
- Encrypted PDFs in the mix → decrypt first
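
The two `qpdf` invocations above differ only in whether a page range follows each file, so one argument builder can cover both. This is a sketch under that assumption; `MergeInput` and `buildMergeArgs` are hypothetical names.

```typescript
// One input file with an optional qpdf page range such as "1-3".
interface MergeInput {
  path: string;
  pages?: string;
}

// Builds: --empty --pages <file> [range] ... -- <output>
function buildMergeArgs(inputs: MergeInput[], output: string): string[] {
  if (inputs.length === 0) {
    throw new Error("At least one input file is required");
  }
  const args = ["--empty", "--pages"];
  for (const input of inputs) {
    args.push(input.path);
    if (input.pages) args.push(input.pages);
  }
  args.push("--", output);
  return args;
}

// "Alphabetical" order here is plain code-unit order, so "file10.pdf"
// sorts before "file2.pdf"; a natural sort may be preferable.
function sortAlphabetically(paths: string[]): string[] {
  return paths.slice().sort();
}
```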

### \`/pdftools:split\` (new)

**Purpose:** Split a PDF into multiple files.

**Behavior:**
1. Ask user for:
   - Input PDF
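   - Split mode: by page range, every N pages, into individual pages, by bookmarks (qpdf has no built-in bookmark split, so this mode must first derive page ranges from the document outline)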
2. Split using \`qpdf\`:
   
   **By range:**
   \`\`\`bash
   qpdf input.pdf --pages input.pdf 1-5 -- part1.pdf
   qpdf input.pdf --pages input.pdf 6-10 -- part2.pdf
   \`\`\`
   
   **Every N pages:**
   \`\`\`bash
   qpdf --split-pages=5 input.pdf part-%d.pdf  # Every 5 pages; %d becomes the page range
   \`\`\`
   
   **Individual pages:**
   \`\`\`bash
   qpdf --split-pages input.pdf page-%d.pdf    # One file per page
   \`\`\`

3. Output to \`<name>-split/\` directory
4. Report: number of files created + page counts
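
When the wrapper needs control over output names, "every N pages" can also be expressed as a series of by-range calls. The sketch below computes those ranges and the matching `qpdf` arguments; `chunkRanges`, `buildRangeSplitArgs`, and the `partN.pdf` naming are assumptions for illustration.

```typescript
// Splits pages 1..totalPages into consecutive ranges of at most chunkSize pages.
function chunkRanges(totalPages: number, chunkSize: number): string[] {
  const valid = (n: number): boolean => n >= 1 && Math.floor(n) === n;
  if (!valid(totalPages) || !valid(chunkSize)) {
    throw new Error("totalPages and chunkSize must be positive integers");
  }
  const ranges: string[] = [];
  for (let first = 1; first <= totalPages; first += chunkSize) {
    const last = Math.min(first + chunkSize - 1, totalPages);
    ranges.push(first === last ? String(first) : `${first}-${last}`);
  }
  return ranges;
}

// One qpdf argument list per range, following the by-range form:
//   qpdf input.pdf --pages input.pdf <range> -- <outDir>/partN.pdf
function buildRangeSplitArgs(
  input: string,
  totalPages: number,
  chunkSize: number,
  outDir: string
): string[][] {
  return chunkRanges(totalPages, chunkSize).map((range, i) => [
    input, "--pages", input, range, "--", `${outDir}/part${i + 1}.pdf`,
  ]);
}
```

The same range list also gives the per-file page counts needed for the report in step 4.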

### \`/pdftools:compress\` (new)

**Purpose:** Reduce PDF file size.

**Behavior:**
1. Analyze current file size and content:
   - Check image resolutions and count
   - Check for unnecessary metadata/annotations
2. Compress using ghostscript:
   \`\`\`bash
   gs -sDEVICE=pdfwrite \\
      -dCompatibilityLevel=1.5 \\
      -dPDFSETTINGS=/ebook \\
      -dNOPAUSE -dQUIET -dBATCH \\
      -sOutputFile=output.pdf input.pdf
   \`\`\`
3. Quality presets:

   | Preset | DPI | Use Case |
   |--------|-----|----------|
   | \`/screen\` | 72 | Screen viewing, smallest size |
   | \`/ebook\` | 150 | General purpose (default) |
   | \`/printer\` | 300 | High quality print |
   | \`/prepress\` | 300+ | Print production |

4. Ask user to choose preset or default to \`/ebook\`
5. Report:
   \`\`\`
   Compression Results:
     Original:   8.3 MB
     Compressed: 2.1 MB (75% reduction)
     Quality:    /ebook (150 DPI)
   \`\`\`

**Edge cases:**
- File already small → report that compression won't help much
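- Lossless needed → use \`qpdf --object-streams=generate --recompress-flate --compression-level=9 input.pdf output.pdf\` instead (repacks and recompresses streams without touching image data; these flags need a reasonably recent qpdf). \`--linearize\` only reorders the file for fast web viewing and does not reduce size by itself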
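
The ghostscript call and the reduction figure in the report can be produced by two small functions. This is a sketch, not the planned implementation; `Preset`, `buildGsArgs`, and `reductionPercent` are hypothetical names, and the flags are the ones from the command above.

```typescript
// The four quality presets from the table above.
type Preset = "screen" | "ebook" | "printer" | "prepress";

// Mirrors the gs command above, with /ebook as the default preset.
function buildGsArgs(input: string, output: string, preset: Preset = "ebook"): string[] {
  return [
    "-sDEVICE=pdfwrite",
    "-dCompatibilityLevel=1.5",
    `-dPDFSETTINGS=/${preset}`,
    "-dNOPAUSE",
    "-dQUIET",
    "-dBATCH",
    `-sOutputFile=${output}`,
    input,
  ];
}

// Percentage saved, rounded to the nearest integer.
// A negative result means the output is larger than the input.
function reductionPercent(originalBytes: number, compressedBytes: number): number {
  if (originalBytes <= 0) throw new Error("originalBytes must be positive");
  return Math.round((1 - compressedBytes / originalBytes) * 100);
}
```

Ghostscript can occasionally produce a larger file than it was given, so a wrapper would likely keep the original whenever `reductionPercent` is zero or negative.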

## Executable Script (optional)

Consider creating a \`pdf.ts\` script (similar to \`webcam.ts\`) that wraps all operations:

\`\`\`bash
./pdf.ts extract input.pdf --type text --pages 1-5
./pdf.ts analyze input.pdf
./pdf.ts merge *.pdf -o combined.pdf
./pdf.ts split input.pdf --every 5
./pdf.ts compress input.pdf --quality ebook
\`\`\`

This would provide:
- Prerequisite checking with helpful install instructions
- Consistent output formatting
- Progress reporting for large files
- Error handling with suggestions

**Decision:** Include the executable script if the implementation effort is reasonable (estimate ~200-300 lines). Otherwise, the command docs alone are sufficient since they direct Claude to use the right CLI tools.
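
As a rough indication of the effort, the "prerequisite checking with helpful install instructions" item could be as small as the sketch below. `TOOL_PACKAGES` and `installHint` are hypothetical names; the tool-to-package mapping is taken from the tool reference table in this issue, and detecting which tools are missing (for example by probing `PATH`) is left out.

```typescript
// Tool-to-package mapping for Debian/Ubuntu, per the tool reference table.
const TOOL_PACKAGES: Record<string, string> = {
  pdftotext: "poppler-utils",
  pdfimages: "poppler-utils",
  pdfinfo: "poppler-utils",
  pdffonts: "poppler-utils",
  qpdf: "qpdf",
  gs: "ghostscript",
  ocrmypdf: "ocrmypdf",
};

// Turns a list of missing tools into a single install command,
// or null when nothing is missing. Packages are de-duplicated.
function installHint(missingTools: string[]): string | null {
  const packages: string[] = [];
  for (const tool of missingTools) {
    const pkg = TOOL_PACKAGES[tool];
    if (pkg === undefined) throw new Error(`Unknown tool: ${tool}`);
    if (packages.indexOf(pkg) === -1) packages.push(pkg);
  }
  return packages.length > 0 ? `sudo apt install ${packages.join(" ")}` : null;
}
```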

## Hooks

None — this plugin operates through commands only.

## File Manifest

| File | Est. Lines | Purpose |
|------|-----------|---------|
| \`commands/extract.md\` | 80-100 | Extract text/images/metadata |
| \`commands/analyze.md\` | 70-90 | Analyze PDF structure and quality |
| \`commands/merge.md\` | 60-80 | Merge multiple PDFs |
| \`commands/split.md\` | 60-80 | Split PDF into parts |
| \`commands/compress.md\` | 60-80 | Compress PDF file size |
| \`pdf.ts\` (optional) | 200-300 | Unified CLI wrapper |
| \`README.md\` | 150-180 | Full plugin documentation |
| \`.claude-plugin/plugin.json\` | 15-20 | Plugin manifest |

## README Outline

1. **Overview** — PDF processing via system tools
2. **Quick Start** — Installation + prerequisites + first extraction
3. **Prerequisites**
   \`\`\`bash
   sudo apt install poppler-utils qpdf ghostscript
   # Optional: OCR support
   sudo apt install ocrmypdf
   \`\`\`
4. **Commands** — Table with all 5 commands
5. **Tool Reference**

   | Tool | Package | Used For |
   |------|---------|----------|
   | \`pdftotext\` | poppler-utils | Text extraction |
   | \`pdfimages\` | poppler-utils | Image extraction |
   | \`pdfinfo\` | poppler-utils | Metadata |
   | \`pdffonts\` | poppler-utils | Font analysis |
   | \`qpdf\` | qpdf | Merge, split, decrypt, linearize |
   | \`gs\` | ghostscript | Compression, format conversion |
   | \`ocrmypdf\` | ocrmypdf | OCR for scanned PDFs |

6. **Compression Quality Guide** — When to use each preset
7. **Examples** — Common workflows (extract table data, merge reports, compress for email)

## Prerequisites

\`\`\`bash
sudo apt install poppler-utils qpdf ghostscript
# Optional for OCR:
sudo apt install ocrmypdf
\`\`\`

## Quality Checklist

- [ ] Each command .md is 60+ lines with concrete steps
- [ ] README is 100+ lines with examples and reference tables
- [ ] Tool reference table maps operations to specific CLI tools
- [ ] Compression presets are documented with use cases
- [ ] Plugin provides clear value (unified interface over 3+ CLI tools)
