diff --git a/.github/workflows/validate.yml b/.github/workflows/validate.yml
new file mode 100644
index 0000000..cb621f8
--- /dev/null
+++ b/.github/workflows/validate.yml
@@ -0,0 +1,52 @@
+name: Validate
+
+on:
+  pull_request:
+  push:
+    branches:
+      - main
+      - "codex/**"
+
+jobs:
+  validate:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install YAML parser
+        run: python -m pip install --disable-pip-version-check pyyaml
+
+      - name: Validate repo structure
+        run: python tools/validate-repo.py
+
+      - name: Validate Codex metadata
+        run: |
+          python - <<'PY'
+          import pathlib
+          import yaml
+
+          path = pathlib.Path("nutrient-document-processing/agents/openai.yaml")
+          data = yaml.safe_load(path.read_text())
+          interface = data.get("interface", {})
+          required = ["display_name", "short_description", "default_prompt"]
+          missing = [key for key in required if key not in interface]
+          if missing:
+              raise SystemExit(f"openai.yaml missing interface keys: {missing}")
+          print("openai.yaml parsed successfully")
+          PY
+
+      - name: Compile Python scripts
+        run: |
+          python -m py_compile nutrient-document-processing/scripts/*.py nutrient-document-processing/scripts/lib/common.py
+
+      - name: Smoke test script help
+        run: |
+          for script in nutrient-document-processing/scripts/*.py; do
+            python "$script" --help > /dev/null
+          done
diff --git a/README.md b/README.md
index de4915e..ebcd555 100644
--- a/README.md
+++ b/README.md
@@ -2,13 +2,14 @@

Nutrient DWS API - License + npm version + License Agent Skills

Give your AI agent PDF superpowers — in one command.
- Convert, extract, OCR, redact, sign, and fill documents from any coding agent.
+ Generate, convert, extract, OCR, redact, sign, archive, and optimize documents from any coding agent.

@@ -131,13 +132,17 @@ patient-records.pdf (contains PII)
 
 | Capability | Description | Example prompt |
 |------------|-------------|----------------|
+| ✨ **Generate** | Create PDFs from HTML templates, uploaded assets, or remote URLs | *"Generate a PDF proposal from this HTML template"* |
 | 📄 **Convert** | PDF ↔ DOCX/XLSX/PPTX, HTML → PDF, images → PDF | *"Convert report.docx to PDF"* |
+| 🧩 **Assemble** | Merge, split, reorder, rotate, and flatten PDF packets before delivery | *"Merge these PDFs, rotate the landscape pages, and keep only pages 1-5"* |
 | 📝 **Extract** | Text, tables, and key-value pairs from PDFs | *"Extract all tables from invoice.pdf as Excel"* |
 | 🔍 **OCR** | Multi-language OCR for scanned documents | *"OCR this German scan and extract the text"* |
 | 🔒 **Redact** | Pattern-based + AI-powered PII redaction | *"Redact all SSNs and emails from records.pdf"* |
 | 💧 **Watermark** | Text or image watermarks with full styling | *"Add a DRAFT watermark to proposal.pdf"* |
 | ✍️ **Sign** | CMS and CAdES digital signatures | *"Digitally sign contract.pdf"* |
 | 📋 **Fill Forms** | Programmatic PDF form filling | *"Fill the tax form with these values…"* |
+| 🗂️ **Compliance** | Convert PDFs for archival or accessibility targets like PDF/A and PDF/UA | *"Convert this PDF to PDF/A-2a"* |
+| ⚑ **Optimize** | Optimize and linearize PDFs for web delivery and download performance | *"Linearize this PDF for fast web viewing"* |
 | 📊 **Credits** | Monitor API usage and balance | *"How many API credits do I have left?"* |
 
 ---
@@ -188,15 +193,21 @@ cp -r nutrient-agent-skill/nutrient-document-processing ~/.claude/skills/
 ```
 nutrient-document-processing/
 ├── SKILL.md # Main instructions (loaded by agents)
+├── agents/
+│   └── openai.yaml # Optional Codex App metadata
+├── references/
+│   ├── REFERENCE.md # Reference index
+│   └── *.md # Focused cookbooks by workflow type
 ├── scripts/
 │   ├── *.py # Single-operation scripts
 │   └── lib/common.py # Shared utilities
 ├── assets/
+│   ├── nutrient.svg # Skill icon
 │   └── templates/
 │       └── custom-workflow-template.py # Runtime pipeline template
 ├── tests/
 │   └── testing-guide.md
-└── LICENSE # Apache-2.0
+└── LICENSE.txt # Apache-2.0
 ```
 
 ### Script Model
@@ -204,12 +215,15 @@ nutrient-document-processing/
 - `scripts/*.py` are single-operation scripts only.
 - Multi-step workflows are generated at runtime in a temporary script from `assets/templates/custom-workflow-template.py`.
 - Do not commit runtime pipeline scripts.
+- Use `references/` for HTML/URL generation, compliance outputs, and other workflows that are easier to express as direct API payloads or temporary pipelines.
 
 ## Documentation
 
 - **[SKILL.md](nutrient-document-processing/SKILL.md)** — Agent instructions with setup and operation examples
+- **[Reference Index](nutrient-document-processing/references/REFERENCE.md)** — Modular cookbook for generation, conversion, extraction, security, compliance, and workflow sequencing
 - **[Testing Guide](nutrient-document-processing/tests/testing-guide.md)** — Manual test procedures
 - **[Custom Workflow Template](nutrient-document-processing/assets/templates/custom-workflow-template.py)** — Runtime pipeline starting point
+- **[Codex App Metadata](nutrient-document-processing/agents/openai.yaml)** — Optional manifest for Codex App packaging
 - **[API Playground](https://dashboard.nutrient.io/processor-api/playground/)** — Interactive API testing
 - **[Official API Docs](https://www.nutrient.io/guides/dws-processor/)** — Nutrient documentation
 
@@ -219,4 +233,4 @@ Built by [Nutrient](https://www.nutrient.io/) (formerly PSPDFKit) — document S
 
 ## License
 
-[Apache-2.0](nutrient-document-processing/LICENSE)
+[Apache-2.0](nutrient-document-processing/LICENSE.txt)
diff --git a/nutrient-document-processing/LICENSE b/nutrient-document-processing/LICENSE.txt
similarity index 100%
rename from nutrient-document-processing/LICENSE
rename to nutrient-document-processing/LICENSE.txt
diff --git a/nutrient-document-processing/SKILL.md b/nutrient-document-processing/SKILL.md
index 939b6c3..8bcd33f 100644
--- a/nutrient-document-processing/SKILL.md
+++ b/nutrient-document-processing/SKILL.md
@@ -1,83 +1,117 @@
 ---
 name: nutrient-document-processing
 description: >-
-  Process documents with the Nutrient DWS API. Use this skill when the user wants to convert documents
-  (PDF, DOCX, XLSX, PPTX, HTML, images), extract text or tables from PDFs, OCR scanned documents,
-  redact sensitive information (PII, SSN, emails, credit cards), add watermarks, digitally sign PDFs,
-  fill PDF forms, or check API credit usage. Activates on keywords: PDF, document, convert, extract,
-  OCR, redact, watermark, sign, merge, compress, form fill, document processing.
+  Process documents with Nutrient DWS. Use when the user wants to generate PDFs from HTML or URLs,
+  convert Office/images/PDFs, assemble or split packets, OCR scans, extract text/tables/key-value
+  pairs, redact PII, watermark, sign, fill forms, optimize PDFs, or produce compliance outputs like
+  PDF/A or PDF/UA. Triggers include convert to PDF, merge these PDFs, OCR this scan, extract tables,
+  redact PII, sign this PDF, make this PDF/A, or linearize for web delivery.
 license: Apache-2.0
 metadata:
   author: nutrient-sdk
   version: "1.0"
   homepage: "https://www.nutrient.io/api/"
   repository: "https://github.com/PSPDFKit-labs/nutrient-agent-skill"
-  compatibility: "Requires Node.js 18+ and internet. Works with Claude Code, Codex CLI, Gemini CLI, OpenCode, Cursor, Windsurf, GitHub Copilot, Amp, or any Agent Skills-compatible product."
+  compatibility: "Requires Python 3.10+, uv, and internet. Works with Claude Code, Codex CLI, Gemini CLI, OpenCode, Cursor, Windsurf, GitHub Copilot, Amp, or any Agent Skills-compatible product."
+  short-description: "Generate, convert, assemble, OCR, redact, sign, archive, and optimize documents"
 ---
 
 # Nutrient Document Processing
 
-Process, convert, extract, redact, sign, and manipulate documents using the [Nutrient DWS Processor API](https://www.nutrient.io/api/).
+Use Nutrient DWS for managed document workflows where fidelity, compliance, or multi-step processing matters more than local-tool convenience.
 
 ## Setup
-
-You need a Nutrient DWS API key. Get one free at .
-
-Export the API key before running scripts:
-
-```bash
-export NUTRIENT_API_KEY="nutr_sk_..."
-```
-
-Scripts live in `scripts/` relative to this SKILL.md. Use the directory containing this SKILL.md as the working directory when running scripts:
-
-```bash
-cd && uv run scripts/