CI/CD integration: privacy scanning in GitHub Actions for PRs #6

@krisrowe

Problem

Privacy-guard currently runs locally via claude --agent. There's no automated scanning of pull requests — a contributor could open a PR with PII and it would only be caught if someone manually runs the agent.

Vision

A GitHub Actions workflow that runs privacy-guard (or a CI-adapted variant) on every PR, scanning the PR diff for PII before merge.

Design considerations

What to scan

  • The PR diff (git diff origin/main...HEAD) — same scope as privacy-guard's pre-push mode
  • PR title and body text (may contain PII in descriptions)
  • Commit messages in the PR branch
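
A minimal sketch of how a CI step might gather these three inputs. It assumes the job runs on a `pull_request` event (so `GITHUB_EVENT_PATH` points at the event payload) and that the base branch has been fetched so `origin/main` resolves. The function names are hypothetical and are not part of privacy-guard.

```python
import json
import os
import subprocess


def git_output(*args: str) -> str:
    """Run a git command and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def pr_text_from_event(event: dict) -> str:
    """Extract the PR title and body from a pull_request event payload."""
    pr = event.get("pull_request") or {}
    title = pr.get("title") or ""
    body = pr.get("body") or ""  # body is null when the description is empty
    return f"{title}\n\n{body}".strip()


def collect_scan_inputs(base: str = "origin/main") -> dict:
    """Gather the diff, commit messages, and PR text into one dict."""
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as fh:
        event = json.load(fh)
    return {
        # Three-dot diff: changes on the PR branch since it forked from base.
        "diff": git_output("diff", f"{base}...HEAD"),
        # Two-dot log: full messages of commits on HEAD that are not on base.
        "commit_messages": git_output("log", "--format=%B", f"{base}..HEAD"),
        "pr_text": pr_text_from_event(event),
    }
```

Keeping the three sources separate in the result lets a later step report which one a finding came from.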

How to run the agent in CI

  • Claude Code CLI can run in CI with claude --agent privacy-guard -p "..."
  • Needs an Anthropic API key as a GitHub secret
  • Agent needs PERSON.md — either checked into the repo (risky), stored as a GitHub secret, or fetched from a secure location at runtime
  • Alternative: a stripped-down deterministic scanner for CI that complements the LLM agent locally
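
As a rough sketch, a CI step could wrap the CLI invocation above like this. The prompt is left as a parameter because its wording is not settled here. Reading the key from `ANTHROPIC_API_KEY` and the 300-second timeout are assumptions, not confirmed requirements.

```python
import os
import subprocess


def build_agent_command(prompt: str, agent: str = "privacy-guard") -> list[str]:
    """Return the argv for the CLI invocation quoted in this issue."""
    return ["claude", "--agent", agent, "-p", prompt]


def run_agent(prompt: str, timeout_seconds: int = 300) -> str:
    """Run the agent non-interactively and return its stdout."""
    # Assumption: the workflow maps a GitHub secret to this env var.
    if not os.environ.get("ANTHROPIC_API_KEY"):
        raise RuntimeError("ANTHROPIC_API_KEY is not set; add it as a GitHub secret")
    result = subprocess.run(
        build_agent_command(prompt),
        check=True,            # raise if the CLI exits non-zero
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return result.stdout
```

Failing fast on a missing key keeps a misconfigured fork or secret from looking like a clean scan.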

PERSON.md in CI

This is the hard problem. PERSON.md contains the user's personal patterns and must never be committed. Options:

  • GitHub secret — store PERSON.md content as a secret, write to temp file at runtime
  • Shared org-level patterns — a subset of patterns (employer names, org-specific terms) that apply to all contributors, stored in repo
  • No PERSON.md — CI scanner only checks for built-in patterns (credentials, secrets, key formats) without personal patterns. Less coverage but zero config.
  • Per-contributor PERSON.md — each contributor uploads their own via a secure mechanism. Complex.
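
To illustrate the GitHub secret option, here is a sketch that writes the secret to a private temp file and removes it afterwards. The env var name `PERSON_MD` is hypothetical, and the sketch assumes the workflow exposes the secret to the step as an environment variable.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def person_md_from_secret(env_var: str = "PERSON_MD"):
    """Write the secret's content to a temp file and delete it on exit."""
    content = os.environ.get(env_var)
    if not content:
        raise RuntimeError(f"{env_var} is empty or unset")
    # mkstemp creates the file readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="person-", suffix=".md")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
        yield path
    finally:
        os.remove(path)  # never leave personal patterns on the runner
```

Usage would look like `with person_md_from_secret() as path: ...`, with the agent pointed at `path` inside the block. The file lives outside the checkout, so it cannot be committed or uploaded as an artifact by accident.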

Output

  • Post scan results as a PR comment or check annotation
  • Block merge if high-severity findings exist
  • Allow overrides via PR labels or commit message flags for false positives
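
The merge gate could be as small as the sketch below. The label name, the `Finding` shape, and the three-level severity scale are all assumptions made for illustration.

```python
from dataclasses import dataclass

OVERRIDE_LABEL = "privacy-guard-override"  # hypothetical label name


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    category: str
    severity: str  # "low", "medium", or "high" (assumed scale)


def should_block_merge(findings: list[Finding], labels: set[str]) -> bool:
    """Block when a high-severity finding exists and no override label is set."""
    if OVERRIDE_LABEL in labels:
        return False
    return any(f.severity == "high" for f in findings)
```

A workflow step would exit non-zero when `should_block_merge` returns `True`, which fails the check and blocks the merge if the check is marked as required.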

Performance

  • Agent takes 30-90 seconds per scan on small repos
  • Acceptable for PR checks, since the scan gates the merge rather than every push
  • Could cache PERSON.md setup across runs

Open questions

  • Should this be the same agent or a CI-specific variant?
  • How to handle the PERSON.md problem securely?
  • Should the workflow also run the deterministic scanner (git-scan/consult precommit) as a fast first pass?
  • How to report findings without leaking PII into public PR comments?
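
One possible shape for a deterministic first pass that also avoids leaking matches: scan only the added lines of the diff, and report the file, line, and category but never the matched text. The patterns below are illustrative and are not privacy-guard's actual rule set.

```python
import re

# Illustrative built-in patterns (assumed, not privacy-guard's rules).
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private-key-header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def scan_diff(diff_text: str) -> list[tuple[str, int, str]]:
    """Return (path, line, category) for each pattern match on an added line."""
    findings = []
    path, new_line = None, 0
    for raw in diff_text.splitlines():
        # Limitation: an added line whose content starts with "++ " would be
        # misread as a file header. Acceptable for a sketch.
        if raw.startswith("+++ "):
            path = raw[4:].removeprefix("b/")
            continue
        if raw.startswith("--- "):
            continue
        hunk = HUNK_HEADER.match(raw)
        if hunk:
            new_line = int(hunk.group(1))  # first line number in the new file
            continue
        if raw.startswith("+"):
            for category, pattern in PATTERNS.items():
                if pattern.search(raw[1:]):
                    findings.append((path, new_line, category))
            new_line += 1
        elif raw.startswith(" "):
            new_line += 1  # context lines also advance the new-file counter
    return findings


def format_report(findings: list[tuple[str, int, str]]) -> str:
    """Build a PR comment that names locations but never the matched text."""
    if not findings:
        return "privacy scan: no findings"
    lines = [f"- {path}:{line} ({category})" for path, line, category in findings]
    return f"privacy scan: {len(findings)} finding(s)\n" + "\n".join(lines)
```

The report is safe to post publicly only to the extent that file paths and categories are not sensitive themselves. The same redaction would still need to be applied to whatever the LLM agent outputs.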
