CI/CD integration: privacy scanning in GitHub Actions for PRs #6
Open
Description
Problem
Privacy-guard currently runs locally via claude --agent. There's no automated scanning of pull requests — a contributor could open a PR with PII and it would only be caught if someone manually runs the agent.
Vision
A GitHub Actions workflow that runs privacy-guard (or a CI-adapted variant) on every PR, scanning the PR diff for PII before merge.
Design considerations
What to scan
- The PR diff (git diff origin/main...HEAD), same scope as privacy-guard's pre-push mode
- PR title and body text (may contain PII in descriptions)
- Commit messages in the PR branch
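As a rough sketch of how the three sources above could be gathered in a workflow step, assuming the job runs inside a git checkout on a GitHub Actions runner. The function names (scan_commands, pr_text_from_event, collect_scan_inputs) are hypothetical and not part of privacy-guard.

```python
import json
import os
import subprocess


def scan_commands(base="origin/main"):
    """Return the git commands that produce the diff and commit messages."""
    return {
        # Three-dot diff: changes on HEAD since it diverged from the base branch.
        "diff": ["git", "diff", f"{base}...HEAD"],
        # Two-dot log range: commits reachable from HEAD but not from the base.
        "commit_messages": ["git", "log", "--format=%B", f"{base}..HEAD"],
    }


def pr_text_from_event(event):
    """Join PR title and body from a pull_request event payload (body may be null)."""
    pr = event.get("pull_request", {})
    title = pr.get("title") or ""
    body = pr.get("body") or ""
    return f"{title}\n\n{body}".strip()


def collect_scan_inputs(base="origin/main"):
    """Run the git commands and read the PR text; requires a git checkout."""
    inputs = {
        name: subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
        for name, cmd in scan_commands(base).items()
    }
    # GITHUB_EVENT_PATH is set by GitHub Actions and points to the event JSON.
    event_path = os.environ.get("GITHUB_EVENT_PATH")
    if event_path:
        with open(event_path, encoding="utf-8") as handle:
            inputs["pr_text"] = pr_text_from_event(json.load(handle))
    else:
        inputs["pr_text"] = ""
    return inputs
```

Note that actions/checkout makes a shallow clone by default, so origin/main may not be available for the diff unless the checkout step fetches more history (for example with fetch-depth: 0).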
How to run the agent in CI
- Claude Code CLI can run in CI with claude --agent privacy-guard -p "..."
- Needs an Anthropic API key as a GitHub secret
- Agent needs PERSON.md — either checked into the repo (risky), stored as a GitHub secret, or fetched from a secure location at runtime
- Alternative: a stripped-down deterministic scanner for CI that complements the LLM agent locally
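To make the deterministic alternative concrete, here is a minimal sketch of a regex scanner over the added lines of a unified diff. The three patterns are illustrative assumptions based on well-known key formats, not privacy-guard's actual rule set, and scan_added_lines is a hypothetical name.

```python
import re

# Illustrative patterns only; not privacy-guard's real rule set.
BUILTIN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan_added_lines(diff_text):
    """Return one finding per (rule, line) match on lines the diff adds."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines matter; "+++ " is the file header, not content.
        # (An added line whose content starts with "++ " is skipped too,
        # which is a known limitation of this simple check.)
        if not line.startswith("+") or line.startswith("+++ "):
            continue
        for name, pattern in BUILTIN_PATTERNS.items():
            if pattern.search(line):
                # Record the position in the diff, never the matched text.
                findings.append({"rule": name, "diff_line": number})
    return findings
```

A scanner like this needs no API key and no PERSON.md, so it could run as a fast first pass before, or instead of, the LLM agent.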
PERSON.md in CI
This is the hard problem. PERSON.md contains the user's personal patterns and must never be committed. Options:
- GitHub secret — store PERSON.md content as a secret, write to temp file at runtime
- Shared org-level patterns — a subset of patterns (employer names, org-specific terms) that apply to all contributors, stored in repo
- No PERSON.md — CI scanner only checks for built-in patterns (credentials, secrets, key formats) without personal patterns. Less coverage but zero config.
- Per-contributor PERSON.md — each contributor uploads their own via a secure mechanism. Complex.
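The GitHub secret option could look roughly like the sketch below, which assumes the workflow exposes the secret to the step as an environment variable. The variable name PERSON_MD_CONTENT and the helper person_md_from_env are hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def person_md_from_env(var_name="PERSON_MD_CONTENT"):
    """Yield a path to a temporary PERSON.md, or None if the secret is absent."""
    content = os.environ.get(var_name)
    if not content:
        # Falls back to the "No PERSON.md" option: built-in patterns only.
        yield None
        return
    # mkstemp creates the file in the system temp directory, outside the
    # checkout, readable and writable only by the current user.
    fd, path = tempfile.mkstemp(prefix="person-", suffix=".md")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        yield path
    finally:
        # Remove the file even if the scan raises.
        os.remove(path)
```

One caveat worth weighing: GitHub does not pass repository secrets to workflows triggered by pull requests from forks, so this option would cover only PRs opened from branches within the repository.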
Output
- Post scan results as a PR comment or check annotation
- Block merge if high-severity findings exist
- Allow overrides via PR labels or commit message flags for false positives
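A minimal sketch of the merge gate described above, assuming each finding carries a severity field and that an override is signalled by a PR label. The severity levels, the label name privacy-guard-override, and the function name are all hypothetical.

```python
# Hypothetical severity scale; the real agent may use different levels.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


def should_block_merge(findings, pr_labels,
                       override_label="privacy-guard-override",
                       threshold="high"):
    """Return True if any finding meets the threshold and no override is set."""
    if override_label in pr_labels:
        # A maintainer has marked the findings as false positives.
        return False
    limit = SEVERITY_ORDER[threshold]
    return any(SEVERITY_ORDER[f["severity"]] >= limit for f in findings)
```

In a workflow step, the result could be turned into an exit status, for example sys.exit(1 if should_block_merge(findings, labels) else 0). A non-zero exit fails the check, which blocks the merge only if the check is marked as required in the branch protection settings.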
Performance
- Agent takes 30-90 seconds per scan on small repos
- Acceptable for PR checks (not blocking push, just merge)
- Could cache PERSON.md setup across runs
Open questions
- Should this be the same agent or a CI-specific variant?
- How to handle the PERSON.md problem securely?
- Should the workflow also run the deterministic scanner (git-scan/consult precommit) as a fast first pass?
- How to report findings without leaking PII into public PR comments?
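On the last question, one possible approach is to report only the rule, severity, and location of each finding and never echo the matched text. The sketch below assumes findings are dicts with severity, rule, file, line, and match keys; that shape and the function name are hypothetical.

```python
def format_public_comment(findings):
    """Build a PR comment that omits the matched text of every finding."""
    if not findings:
        return "privacy-guard: no findings."
    lines = [
        f"privacy-guard: {len(findings)} finding(s). Matched text is withheld."
    ]
    for f in findings:
        # The "match" key is deliberately never read here.
        lines.append(f"- {f['severity']}: {f['rule']} at {f['file']}:{f['line']}")
    return "\n".join(lines)
```

This does not hide the PII itself, which is already visible in the PR diff, but it avoids copying it into a second place that would also need cleaning up.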
Related
- #4, Agent interface contract: parent-facing skill, schema enforcement, and discovery (structured inputs would help CI integration)
- #2, Agent should own PII categories and reason from any input, not just structured YAML