Agent should own PII categories and reason from any input, not just structured YAML #2
Description
Problem
The agent currently treats PERSON.md YAML frontmatter as its only source of scan targets. No frontmatter entry = no scanning for that category. This is wrong — the agent is an LLM and should bring its own knowledge of what constitutes PII and sensitive data.
Current behavior
- Agent parses the `patterns:` YAML block from PERSON.md
- Each key becomes a scan category with explicit values to match
- Missing category = no scanning for it
- Agent is essentially a fancy grep wrapper with an LLM tax
Expected behavior
The agent should have built-in awareness of all PII categories and use whatever information it has to check them:
1. Structured PERSON.md (best case)
Clean YAML frontmatter with explicit values. Agent matches precisely.
2. Unstructured PERSON.md
A narrative paragraph like "my wife Sarah and I live in Denver, we use Example Bank" — agent extracts: name Sarah, city Denver, financial provider Example Bank. No YAML needed.
3. Prompt-provided context
User says "also check for references to Acme Corp" — agent adds that for this run.
4. Discovered during scanning
Agent sees `ghp_abc123` — recognizes the GitHub token pattern. Sees a 44-char alphanumeric string in a Google Docs URL — flags it. Sees a commit message "fix for Bob's deployment" — flags the name even if Bob isn't in PERSON.md.
5. No PERSON.md at all
Agent should still be useful:
- Detect credential patterns (API keys, tokens, private keys)
- Detect structural patterns (GCP project IDs, Google Doc IDs, service account emails)
- Check OS username and home dir (runtime)
- Apply judgment to contextual leaks
- Flag anything that looks like PII even without a reference list
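The no-config fallback above could be sketched as follows. This is a minimal illustration, not the agent's actual implementation: the pattern table, regexes, and finding shape are assumptions, and real coverage would be broader and tuned against false positives.

```python
import os
import re

# Illustrative built-in patterns (assumed, not the agent's real set).
BUILTIN_PATTERNS = {
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "gcp_service_account": re.compile(r"\b[a-z0-9-]+@[a-z0-9-]+\.iam\.gserviceaccount\.com\b"),
    "google_doc_id": re.compile(r"docs\.google\.com/document/d/([A-Za-z0-9_-]{25,60})"),
}

def builtin_scan(text: str) -> list[dict]:
    """Scan with no PERSON.md at all: built-in patterns plus OS runtime values."""
    findings = []
    for category, pattern in BUILTIN_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({"category": category, "match": match.group(0),
                            "source": "builtin_pattern"})
    # OS-level values are discovered at runtime, never configured anywhere.
    for env_var in ("USER", "HOME"):
        value = os.environ.get(env_var)
        if value and value in text:
            findings.append({"category": "os_system", "match": value,
                            "source": "os_runtime"})
    return findings
```

Note the `source` values: nothing in this path can ever be attributed to a PERSON.md, which is exactly what the no-context test scenario should assert.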
Categories the agent should always know about
Regardless of PERSON.md:
- Names, email addresses, phone numbers, usernames
- Home directory / workspace paths
- Credentials (API keys, tokens, passwords, private keys)
- Cloud/infrastructure IDs (structural pattern detection)
- Financial data (account numbers, amounts, provider names)
- Employer references
- Physical locations tied to identity
- Private repo names
- SSNs, tax IDs, credit card numbers
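One way to encode "missing category does not mean missing detection" is a built-in category registry where structural and contextual detection stay active regardless of configuration, and only value matching depends on PERSON.md. The category names and strategy flags below are hypothetical, chosen to mirror the list above:

```python
# Hypothetical registry: which detection modes each built-in category supports.
# Names and flags are illustrative assumptions, not the agent's real schema.
CATEGORY_STRATEGIES = {
    "names":          {"value_match": True,  "structural": False, "contextual": True},
    "emails":         {"value_match": True,  "structural": True,  "contextual": False},
    "paths":          {"value_match": True,  "structural": True,  "contextual": False},
    "credentials":    {"value_match": False, "structural": True,  "contextual": True},
    "cloud_ids":      {"value_match": True,  "structural": True,  "contextual": False},
    "financial":      {"value_match": True,  "structural": True,  "contextual": True},
    "locations":      {"value_match": True,  "structural": False, "contextual": True},
    "government_ids": {"value_match": True,  "structural": True,  "contextual": False},
}

def active_strategies(category: str, has_configured_values: bool) -> list[str]:
    """Detection modes for a category; only value_match depends on config."""
    s = CATEGORY_STRATEGIES[category]
    modes = []
    if has_configured_values and s["value_match"]:
        modes.append("value_match")
    if s["structural"]:
        modes.append("structural")
    if s["contextual"]:
        modes.append("contextual")
    return modes
```

Under this sketch every category retains at least one active mode with zero configuration, which is the property the sparse-config and no-config scenarios should exercise.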
Impact on PERSON.md
PERSON.md becomes a way to enhance detection, not the sole driver:
- Specific values the agent can't guess (your actual email, family names)
- Context that improves judgment (which city names are personal vs generic)
- False positive guidance
Without PERSON.md, the agent is a general-purpose PII scanner. With it, a personalized one.
Impact on tests
The current test suite validates detection with a fully-populated PERSON.md. This issue requires testing across the full spectrum of input configurations — varying what the agent knows and how it learns it.
Test matrix
| Scenario | PERSON.md | Categories | Format | Prompt context |
|---|---|---|---|---|
| Full structured config | Present | All populated | YAML frontmatter | None |
| Sparse structured config | Present | Some categories missing | YAML frontmatter | None |
| Unstructured prose | Present | Implied by narrative | Markdown body only, no frontmatter | None |
| Mixed format | Present | Some YAML, some prose | Frontmatter + body | None |
| Prompt-injected context | Present (sparse or absent) | Partial | Any | Additional targets via prompt |
| Prompt-only context | Absent | None in file | N/A | All context via prompt |
| No context at all | Absent | None | N/A | None |
What each scenario should verify
- Full structured config: Baseline — all planted PII detected. Findings should carry `source: person_md_frontmatter`. OS-level findings should carry `source: os_runtime`. Existing tests cover detection but need to be updated to assert on `source`.
- Sparse config: Agent still detects categories not listed in PERSON.md (credentials, structural patterns, OS-level values). Missing categories do not mean missing detection. Configured-category findings should carry `source: person_md_frontmatter`. Unconfigured-category findings should carry `source: builtin_pattern` or `source: os_runtime` — never `person_md_frontmatter` for a category that wasn't in the frontmatter.
- Unstructured prose: Agent extracts names, cities, providers from narrative text and uses them as scan targets. Findings should carry `source: person_md_body`, not `person_md_frontmatter`. No YAML parsing needed.
- Mixed format: Agent combines structured frontmatter values with context extracted from the prose body. Each finding's `source` should reflect where that specific value came from — `person_md_frontmatter` for YAML values, `person_md_body` for prose-extracted values.
- Prompt-injected context: Values provided in the scan prompt (e.g., "also check for references to Acme Corp") are detected even if absent from PERSON.md. These findings should carry `source: prompt`. Any findings from PERSON.md should carry the appropriate `person_md_*` source.
- Prompt-only context: With no PERSON.md, the agent uses prompt-provided values plus its own built-in pattern knowledge. Findings should carry `source: prompt` or `source: builtin_pattern` — never any `person_md_*` source.
- No context at all: Agent falls back to general-purpose PII scanning — credential patterns, structural IDs, OS username/homedir, contextual judgment. Findings should carry `source: builtin_pattern`, `source: os_runtime`, or `source: contextual_judgment` only. Any `person_md_*` or `prompt` source is a test failure.
Every `source` value (`person_md_frontmatter`, `person_md_body`, `prompt`, `builtin_pattern`, `os_runtime`, `contextual_judgment`) must be exercised by at least one test scenario. The matrix above covers all six.
Test result attribution
Tests should be able to assert on why the agent flagged something, not just what it flagged. This requires two additions to the structured JSON output:
Per-finding attribution (source field)
Each finding should indicate where the agent learned that the matched value was sensitive:
- `person_md_frontmatter` — matched a value from the YAML `patterns:` block
- `person_md_body` — extracted from the prose/narrative body of PERSON.md
- `prompt` — provided by the user or parent agent in the scan prompt
- `builtin_pattern` — recognized from the agent's own knowledge (credential formats, structural IDs)
- `os_runtime` — discovered from the OS environment (`$USER`, `$HOME`)
- `contextual_judgment` — flagged based on context, not a specific configured value
With this, tests can make precise assertions: a "sparse config" test can assert that a credential finding has source `builtin_pattern`, not `person_md_frontmatter`. A "prompt-injected context" test can assert `source: prompt`.
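A minimal sketch of such an assertion, assuming findings are dicts with `category` and `source` keys per the schema in this issue; the helper name is hypothetical:

```python
# Hypothetical test helper: assert every finding in a category carries the
# expected attribution. Field names follow the schema sketched in this issue.
def assert_category_source(findings: list[dict], category: str, expected: str) -> None:
    matched = [f for f in findings if f["category"] == category]
    assert matched, f"expected at least one {category} finding"
    for f in matched:
        assert f["source"] == expected, (
            f"{category} finding attributed to {f['source']!r}, expected {expected!r}")

# Sparse-config scenario: credentials were not configured in PERSON.md,
# so a credential finding must not be attributed to the frontmatter.
findings = [
    {"category": "credentials", "match": "<redacted>", "source": "builtin_pattern"},
    {"category": "emails", "match": "<redacted>", "source": "person_md_frontmatter"},
]
assert_category_source(findings, "credentials", "builtin_pattern")
```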
Scan metadata (scan_inputs block)
Independent of findings, the output should report what the agent scanned for and where that knowledge came from — without revealing the actual values (see CONTRIBUTING.md "Subagent containment principle"). For example:
"scan_inputs": {
"categories": [
{"category": "emails", "values_count": 3, "source": "person_md_frontmatter"},
{"category": "names", "values_count": 2, "source": "person_md_frontmatter"},
{"category": "cities", "values_count": 1, "source": "person_md_body"},
{"category": "credentials", "values_count": 0, "source": "builtin_pattern"},
{"category": "os_system", "values_count": 2, "source": "os_runtime"}
],
"person_md_path": "/path/used",
"person_md_format": "yaml_frontmatter | prose_only | mixed | absent",
"prompt_context_provided": true
}This lets tests verify the agent parsed the right PERSON.md, used the right format parser, and activated the right categories — all without the actual sensitive values entering the parent agent's context.
Primary vs. secondary verification
The per-finding `source` and `scan_inputs` block are the primary verification mechanism — tests assert on them programmatically. Log review (#3) is the secondary check — confirming scope isolation, tool call efficiency, and catching false passes that structured output alone can't reveal (e.g., the agent read the real PERSON.md but reported `builtin_pattern` as the source). The goal is for tests to be as self-verifying as possible through the response schema.
False pass risks that attribution helps catch
Even with attribution metadata, these risks remain and warrant log review:
- Parent agent context leakage — the parent agent may inject context from the broader environment into the subagent's prompt. Attribution would show `prompt`, but the test didn't intend that context to be present.
- Real config bleed-through — agent reads the real PERSON.md from the test machine. Attribution might honestly say `person_md_frontmatter`, but it's the wrong PERSON.md.
- Scope escape — agent reads files outside the temp repo or calls `gh` against real remotes.
- Coincidental match — agent reports the right value but the attribution is wrong or misleading.
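For the scope-escape risk, a secondary log-review check could look like this sketch. The tool-call log shape (`{"tool": ..., "path": ...}`) is an assumed format, not a real schema from this project:

```python
from pathlib import Path

# Hypothetical log-review helper: flag any file read outside the temp repo.
def find_scope_escapes(tool_calls: list[dict], allowed_root: Path) -> list[dict]:
    escapes = []
    root = allowed_root.resolve()
    for call in tool_calls:
        if call.get("tool") != "read_file":
            continue
        path = Path(call["path"]).resolve()
        if not path.is_relative_to(root):
            escapes.append(call)
    return escapes
```

A similar pass over the log could flag `gh` invocations whose remote is not the test fixture's.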
Test module organization
All integration tests for this agent currently live in a single pytest module. With the expanded test matrix, this is a good time to modularize into a package of modules under a common directory (e.g., `tests/integration/privacy_guard/`). Group by input configuration:
- `test_structured_config.py` — full and sparse PERSON.md with YAML frontmatter
- `test_unstructured_config.py` — prose-only and mixed-format PERSON.md
- `test_prompt_context.py` — targets injected via prompt, with and without PERSON.md
- `test_no_config.py` — no PERSON.md, general-purpose detection
- `test_scan_completeness.py` — existing tests for full-config baseline
Shared fixtures (repo setup, PERSON.md variants, agent invocation) stay in `conftest.py`.
Documentation
Update README.md and CONTRIBUTING.md to reflect the changes — updated PERSON.md role (enhancement vs requirement), new test organization, and any new test commands or markers.