Agent should own PII categories and reason from any input, not just structured YAML #2

@krisrowe

Description

Problem

The agent currently treats PERSON.md YAML frontmatter as its only source of scan targets. No frontmatter entry = no scanning for that category. This is wrong — the agent is an LLM and should bring its own knowledge of what constitutes PII and sensitive data.

Current behavior

  • Agent parses patterns: YAML block from PERSON.md
  • Each key becomes a scan category with explicit values to match
  • Missing category = no scanning for it
  • Agent is essentially a fancy grep wrapper with an LLM tax

Expected behavior

The agent should have built-in awareness of all PII categories and use whatever information it has to check them:

1. Structured PERSON.md (best case)

Clean YAML frontmatter with explicit values. Agent matches precisely.

2. Unstructured PERSON.md

A narrative paragraph like "my wife Sarah and I live in Denver, we use Example Bank" — agent extracts: name Sarah, city Denver, financial provider Example Bank. No YAML needed.

3. Prompt-provided context

User says "also check for references to Acme Corp" — agent adds that for this run.

4. Discovered during scanning

Agent sees ghp_abc123 — recognizes GitHub token pattern. Sees a 44-char alphanumeric string in a Google Docs URL — flags it. Sees a commit message "fix for Bob's deployment" — flags the name even if Bob isn't in PERSON.md.

5. No PERSON.md at all

Agent should still be useful:

  • Detect credential patterns (API keys, tokens, private keys)
  • Detect structural patterns (GCP project IDs, Google Doc IDs, service account emails)
  • Check OS username and home dir (runtime)
  • Apply judgment to contextual leaks
  • Flag anything that looks like PII even without a reference list
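As a rough illustration of the no-PERSON.md baseline, the structural patterns above could be expressed as regexes. This is a sketch, not the agent's actual implementation; the pattern formats themselves are publicly documented (GitHub PATs are `ghp_` plus 36 alphanumeric characters, service accounts end in `.iam.gserviceaccount.com`), and the function and dict names here are hypothetical.

```python
import re

# Hypothetical built-in patterns the agent could apply with no PERSON.md.
BUILTIN_PATTERNS = {
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "service_account": re.compile(r"\b[\w-]+@[\w-]+\.iam\.gserviceaccount\.com\b"),
    "google_doc_id": re.compile(r"docs\.google\.com/\w+/d/([A-Za-z0-9_-]{25,60})"),
}

def scan_builtin(text: str) -> list[dict]:
    """Return findings with source=builtin_pattern for any structural match."""
    findings = []
    for category, pattern in BUILTIN_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({
                "category": category,
                "match": match.group(0),
                "source": "builtin_pattern",
            })
    return findings
```

The point is that none of this requires a reference list: the formats themselves are the signal.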

Categories the agent should always know about

Regardless of PERSON.md:

  • Names, email addresses, phone numbers, usernames
  • Home directory / workspace paths
  • Credentials (API keys, tokens, passwords, private keys)
  • Cloud/infrastructure IDs (structural pattern detection)
  • Financial data (account numbers, amounts, provider names)
  • Employer references
  • Physical locations tied to identity
  • Private repo names
  • SSNs, tax IDs, credit card numbers

Impact on PERSON.md

PERSON.md becomes a way to enhance detection, not the sole driver:

  • Specific values the agent can't guess (your actual email, family names)
  • Context that improves judgment (which city names are personal vs generic)
  • False positive guidance

Without PERSON.md, the agent is a general-purpose PII scanner. With it, a personalized one.

Impact on tests

The current test suite validates detection with a fully populated PERSON.md. This issue requires testing across the full spectrum of input configurations — varying what the agent knows and how it learns it.

Test matrix

| Scenario | PERSON.md | Categories | Format | Prompt context |
|---|---|---|---|---|
| Full structured config | Present | All populated | YAML frontmatter | None |
| Sparse structured config | Present | Some categories missing | YAML frontmatter | None |
| Unstructured prose | Present | Implied by narrative | Markdown body only, no frontmatter | None |
| Mixed format | Present | Some YAML, some prose | Frontmatter + body | None |
| Prompt-injected context | Present (sparse or absent) | Partial | Any | Additional targets via prompt |
| Prompt-only context | Absent | None in file | N/A | All context via prompt |
| No context at all | Absent | None | N/A | None |

What each scenario should verify

  • Full structured config: Baseline — all planted PII detected. Findings should carry source: person_md_frontmatter. OS-level findings should carry source: os_runtime. Existing tests cover detection but need to be updated to assert on source.
  • Sparse config: Agent still detects categories not listed in PERSON.md (credentials, structural patterns, OS-level values). Missing categories do not mean missing detection. Configured-category findings should carry source: person_md_frontmatter. Unconfigured-category findings should carry source: builtin_pattern or source: os_runtime — never person_md_frontmatter for a category that wasn't in the frontmatter.
  • Unstructured prose: Agent extracts names, cities, providers from narrative text and uses them as scan targets. Findings should carry source: person_md_body, not person_md_frontmatter. No YAML parsing needed.
  • Mixed format: Agent combines structured frontmatter values with context extracted from the prose body. Each finding's source should reflect where that specific value came from — person_md_frontmatter for YAML values, person_md_body for prose-extracted values.
  • Prompt-injected context: Values provided in the scan prompt (e.g., "also check for references to Acme Corp") are detected even if absent from PERSON.md. These findings should carry source: prompt. Any findings from PERSON.md should carry the appropriate person_md_* source.
  • Prompt-only context: With no PERSON.md, the agent uses prompt-provided values plus its own built-in pattern knowledge. Findings should carry source: prompt or source: builtin_pattern — never any person_md_* source.
  • No context at all: Agent falls back to general-purpose PII scanning — credential patterns, structural IDs, OS username/homedir, contextual judgment. Findings should carry source: builtin_pattern, source: os_runtime, or source: contextual_judgment only. Any person_md_* or prompt source is a test failure.
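One way to make these constraints mechanical is an allowed-source set per scenario. A sketch: the scenario keys and helper name are hypothetical, and the sets below mirror only the bullets that spell their sources out explicitly.

```python
# Allowed `source` values per scenario, per the verification bullets above.
ALLOWED_SOURCES = {
    "full_structured": {"person_md_frontmatter", "os_runtime"},
    "sparse_structured": {"person_md_frontmatter", "builtin_pattern", "os_runtime"},
    "prompt_only": {"prompt", "builtin_pattern"},
    "no_context": {"builtin_pattern", "os_runtime", "contextual_judgment"},
}

def assert_sources_allowed(findings: list[dict], scenario: str) -> None:
    """Fail if any finding claims a source the scenario must never produce."""
    allowed = ALLOWED_SOURCES[scenario]
    for finding in findings:
        assert finding["source"] in allowed, (
            f"{scenario}: unexpected source {finding['source']!r}"
        )
```

A `no_context` run that reports any `person_md_*` or `prompt` source then fails immediately, without inspecting the matched values.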

Every source value (person_md_frontmatter, person_md_body, prompt, builtin_pattern, os_runtime, contextual_judgment) must be exercised by at least one test scenario. The matrix above covers all six.

Test result attribution

Tests should be able to assert on why the agent flagged something, not just what it flagged. This requires two additions to the structured JSON output:

Per-finding attribution (source field)

Each finding should indicate where the agent learned that the matched value was sensitive:

  • person_md_frontmatter — matched a value from YAML patterns: block
  • person_md_body — extracted from the prose/narrative body of PERSON.md
  • prompt — provided by the user or parent agent in the scan prompt
  • builtin_pattern — recognized from the agent's own knowledge (credential formats, structural IDs)
  • os_runtime — discovered from OS environment ($USER, $HOME)
  • contextual_judgment — flagged based on context, not a specific configured value

With this, tests can make precise assertions: a "sparse config" test can assert that a credential finding has source builtin_pattern, not person_md_frontmatter. A "prompt-injected context" test can assert source prompt.

Scan metadata (scan_inputs block)

Independent of findings, the output should report what the agent scanned for and where that knowledge came from — without revealing the actual values (see CONTRIBUTING.md "Subagent containment principle"). For example:

"scan_inputs": {
  "categories": [
    {"category": "emails", "values_count": 3, "source": "person_md_frontmatter"},
    {"category": "names", "values_count": 2, "source": "person_md_frontmatter"},
    {"category": "cities", "values_count": 1, "source": "person_md_body"},
    {"category": "credentials", "values_count": 0, "source": "builtin_pattern"},
    {"category": "os_system", "values_count": 2, "source": "os_runtime"}
  ],
  "person_md_path": "/path/used",
  "person_md_format": "yaml_frontmatter | prose_only | mixed | absent",
  "prompt_context_provided": true
}

This lets tests verify the agent parsed the right PERSON.md, used the right format parser, and activated the right categories — all without the actual sensitive values entering the parent agent's context.
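The containment property itself is checkable: every category entry should carry exactly the count-and-source fields shown above and nothing else. A minimal validator sketch, assuming the field names from the example (the helper name is hypothetical):

```python
VALID_SOURCES = {
    "person_md_frontmatter", "person_md_body", "prompt",
    "builtin_pattern", "os_runtime", "contextual_judgment",
}
VALID_FORMATS = {"yaml_frontmatter", "prose_only", "mixed", "absent"}

def check_scan_inputs(scan_inputs: dict) -> None:
    """Assert the scan_inputs block is well-formed and value-free."""
    assert scan_inputs["person_md_format"] in VALID_FORMATS
    assert isinstance(scan_inputs["prompt_context_provided"], bool)
    for cat in scan_inputs["categories"]:
        # Exactly these keys: a raw value field would violate containment.
        assert set(cat) == {"category", "values_count", "source"}
        assert cat["source"] in VALID_SOURCES
        assert isinstance(cat["values_count"], int) and cat["values_count"] >= 0
```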

Primary vs. secondary verification

The per-finding source and scan_inputs block are the primary verification mechanism — tests assert on them programmatically. Log review (#3) is the secondary check — confirming scope isolation, tool call efficiency, and catching false passes that structured output alone can't reveal (e.g., the agent read the real PERSON.md but reported builtin_pattern as the source). The goal is for tests to be as self-verifying as possible through the response schema.

False pass risks that attribution helps catch

Even with attribution metadata, these risks remain and warrant log review:

  • Parent agent context leakage — the parent agent may inject context from the broader environment into the subagent's prompt. Attribution would show prompt, but the test didn't intend that context to be present.
  • Real config bleed-through — agent reads the real PERSON.md from the test machine. Attribution might honestly say person_md_frontmatter, but it's the wrong PERSON.md.
  • Scope escape — agent reads files outside the temp repo or calls gh against real remotes.
  • Coincidental match — agent reports the right value but the attribution is wrong or misleading.

Test module organization

All integration tests for this agent currently live in a single pytest module. With the expanded test matrix, this is a good time to modularize into a package of modules under a common directory (e.g., tests/integration/privacy_guard/). Group by input configuration:

  • test_structured_config.py — full and sparse PERSON.md with YAML frontmatter
  • test_unstructured_config.py — prose-only and mixed-format PERSON.md
  • test_prompt_context.py — targets injected via prompt, with and without PERSON.md
  • test_no_config.py — no PERSON.md, general-purpose detection
  • test_scan_completeness.py — existing tests for full-config baseline

Shared fixtures (repo setup, PERSON.md variants, agent invocation) stay in conftest.py.
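A possible shape for the shared PERSON.md fixture — a parametrized tmp-repo fixture covering the format variants from the matrix. The variant names and fixture name are hypothetical:

```python
import textwrap
import pytest

# PERSON.md content per format variant; None means the file is absent.
PERSON_MD_VARIANTS = {
    "yaml_frontmatter": textwrap.dedent("""\
        ---
        patterns:
          emails: ["test@example.com"]
          names: ["Sarah"]
        ---
    """),
    "prose_only": "My wife Sarah and I live in Denver; we use Example Bank.\n",
    "absent": None,
}

@pytest.fixture(params=sorted(PERSON_MD_VARIANTS))
def person_md_repo(request, tmp_path):
    """Temp repo seeded with one PERSON.md variant (or none for 'absent')."""
    content = PERSON_MD_VARIANTS[request.param]
    if content is not None:
        (tmp_path / "PERSON.md").write_text(content)
    return tmp_path
```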

Documentation

Update README.md and CONTRIBUTING.md to reflect the changes — updated PERSON.md role (enhancement vs requirement), new test organization, and any new test commands or markers.
