Update debug-agent-tests skill: log review as verification on every run #3
Problem
The debug-agent-tests skill treats debug logs as a troubleshooting tool for failures. It should treat them as a verification tool for every run.
The primary verification mechanism will be attribution metadata and scan input reporting in the agent's structured output (#2 covers adding per-finding source fields and a scan_inputs block). Tests will assert programmatically on why the agent flagged something and what inputs it used. But log review remains essential as a secondary check: structured output reports what the agent says it did, logs show what it actually did. They catch things the output can't reveal — the agent reading the wrong PERSON.md, escaping the test repo scope, the parent agent leaking context, or excessive/misdirected tool usage.
See CONTRIBUTING.md "Subagent containment principle" for the containment model: findings MUST include matched values (the parent needs them to fix the issue and is already exposed to them via the repo), but scan targets (the full universe of values checked from PERSON.md) MUST NOT appear in output — those may include values the parent has never seen.
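The containment rule above can be checked mechanically. A minimal sketch, assuming the agent's output is a dict with a `findings` list whose entries carry a `value` field, and that the test knows the full set of scan targets planted in the fixture (both shapes are illustrative, not the agent's real schema):

```python
import json

def violates_containment(output: dict, scan_targets: set[str]) -> set[str]:
    """Return scan targets that leak into the serialized output
    without being matched findings (matched values are allowed)."""
    matched = {f["value"] for f in output.get("findings", [])}
    serialized = json.dumps(output)
    return {t for t in scan_targets - matched if t in serialized}

# One matched value is reported; the unmatched target must not appear anywhere.
output = {"findings": [{"value": "jane@example.com", "source": "person_md"}]}
targets = {"jane@example.com", "555-0100"}  # full universe from fixture PERSON.md
print(violates_containment(output, targets))  # -> set()
```

A test harness could run this over every structured output it captures, passing or failing.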
Why passing tests need verification
The privacy-guard agent runs as a subagent. Multiple sources of information can influence its behavior beyond what the test controls:
- Parent agent context leakage: The parent agent or user session may inject context gathered from the broader environment — the real PERSON.md, prior conversation history, iterative debugging across runs. A test for "sparse config" that passes because the parent filled in the gaps is not testing the agent.
- Real config bleed-through: The agent might read `~/.config/ai-common/PERSON.md` from the test machine instead of the fixture. A "no PERSON.md" test could pass while the agent quietly used the real one.
- Scope escape: The agent might read files outside the temp repo, call `gh` against real remotes, access sibling directories, or discover OS environment context the test didn't plant.
- Excessive or misdirected tool usage: The agent might make 50 tool calls when 5 would suffice, scan directories it shouldn't know about, or perform redundant work that catches the right value by accident.
- Right value, wrong reason: The agent might flag a planted value because it matched a substring of something else, appeared in the agent's own error output, or was found through a reasoning path unrelated to what the test is exercising.
Changes to the skill
1. Debug logging as default for single-test runs
`PRIVACY_GUARD_DEBUG=1` should be the recommended default when running individual tests during development, not an optional troubleshooting flag. The skill's "How to run" section should lead with the debug variant.
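Setting the flag per-run (rather than globally) could look like this — a minimal sketch; the stand-in command below only echoes the flag back, since the skill's actual test entry point isn't specified here:

```python
import os
import subprocess
import sys

def run_single_test(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run one test with debug logging enabled for this run only."""
    env = {**os.environ, "PRIVACY_GUARD_DEBUG": "1"}  # per-run, not exported globally
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# Stand-in command that just prints the flag, proving it reached the child process:
result = run_single_test(
    [sys.executable, "-c", "import os; print(os.environ['PRIVACY_GUARD_DEBUG'])"]
)
print(result.stdout.strip())  # -> 1
```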
2. Post-run log review checklist
Add a verification checklist that applies to every test run, pass or fail:
Scope verification:
- Agent's tool calls stayed within the temp repo directory
- No reads of `~/.config/`, `~/`, or paths outside the temp dir
- No `gh` calls to real remotes (test repos have no remote)
- No access to sibling directories or other repos
Input verification:
- Agent used the test fixture PERSON.md (or correctly had none, for no-config tests)
- No evidence of context injected by a parent agent or prior session
Attribution verification (complements structured output assertions):
- The agent's reasoning path in the log is consistent with the `source` field it reported in the finding (e.g., it didn't read a value from PERSON.md frontmatter while reporting `builtin_pattern`)
- The agent's reasoning path aligns with the test's intent (e.g., a "built-in pattern" test should show pattern recognition, not a PERSON.md lookup)
- Tool call count and types are reasonable for the scenario
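The source-vs-log cross-check could be sketched like this. The `source` values and log marker strings below are assumptions based on the fields described in #2, and the rules are deliberately simplistic (a real check would scope evidence to the individual finding, not the whole log):

```python
# Map each reported source to log evidence that should (or should not) appear.
# Marker strings are illustrative; the real debug log format may differ.
EVIDENCE = {
    "person_md": lambda log: "read PERSON.md" in log,
    "builtin_pattern": lambda log: "read PERSON.md" not in log,
}

def attribution_consistent(finding: dict, log_text: str) -> bool:
    """Check that a finding's reported source matches the log's reasoning path."""
    check = EVIDENCE.get(finding.get("source"))
    return check(log_text) if check else False

log_text = "TOOL read PERSON.md\nmatched value from frontmatter"
print(attribution_consistent({"source": "person_md"}, log_text))        # -> True
print(attribution_consistent({"source": "builtin_pattern"}, log_text))  # -> False
```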
On fail, additionally:
- Did the agent attempt the right approach but produce wrong output, or never attempt the category at all?
- Was the agent confused by the test setup (e.g., treated a sparse PERSON.md as an error)?
3. Guidance on documenting findings from log review
When log review reveals unexpected agent behavior — even on a passing test — the skill should direct the operator to file follow-on issues. Examples:
- Agent read the real PERSON.md → isolation problem, needs test harness fix
- Agent made 40 tool calls for a single-file repo → efficiency issue
- Agent found the value but categorized it wrong → agent prompt issue
- Agent's reasoning shows it ignored a missing PERSON.md category → relates to "Agent should own PII categories and reason from any input, not just structured YAML" (#2)
Work breakdown
- Update skill: debug logging as default for single-test runs
- Update skill: add post-run verification checklist (scope, input, attribution)
- Update skill: add guidance on filing follow-on issues from log review findings
- Update skill: revise "After a failure" section to also cover passing-test verification
- Update README.md and CONTRIBUTING.md to reflect that log review is verification, not just troubleshooting