Schema redesign: SARIF-aligned file findings, ruleId foreign keys, local-only scope #5

## Problem

The current output schema puts flat `category` and `source` fields on each finding, keeps separate `configured_categories`, `unconfigured_categories`, and `os_discovered` top-level fields, and carries no schema version. The schema needs to formalize the rules-to-findings relationship, support attribution-based test assertions, and align with SARIF where the data naturally fits.

## Design

The full proposed schema is documented in [`docs/agents/privacy-guard/SCHEMA-PROPOSAL.md`](https://github.com/echomodel/claude-coding-plugin/blob/main/docs/agents/privacy-guard/SCHEMA-PROPOSAL.md). Key changes:

### SARIF-aligned file findings

File-based findings (working tree, staged, HEAD, historical commit diffs, gitignored files) all have a file path and line number, so they map directly onto SARIF's `physicalLocation` model, which the output uses for these findings. Historical commit diffs also include a `commit` metadata field: the file and line are the SARIF part, and the commit context is additional metadata.
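A minimal sketch of what one file-based finding might look like, assuming it mirrors the location part of a SARIF `result`. The `physicalLocation`, `artifactLocation.uri`, and `region.startLine` names come from SARIF. The helper name `make_file_finding`, the placement of `commit` beside `physicalLocation`, and all sample values are illustrative assumptions; the authoritative shape is in SCHEMA-PROPOSAL.md.

```python
from typing import Optional


def make_file_finding(rule_id: str, path: str, line: int,
                      commit: Optional[str] = None) -> dict:
    """Build a file-based finding whose location follows SARIF's physicalLocation."""
    finding = {
        "ruleId": rule_id,
        "physicalLocation": {
            "artifactLocation": {"uri": path},
            "region": {"startLine": line},
        },
    }
    if commit is not None:
        # Assumption: commit context sits next to the SARIF part as extra metadata.
        finding["commit"] = commit
    return finding


# Hypothetical sample values, for illustration only.
working_tree = make_file_finding("emails:person_md_frontmatter", "src/config.py", 12)
historical = make_file_finding("emails:person_md_frontmatter", "src/config.py", 12,
                               commit="abc1234")
```

Under these assumptions, a working-tree finding and a historical one differ only by the presence of `commit`, so a SARIF-aware consumer could read the location of either without special cases.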

### Native format for non-file findings

Commit messages, branch names, tag names, and stash descriptions are not files. SARIF's `logicalLocation` with custom `kind` values can technically represent them, but standard SARIF viewers won't render them. These findings use our native flat format with direct `location`, `location_type`, and `matched_value` fields: no nesting, no `properties` bags.
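A hedged sketch of the flat native shape for a non-file finding. The field names `location`, `location_type`, `matched_value`, and `ruleId` come from this issue. The class name `NonFileFinding`, the `"commit_message"` value, and the sample data are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NonFileFinding:
    """Flat finding for content that has no file path or line number."""
    ruleId: str
    location: str        # e.g. a commit hash, branch name, or tag name
    location_type: str   # hypothetical value below; real values are in the proposal
    matched_value: str


finding = NonFileFinding(
    ruleId="emails:person_md_frontmatter",
    location="abc1234",
    location_type="commit_message",
    matched_value="jane@example.com",
)

# Every field is a top-level string: no nesting, no properties bag.
flat = asdict(finding)
```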

### `ruleId` foreign keys

Findings reference rules via a string `ruleId` (format: `category:source`, e.g., `"emails:person_md_frontmatter"`). Rules are defined once in `tool.rules[]` with `id`, `category`, `source`, and `count`. A category can have rules from multiple sources.
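The foreign-key relationship can be sketched as below, assuming `tool.rules[]` and the findings have already been loaded as plain dictionaries. The function names, the `git_config` source, and the `count` values are hypothetical; only the `category:source` format and the four rule fields come from this issue.

```python
def parse_rule_id(rule_id: str) -> tuple[str, str]:
    """Split a ruleId of the form 'category:source' into its two parts."""
    category, sep, source = rule_id.partition(":")
    if not sep or not category or not source:
        raise ValueError(f"ruleId must look like 'category:source': {rule_id!r}")
    return category, source


def inconsistent_rules(rules: list[dict]) -> list[str]:
    """Return ids of rules whose id does not match their category and source."""
    return [r["id"] for r in rules
            if parse_rule_id(r["id"]) != (r["category"], r["source"])]


def dangling_rule_ids(rules: list[dict], findings: list[dict]) -> list[str]:
    """Return ruleIds used by findings that are not defined in the rules list."""
    known = {r["id"] for r in rules}
    return sorted({f["ruleId"] for f in findings} - known)


# Hypothetical data: one category ("emails") with rules from two sources.
rules = [
    {"id": "emails:person_md_frontmatter", "category": "emails",
     "source": "person_md_frontmatter", "count": 2},
    {"id": "emails:git_config", "category": "emails",
     "source": "git_config", "count": 1},
]
findings = [{"ruleId": "emails:git_config"}, {"ruleId": "phones:git_config"}]

missing = dangling_rule_ids(rules, findings)
```

A test could assert that `missing` is empty for real agent output, which is one way to check that every finding is attributed to a declared rule.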

### Schema versioning

A top-level `version` field tells consumers which shape to expect.
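One way a consumer might gate on the field, as a sketch. The issue does not specify the type or value of `version`, so the string `"2"`, the `SUPPORTED_VERSIONS` set, and the `check_version` helper are all assumptions.

```python
SUPPORTED_VERSIONS = {"2"}  # hypothetical value; the real version string is not defined here


def check_version(output: dict) -> str:
    """Return the schema version of an agent output, or raise if it is unusable."""
    version = output.get("version")
    if version is None:
        raise ValueError("no 'version' field: output predates schema versioning")
    if str(version) not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema version: {version!r}")
    return str(version)


accepted = check_version({"version": "2", "findings": []})
```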

### Scope narrowing to local-only

Remove GitHub issues and PRs from the default scan scope. The agent's core question is "can I commit and push safely right now?", and that question is local. Remote content scanning is a separate concern. This also simplifies the schema by removing the `issue` and `pr` location types.
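A small sketch of a regression check for the narrowed scope, assuming native findings are plain dictionaries with a `location_type` key. The `issue` and `pr` type names come from this issue; the helper name and the other sample values are hypothetical.

```python
REMOVED_LOCATION_TYPES = frozenset({"issue", "pr"})


def remote_findings(findings: list[dict]) -> list[dict]:
    """Return findings that still use a location type removed by this change."""
    return [f for f in findings
            if f.get("location_type") in REMOVED_LOCATION_TYPES]


# Hypothetical sample: the second finding should no longer be produced.
sample = [
    {"location_type": "commit_message", "location": "abc1234"},
    {"location_type": "issue", "location": "#12"},
]
leftovers = remote_findings(sample)
```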

## Work breakdown

- [ ] Add `version` field to agent output
- [ ] Replace `configured_categories` / `unconfigured_categories` / `os_discovered` with `tool.rules[]` array
- [ ] Replace `category` + `source` on findings with `ruleId` string referencing `tool.rules[]`
- [ ] Remove issue/PR scanning from agent definition and scan scope
- [ ] Update `scan_scope` to remove `issues_checked` / `prs_checked`
- [ ] Update existing tests for new schema shape
- [ ] Add tests asserting on `ruleId` and `tool.rules[]` structure
- [ ] Update SCHEMA.md to reflect the new schema (move the proposed schema into SCHEMA.md, then archive or remove the SCHEMA-PROPOSAL.md gap analysis)
- [ ] Update README.md and CONTRIBUTING.md as needed
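The finding-level change in the list above (replacing `category` + `source` with `ruleId`) could be exercised in tests with a small converter like this sketch. The function name `migrate_finding` and the sample finding are hypothetical, and the sketch assumes old findings always carry both fields.

```python
def migrate_finding(old: dict) -> dict:
    """Convert an old-shape finding (category + source) to the ruleId shape."""
    new = {k: v for k, v in old.items() if k not in ("category", "source")}
    new["ruleId"] = f"{old['category']}:{old['source']}"
    return new


# Hypothetical old-shape finding.
old_finding = {
    "category": "emails",
    "source": "person_md_frontmatter",
    "matched_value": "jane@example.com",
}
new_finding = migrate_finding(old_finding)
```

The converter copies the input rather than mutating it, so existing test fixtures in the old shape can stay untouched while new assertions target `ruleId`.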

## Open questions

- Should `category` remain as a convenience field on findings alongside `ruleId`? Avoids parsing the `ruleId` string for simple filtering.
- Rule lifecycle across runs: if PERSON.md changes between scans, rules change. Multi-repo scanning (issue #1 item 3) will need run-scoping. Not blocking for initial implementation.
- Suppressions / false positive tracking: SARIF has `suppressions` on results. Consider whether findings should carry a suppression state for user-acknowledged false positives. Not blocking for initial implementation.

## Related

- `docs/agents/privacy-guard/SCHEMA-PROPOSAL.md` — full design with SARIF gap analysis and test ergonomics rationale
- `docs/agents/privacy-guard/SCHEMA.md` — current schema
- #2 — agent should own PII categories (attribution, test matrix)
- #3 — validate-privacy-guard skill update
- #4 — agent interface contract (parent skill, `--json-schema`)

