Handle path filtering and general other filtering #758
Entire-Checkpoint: 3e14bfef5a8b
Entire-Checkpoint: 9d45e0e8b9e7
Entire-Checkpoint: d9e97437b51f
Entire-Checkpoint: 659b29339900
Entire-Checkpoint: 2aec67f9083b
Entire-Checkpoint: 4cb30309e49e
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    // Add subagent transcript if available
    if opts.SubagentTranscriptPath != "" && opts.AgentID != "" {
        if agentContent, readErr := os.ReadFile(opts.SubagentTranscriptPath); readErr == nil {
            agentContent = pipeline.Clean(agentContent)
Task session transcript missing path normalization filtering
Medium Severity
In addTaskMetadataToTree, the PR adds pipeline.Clean() to the incremental data (line 317) and the subagent transcript (line 388), but the main session transcript read from opts.TranscriptPath at line 354 is stored without pipeline.Clean() being applied. This means absolute machine-specific paths in the session transcript won't be normalized to placeholders, breaking cross-machine portability for task checkpoint transcripts while the subagent transcript in the same function is correctly normalized.
Pull request overview
Adds a clean/smudge filtering pipeline for transcripts (and related display paths) so machine-specific absolute paths can be normalized when stored and restored when shown to the user, with optional user-configured transcript find/replace filters.
Changes:
- Introduces the cmd/entire/cli/filter package (pipeline construction, validation, context wiring) and a new settings schema for transcript_filters.
- Applies “clean” filtering before redaction when writing transcripts/metadata, and “smudge” filtering when displaying/restoring logs (rewind/resume/explain/manual rewind).
- Adds unit + integration tests to verify path normalization and restoration behavior.
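
To make the clean/smudge idea concrete, here is a minimal sketch of how such a pipeline could behave. It is not the PR's implementation: the `Pipeline` type, its method bodies, and the `__ent__/repo` placeholder are assumptions for illustration (only the `Filter{Match, Replace}` shape and the `__ent__/home` placeholder appear in the quoted diff). The sketch cleans in filter order and smudges in reverse order so that nested paths round-trip.

```go
package main

import (
	"bytes"
	"fmt"
)

// Filter is a single find-and-replace pair, shaped like the struct quoted in the review.
type Filter struct {
	Match   string
	Replace string
}

// Pipeline is a hypothetical container that applies filters in order.
type Pipeline struct {
	filters []Filter
}

// Clean replaces every Match with its placeholder. A nil pipeline is a no-op.
func (p *Pipeline) Clean(data []byte) []byte {
	if p == nil {
		return data
	}
	for _, f := range p.filters {
		data = bytes.ReplaceAll(data, []byte(f.Match), []byte(f.Replace))
	}
	return data
}

// Smudge undoes Clean by applying the filters in reverse order.
func (p *Pipeline) Smudge(data []byte) []byte {
	if p == nil {
		return data
	}
	for i := len(p.filters) - 1; i >= 0; i-- {
		f := p.filters[i]
		data = bytes.ReplaceAll(data, []byte(f.Replace), []byte(f.Match))
	}
	return data
}

func main() {
	// The more specific repo root comes first so it is not swallowed by the home filter.
	p := &Pipeline{filters: []Filter{
		{Match: "/home/alice/src/app", Replace: "__ent__/repo"}, // placeholder name is an assumption
		{Match: "/home/alice", Replace: "__ent__/home"},
	}}
	raw := []byte("/home/alice/src/app/main.go /home/alice/.config")
	stored := p.Clean(raw)
	fmt.Println(string(stored))
	fmt.Println(string(p.Smudge(stored)))
}
```

Ordering matters in this sketch: if the home-directory filter ran first, the repo root would no longer match, which is presumably why the PR's unit tests cover filter ordering.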
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cmd/entire/cli/strategy/manual_commit_rewind.go | Smudges restored session content during logs-only/manual restore flows. |
| cmd/entire/cli/settings/settings.go | Adds transcript_filters settings schema + merge behavior. |
| cmd/entire/cli/rewind.go | Uses display-variant transcript lookup (smudged output). |
| cmd/entire/cli/resume.go | Uses display-variant transcript lookup (smudged output). |
| cmd/entire/cli/explain.go | Uses display-variant transcript reads for user-facing output. |
| cmd/entire/cli/integration_test/filter_test.go | Integration coverage for clean/smudge path normalization across condensation + rewind. |
| cmd/entire/cli/filter/context.go | Builds filter pipeline from repo/home/settings. |
| cmd/entire/cli/filter/filter.go | Implements filter + pipeline clean/smudge operations and pipeline construction. |
| cmd/entire/cli/filter/validate.go | Adds validation rules for built-in vs user filters. |
| cmd/entire/cli/filter/filter_test.go | Unit tests for filter/pipeline ordering, idempotency, user filters, nil-safety. |
| cmd/entire/cli/filter/validate_test.go | Unit tests for validation behavior and key validation. |
| cmd/entire/cli/checkpoint/display.go | Adds “ForDisplay” read helpers that apply smudge filtering. |
| cmd/entire/cli/checkpoint/temporary.go | Applies clean filtering before redaction for temporary/task metadata inputs. |
| cmd/entire/cli/checkpoint/committed.go | Applies clean filtering before redaction for transcripts/prompts; extends redacted blob helper signature. |
| cmd/entire/cli/checkpoint/checkpoint.go | Clarifies ReadSessionContent returns raw stored form. |
| cmd/entire/cli/checkpoint/tree_surgery_equiv_test.go | Updates helper call signature after pipeline parameter addition. |
    // Create blob from file with secrets redaction (no transcript filter —
    // copyMetadataDir is used for task checkpoint metadata, not transcripts)
    blobHash, mode, err := createRedactedBlobFromFile(s.repo, path, relPath, nil)
copyMetadataDir is documented as being used for “task checkpoints, subagent transcripts, etc.”, but it passes a nil filter pipeline into createRedactedBlobFromFile. This means any absolute paths inside those copied metadata files (e.g., task prompts, agent-*.jsonl) will bypass the new clean filter and be committed un-normalized, undermining the path/privacy filtering behavior elsewhere in the PR. Consider constructing a pipeline once (like in writeSessionToSubdirectory) and passing it through here, or selectively enabling filtering for transcript/prompt-like metadata files while keeping it disabled for files that must remain byte-identical.
Suggested change:

    - // Create blob from file with secrets redaction (no transcript filter —
    - // copyMetadataDir is used for task checkpoint metadata, not transcripts)
    - blobHash, mode, err := createRedactedBlobFromFile(s.repo, path, relPath, nil)
    + // Create blob from file with secrets/path redaction. We intentionally avoid
    + // the full transcript filter here (copyMetadataDir is used for task
    + // checkpoint metadata, not transcripts), but still apply the standard
    + // clean filter pipeline so absolute paths and other sensitive data are
    + // normalized consistently with the rest of the checkpoint code.
    + cleanPipeline := filter.NewCleanPipeline()
    + blobHash, mode, err := createRedactedBlobFromFile(s.repo, path, relPath, cleanPipeline)
    logging.Warn(ctx, "filter: failed to build pipeline, transcript filtering disabled",
        slog.String("error", err.Error()))
    return nil
FromContext disables the entire filter pipeline (returns nil) if NewPipeline fails. Because NewPipeline can fail due to a single invalid user-configured transcript filter, this can inadvertently disable the built-in repoRoot/homeDir normalization as well, causing absolute paths to be stored again. Suggestion: if user filters are invalid, fall back to a pipeline containing only the built-in filters (and log that user filters were skipped), so a settings mistake doesn’t disable baseline path normalization.
Suggested change:

    - logging.Warn(ctx, "filter: failed to build pipeline, transcript filtering disabled",
    -     slog.String("error", err.Error()))
    - return nil
    + // If building the pipeline with user filters fails (e.g. due to an invalid
    + // user-configured filter), fall back to a pipeline with only the built-in
    + // repoRoot/homeDir normalization so baseline filtering still applies.
    + logging.Warn(ctx, "filter: failed to build pipeline with user transcript filters, using built-in filters only",
    +     slog.String("error", err.Error()))
    + p, err = NewPipeline(repoRoot, homeDir, nil)
    + if err != nil {
    +     logging.Warn(ctx, "filter: failed to build pipeline with built-in filters, transcript filtering disabled",
    +         slog.String("error", err.Error()))
    +     return nil
    + }
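
The fallback described in this comment can be tried in isolation with a small stand-alone sketch. Everything here is hypothetical scaffolding: `newPipeline` only imitates the three-argument call seen in the suggestion, the empty-match rule stands in for whatever validation the PR actually performs, and `__ent__/repo` is an assumed placeholder name.

```go
package main

import (
	"errors"
	"fmt"
)

// Filter is a find-and-replace pair, shaped like the struct quoted in the review.
type Filter struct {
	Match   string
	Replace string
}

// newPipeline is a stand-in for the PR's NewPipeline: built-in filters first,
// then user filters, failing if any user filter is invalid.
func newPipeline(repoRoot, homeDir string, user []Filter) ([]Filter, error) {
	var filters []Filter
	if repoRoot != "" {
		filters = append(filters, Filter{Match: repoRoot, Replace: "__ent__/repo"})
	}
	if homeDir != "" {
		filters = append(filters, Filter{Match: homeDir, Replace: "__ent__/home"})
	}
	for _, f := range user {
		if f.Match == "" { // assumed validation rule, for illustration only
			return nil, errors.New("user filter has empty match")
		}
		filters = append(filters, f)
	}
	return filters, nil
}

// buildWithFallback tries the full pipeline and, on failure, retries with
// built-in filters only. The bool reports whether user filters were applied.
func buildWithFallback(repoRoot, homeDir string, user []Filter) ([]Filter, bool) {
	if fs, err := newPipeline(repoRoot, homeDir, user); err == nil {
		return fs, true
	}
	fs, err := newPipeline(repoRoot, homeDir, nil)
	if err != nil {
		return nil, false // filtering fully disabled
	}
	return fs, false
}

func main() {
	bad := []Filter{{Match: "", Replace: "x"}}
	fs, userApplied := buildWithFallback("/home/alice/src/app", "/home/alice", bad)
	fmt.Println(len(fs), userApplied)
}
```

With this shape, a settings mistake costs only the user-defined replacements; repo-root and home-directory normalization still run.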
        filters = append(filters, f)
    }
    if homeDir != "" {
        f := Filter{Match: homeDir, Replace: "__ent__/home"}
I'd remove __ent__/home and use $HOME as the placeholder instead. That way it can be materialised back via os.ExpandEnv(), and it would resolve to the correct value on a different machine.
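
A rough sketch of what this proposal could look like follows. The function names are illustrative, not from the PR. One caveat worth weighing: `os.ExpandEnv` expands every `$NAME` reference in the text, and unset variables become empty strings, so transcript content such as `$5` or `$PATH` would be altered too. The sketch therefore uses `os.Expand` with a mapping that substitutes only `HOME`, which is a swapped-in variant rather than what the comment literally proposes; `main` runs `os.ExpandEnv` on the same input for comparison.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// clean replaces the literal home directory with the text "$HOME".
func clean(s, homeDir string) string {
	return strings.ReplaceAll(s, homeDir, "$HOME")
}

// smudge expands only $HOME (to homeDir) and re-emits any other $name
// reference unchanged. Note that a ${name} reference would come back as $name.
func smudge(s, homeDir string) string {
	return os.Expand(s, func(name string) string {
		if name == "HOME" {
			return homeDir
		}
		return "$" + name
	})
}

func main() {
	stored := clean("/home/alice/notes.txt costs $5", "/home/alice")
	fmt.Println(stored)

	// Restores the path for a different user's home directory and keeps "$5".
	fmt.Println(smudge(stored, "/Users/bob"))

	// For comparison: os.ExpandEnv uses the current environment and also
	// expands "$5", which is normally unset.
	fmt.Println(os.ExpandEnv(stored))
}
```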
    // Filter defines a single find-and-replace pair used during clean/smudge.
    // Clean replaces Match with Replace; Smudge reverses the substitution.
    type Filter struct {
Maybe:

    - type Filter struct {
    + type FindAndReplaceFilter struct {

Filter could then be an interface, opening the door to custom filters of other types that give users more flexibility to handle the data beyond find/replace.
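
One possible shape for that interface is sketched below. The method set (`Clean`/`Smudge` on byte slices) and the helpers `cleanAll`/`smudgeAll` are assumptions chosen to mirror the comment on the existing struct; the PR may settle on something different.

```go
package main

import (
	"bytes"
	"fmt"
)

// Filter is a hypothetical interface that any transcript filter could satisfy.
type Filter interface {
	Clean(data []byte) []byte
	Smudge(data []byte) []byte
}

// FindAndReplaceFilter is the existing find-and-replace pair under its proposed name.
type FindAndReplaceFilter struct {
	Match   string
	Replace string
}

// Clean replaces Match with Replace.
func (f FindAndReplaceFilter) Clean(data []byte) []byte {
	return bytes.ReplaceAll(data, []byte(f.Match), []byte(f.Replace))
}

// Smudge reverses the substitution.
func (f FindAndReplaceFilter) Smudge(data []byte) []byte {
	return bytes.ReplaceAll(data, []byte(f.Replace), []byte(f.Match))
}

// cleanAll applies filters in order.
func cleanAll(filters []Filter, data []byte) []byte {
	for _, f := range filters {
		data = f.Clean(data)
	}
	return data
}

// smudgeAll applies filters in reverse order so cleanAll round-trips.
func smudgeAll(filters []Filter, data []byte) []byte {
	for i := len(filters) - 1; i >= 0; i-- {
		data = filters[i].Smudge(data)
	}
	return data
}

func main() {
	filters := []Filter{FindAndReplaceFilter{Match: "/home/alice", Replace: "__ent__/home"}}
	stored := cleanAll(filters, []byte("/home/alice/notes.txt"))
	fmt.Println(string(stored))
	fmt.Println(string(smudgeAll(filters, stored)))
}
```

A regex-based or JSON-aware filter could then be added later by implementing the same two methods, without touching the pipeline code.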


Note
Medium Risk
Changes how transcripts/prompts are persisted and read back by introducing a clean/smudge pipeline, which could affect checkpoint compatibility and rewind/display behavior if filtering is misconfigured or applied inconsistently. Secrets redaction order and metadata copying paths are also modified, so regressions could impact stored history contents.
Overview
Adds a new filter package implementing a git-like clean/smudge pipeline to normalize machine-specific paths (repo root, home dir, plus user-configured replacements) in stored transcripts/prompts, and restores them on user-facing reads.

Checkpoint writing paths (WriteCommitted, UpdateCommitted, temporary checkpoint metadata capture, and redacted blob creation) now apply pipeline.Clean(...) before redaction/chunking; new display helpers (ReadLatestSessionContentForDisplay, LookupSessionLogForDisplay, GetTranscriptFromCommitForDisplay, etc.) apply pipeline.Smudge(...) and are wired into explain, resume, rewind, and manual restore flows.

Extends settings with transcript_filters and adds unit/integration tests to verify normalization on the metadata branch and restoration during logs-only rewind.

Written by Cursor Bugbot for commit 5f35a16.