Enhance duplicate screenshot detection (hash/perceptual) #7

@arividar

Summary

--skip-duplicates currently compares raw PNG bytes for exact equality. This misses near-identical frames, and it treats visually identical frames as different whenever their PNG encodings differ.

Proposal

Introduce a more robust duplicate detection strategy:

  • Compute a fast hash (e.g., xxHash64/SHA-256) of the PNG buffer and compare hashes instead of storing entire buffers.
  • Optionally add perceptual hashing (aHash/pHash/dHash on a downscaled grayscale image) to detect visually identical frames within a small threshold.
  • Keep exact mode as the default; add a flag such as --duplicate-method exact|hash|phash and, optionally, a --phash-threshold flag to set the tolerance.
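
To make the perceptual option concrete, here is a minimal sketch of dHash with a Hamming-distance threshold. It is illustrative only: the function names (`dhash`, `hamming`, `is_duplicate`) are hypothetical, and it assumes the frame has already been decoded and downscaled to an 8-row by 9-column grayscale grid by some image library, which is not shown here.

```python
from typing import Sequence


def dhash(gray: Sequence[Sequence[int]]) -> int:
    """Difference hash of a grayscale grid.

    Assumes `gray` is already downscaled to 8 rows x 9 columns, so each row
    yields 8 left/right comparisons and the result is a 64-bit integer.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            # One bit per adjacent pair: 1 if brightness drops left to right.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_duplicate(prev_hash: int, cur_hash: int, threshold: int = 0) -> bool:
    """Treat frames as duplicates when their hashes differ by at most `threshold` bits."""
    return hamming(prev_hash, cur_hash) <= threshold


if __name__ == "__main__":
    frame = [[col * 10 for col in range(9)] for _ in range(8)]
    brighter = [[value + 1 for value in row] for row in frame]  # uniform brightness shift
    mirrored = [row[::-1] for row in frame]                     # very different gradient

    print(is_duplicate(dhash(frame), dhash(brighter), threshold=2))
    print(is_duplicate(dhash(frame), dhash(mirrored), threshold=2))
```

Because dHash only records whether brightness rises or falls between neighbouring pixels, a uniform brightness shift leaves the hash unchanged, while a threshold of 0 reduces this mode to exact hash equality. A value for --phash-threshold would map directly onto `threshold`.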

Tasks

  • Refactor ScreenshotWriter to store last hash instead of last buffer.
  • Implement exact hash mode (hash of PNG buffer) as a first step.
  • (Optional) Implement perceptual hash mode with a small threshold.
  • Add tests for identical vs slightly different frames (unit tests can simulate buffers; perceptual tests can use small sample images).
  • Update README/help text to document the behavior.
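
The first two tasks could look roughly like the sketch below. It is written in Python for illustration and does not reflect the actual ScreenshotWriter interface, which is not shown in this issue; `DuplicateFilter` and `should_write` are hypothetical names. It uses SHA-256 from the standard library; xxHash64 would need a third-party package.

```python
import hashlib
from typing import Optional


class DuplicateFilter:
    """Remembers only the digest of the last written frame, not the frame itself."""

    def __init__(self) -> None:
        self._last_digest: Optional[bytes] = None

    def should_write(self, png: bytes) -> bool:
        """Return False when `png` has the same digest as the previous frame."""
        digest = hashlib.sha256(png).digest()
        if digest == self._last_digest:
            return False
        # Store 32 bytes of digest instead of the whole PNG buffer.
        self._last_digest = digest
        return True


if __name__ == "__main__":
    frames = [b"frame-a", b"frame-a", b"frame-b", b"frame-a"]
    duplicate_filter = DuplicateFilter()
    for png in frames:
        print(png, duplicate_filter.should_write(png))
```

Like the current byte comparison, this only compares each frame against the one immediately before it, so a frame that reappears after a different frame is written again. It also remains sensitive to encoding differences, since the digest is taken over the encoded PNG bytes.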

Acceptance criteria

  • With --skip-duplicates, identical frames do not create files; different frames do.
  • Tests cover duplicate detection.
  • Documentation updated.
