Feat/duplicate best shot detection #953
base: main
Conversation
…SSIE-Org#946

- Add PKGBUILD for binary package installation from releases
- Add PKGBUILD-git for building from source
- Add pictopy.install for post-install hooks
- Add .SRCINFO metadata file
- Add publish-aur.yml workflow for automatic AUR publishing
- Update build-and-release.yml to generate tar.gz for AUR
- Update tauri.conf.json to build all Linux targets (deb, rpm, appimage)
- Update README.md with AUR installation instructions

This enables Arch Linux users to install PictoPy via `yay -S pictopy` or `paru -S pictopy`.

Closes AOSSIE-Org#946
- Add confirmation dialog with warning message before deleting a folder
- Display folder path in the confirmation dialog
- Show clear warning that the action is irreversible
- Add Cancel and Delete buttons with proper styling
- Disable buttons during deletion to prevent double-clicks

Fixes AOSSIE-Org#939
- Create ErrorBoundary component to catch JavaScript errors
- Display user-friendly fallback UI with error details
- Add 'Reload Application' button to recover from errors
- Add 'Try Again' button to attempt recovery without reload
- Wrap main App component with ErrorBoundary in main.tsx

This prevents the app from showing a blank white screen when unhandled errors occur (e.g., network failures, missing data).

Fixes AOSSIE-Org#937
…Org#918

- Add recursion guard flag to InterceptHandler.emit()
- Emit directly to root logger handlers instead of calling logger.log()
- Skip self handler when emitting to prevent infinite loop
- Fix compatibility with Python 3.12's stricter recursion handling

The issue was caused by emit() calling logger.log(), which triggered the same handler again, causing infinite recursion until stack overflow.

Fixes AOSSIE-Org#918
- Add duplicate_detection.py utility module with:
  - Perceptual hashing (pHash) for image similarity detection
  - Sharpness scoring using Laplacian variance
  - Exposure scoring using histogram analysis
  - Union-Find algorithm for grouping similar images
- Add GET /images/duplicates endpoint that:
  - Finds groups of similar/duplicate images
  - Calculates quality scores for each image
  - Suggests the 'best shot' from each group
  - Supports configurable similarity threshold

Quality metrics:
- Sharpness: higher = less blur (Laplacian variance)
- Exposure: proper brightness and contrast (histogram analysis)
- Overall: 60% sharpness + 40% exposure

Fixes AOSSIE-Org#905
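The Union-Find grouping named in this commit message can be sketched as follows. This is a minimal illustrative disjoint-set with path halving, not the PR's actual implementation; names are hypothetical:

```python
class UnionFind:
    """Minimal disjoint-set for grouping indices of similar images.

    Illustrative sketch only; the real module may use a different
    structure or add union-by-rank.
    """

    def __init__(self, n: int):
        # Each element starts in its own group.
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        # Walk up to the root, halving the path as we go.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
```

After calling `union(i, j)` for every pair of images whose hash distance is within the threshold, all members of a duplicate group share one root.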
📝 Walkthrough

The pull request introduces Arch User Repository (AUR) packaging support with automated release workflows, implements duplicate image detection with quality scoring for "best shot" selection, adds frontend error boundary handling, enhances folder deletion with confirmation dialogs, and updates Tauri build configuration for expanded Linux package formats.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant Backend as Backend API
    participant DupDetect as Duplicate Detection
    participant ImageDB as Image Storage
    User->>Frontend: Request duplicate detection
    Frontend->>Backend: GET /duplicates?similarity_threshold=10
    Backend->>ImageDB: Retrieve all images
    ImageDB-->>Backend: Image list (id, path, metadata)
    Backend->>DupDetect: get_duplicate_groups_with_scores(images, threshold)
    rect rgb(230, 240, 250)
        note over DupDetect: Hash & Group Phase
        DupDetect->>DupDetect: compute_phash for each image
        DupDetect->>DupDetect: find duplicate groups via Hamming distance
    end
    rect rgb(240, 230, 250)
        note over DupDetect: Quality Scoring Phase
        DupDetect->>DupDetect: calculate_sharpness (Laplacian variance)
        DupDetect->>DupDetect: calculate_exposure_score (histogram analysis)
        DupDetect->>DupDetect: calculate_image_quality (combined score)
    end
    rect rgb(230, 250, 240)
        note over DupDetect: Best Shot Selection
        DupDetect->>DupDetect: Identify best_shot (highest overall score)
        DupDetect->>DupDetect: Sort images by quality within each group
    end
    DupDetect-->>Backend: List[DuplicateGroup] with per-image scores
    Backend-->>Frontend: GetDuplicatesResponse {groups, best_shots, total_duplicates}
    Frontend->>User: Display duplicate groups with quality metrics & best shot recommendations
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 10
In @.github/workflows/build-and-release.yml:
- Around line 408-415: The workflow uses an outdated action reference
"softprops/action-gh-release@v1"; update the uses line to the maintained latest
major tag (e.g., "softprops/action-gh-release@v2") so the release step runs on
current GitHub runners, then re-run CI and adjust any breaking inputs if the
action v2 changed parameters; keep the same inputs (tag_name, files) and env
(GITHUB_TOKEN) unless the new action docs require different names.
In @aur/PKGBUILD-git:
- Around line 62-72: The build currently creates and activates a virtualenv
(python -m venv, source venv/bin/activate, deactivate) and runs pip/pyinstaller
inside it, which will fail in clean chroots; instead remove the venv flow and
either (a) add required Python packages and pyinstaller to makedepends so the
system packages are used, or (b) if pip is unavoidable, replace the
virtualenv+pip calls with a chroot-safe installation such as python -m pip
install --root="$pkgdir" --no-deps -r requirements.txt and install pyinstaller
from makedepends (do not rely on source/activate), and adjust the pyinstaller
invocation (pyinstaller main.py --name PictoPy_Server --onedir --distpath dist)
to run from the build directory; also document any nonstandard build
requirements if you cannot avoid these approaches.
In @backend/app/logging/setup_logging.py:
- Line 230: The guard variable _inside_emit on the logging handler is shared
across threads and can drop logs under concurrency; make the guard thread-local
by replacing the module/instance-level _inside_emit with a threading.local()
storage (e.g., self._emit_state = threading.local()) and use a per-thread flag
like self._emit_state.inside_emit inside the emit() method to set, check and
clear the flag; update any references to _inside_emit accordingly so each thread
tracks its own reentrancy and logging isn't lost.
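A minimal sketch of the thread-local guard this comment describes. Class and attribute names follow the review's suggestion (`_emit_state`, `inside_emit`); they are assumptions about shape, not the merged code:

```python
import logging
import threading


class InterceptHandler(logging.Handler):
    """Reentrancy-safe handler sketch: each thread tracks its own
    'currently emitting' flag, so concurrent threads never drop
    each other's logs."""

    def __init__(self):
        super().__init__()
        # threading.local() gives every thread an independent namespace.
        self._emit_state = threading.local()

    def emit(self, record: logging.LogRecord) -> None:
        # If this thread is already inside emit(), bail out to avoid
        # infinite recursion; other threads are unaffected.
        if getattr(self._emit_state, "inside_emit", False):
            return
        self._emit_state.inside_emit = True
        try:
            root_logger = logging.getLogger()
            for handler in root_logger.handlers:
                if handler is not self and record.levelno >= handler.level:
                    handler.emit(record)
        finally:
            self._emit_state.inside_emit = False
```

The `finally` block guarantees the per-thread flag is cleared even if a downstream handler raises.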
In @backend/app/utils/duplicate_detection.py:
- Around line 267-284: The code currently calls calculate_image_quality twice
for the same images (once in find_duplicate_groups and again in
get_duplicate_groups_with_scores); modify find_duplicate_groups to compute and
retain quality scores for each image (e.g., build a mapping from image id to its
quality result or attach score info to the DuplicateGroup) and change its
signature to return those computed scores (or include them in the returned
DuplicateGroup objects), then update get_duplicate_groups_with_scores to reuse
the provided scores instead of recomputing by calling calculate_image_quality
again; ensure references to find_duplicate_groups,
get_duplicate_groups_with_scores, calculate_image_quality, DuplicateGroup, and
groups_dict are updated to pass and consume the cached scores so image I/O and
scoring happen only once.
- Around line 173-201: The sharpness normalization in calculate_image_quality
currently hardcodes a divisor of 10 for normalized_sharpness (min(1.0,
np.log1p(sharpness) / 10)) without justification; update calculate_image_quality
to either document the expected Laplacian variance range in its docstring and
the rationale for the /10 divisor, or replace the literal with a configurable
constant/parameter (e.g., SHARPNESS_NORMALIZATION_DIVISOR or a function
parameter default) that is read from your app settings so deployments can tune
it; ensure you update the docstring to state the expected typical sharpness
range, describe the normalization behavior, and reference the new
config/parameter name in the docstring so maintainers know how to adjust it.
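A hedged sketch of the suggested configurable divisor. The constant name is the review's own proposal, and the comment about typical variance ranges is a rough assumption, not a value measured from PictoPy:

```python
import math

# Assumed tunable (name from the review's suggestion). Raw Laplacian
# variance for photos commonly spans several orders of magnitude, so
# log1p compresses it and the divisor maps the result into [0, 1].
SHARPNESS_NORMALIZATION_DIVISOR = 10.0


def normalize_sharpness(sharpness: float) -> float:
    """Compress a raw Laplacian-variance value into [0, 1].

    normalize_sharpness(0) == 0; values grow logarithmically and
    saturate at 1.0 once log1p(sharpness) exceeds the divisor.
    """
    return min(1.0, math.log1p(sharpness) / SHARPNESS_NORMALIZATION_DIVISOR)
```

Reading the divisor from app settings instead of a module constant would let deployments tune strictness without a code change.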
In @frontend/src/components/ErrorBoundary/ErrorBoundary.tsx:
- Around line 52-55: The current handleReset method in ErrorBoundary only clears
local error state but doesn't force a remount of the child subtree; update the
ErrorBoundary component to add a numeric state property (e.g., resetKey)
initialized to 0, increment resetKey inside handleReset (instead of or in
addition to clearing error/errorInfo), and ensure the render method applies that
resetKey as the key prop on the children container (so React will
unmount/remount the entire subtree when handleReset is called).
In @frontend/src/pages/SettingsPage/components/FolderManagementCard.tsx:
- Line 171: The Dialog's onOpenChange currently only toggles isDeleteDialogOpen
which leaves folderToDelete set when the dialog is dismissed via Escape or
click-away; update the onOpenChange handler used on Dialog (the one calling
setIsDeleteDialogOpen) to also clear folderToDelete when the new open state is
false (i.e., when closing) so folderToDelete is reset and state remains
consistent.
🧹 Nitpick comments (11)
backend/app/logging/setup_logging.py (3)
249-252: Record mutation may affect downstream handlers.

Modifying `record.msg` and clearing `record.args` directly mutates the shared `LogRecord` object. If any handler in the iteration raises an exception, or if there are other references to this record, subsequent handlers will see the mutated state. Consider working with a copy of the record, or at least documenting this behavior as intentional:
🔎 Proposed fix to avoid mutating the original record

```diff
 self._inside_emit = True
 try:
-    # Modify the record message to include uvicorn prefix
-    original_msg = record.getMessage()
-    record.msg = f"[uvicorn] {original_msg}"
-    record.args = ()  # Clear args since we've already formatted the message
+    # Create a copy to avoid mutating the original record
+    import copy
+    modified_record = copy.copy(record)
+    original_msg = modified_record.getMessage()
+    modified_record.msg = f"[uvicorn] {original_msg}"
+    modified_record.args = ()
     # Get the root logger and emit directly to its handlers
     root_logger = logging.getLogger()
     for handler in root_logger.handlers:
         if handler is not self:
-            if record.levelno >= handler.level:
-                handler.emit(record)
+            if modified_record.levelno >= handler.level:
+                handler.emit(modified_record)
```
217-263: Implementation diverges from `sync-microservice`.

The `sync-microservice/app/logging/setup_logging.py` (lines 238-257) uses a simpler approach that calls `logger.log()` directly without a recursion guard. This divergence could lead to maintenance burden and inconsistent behavior between services. If the recursion issue was specific to the backend, consider documenting why this approach differs, or align the implementations if possible.
243-263: Consider handling `exc_info` explicitly per project learning.

Based on learnings from this repository, stack traces should not be shown in intercepted logs, and `exc_info`/`stack_info` should not be preserved when rerouting through `InterceptHandler`. The current implementation passes the record (including any `exc_info`) to downstream handlers. If stack traces should be suppressed for uvicorn logs, explicitly clear them:

🔎 Proposed addition to suppress stack info

```diff
 original_msg = record.getMessage()
 record.msg = f"[uvicorn] {original_msg}"
 record.args = ()  # Clear args since we've already formatted the message
+record.exc_info = None  # Don't preserve stack traces per project guidelines
+record.exc_text = None
+record.stack_info = None
```

backend/app/routes/images.py (1)
158-201: Consider adding pagination or limiting results for large image libraries.

This endpoint fetches all images from the database and performs computationally expensive operations (hashing, quality scoring) on each one. For libraries with thousands of images, this could result in significant response times and memory usage.

Consider:
- Adding a `limit` parameter to cap the number of images processed
- Implementing pagination for the duplicate groups
- Making this an async/background job for large libraries

🔎 Suggested approach with a limit parameter

```diff
 def get_duplicate_images(
     similarity_threshold: int = Query(
         default=10,
         ge=1,
         le=50,
         description="Maximum hash distance to consider images as duplicates (lower = stricter, default 10)"
-    )
+    ),
+    limit: Optional[int] = Query(
+        default=None,
+        ge=1,
+        le=5000,
+        description="Maximum number of images to analyze (None = all images)"
+    )
 ):
```

Then in the implementation:

```diff
 # Get all images from database
 images = db_get_all_images()
+
+if limit and len(images) > limit:
+    logger.warning(f"Limiting duplicate detection to {limit} images out of {len(images)}")
+    images = images[:limit]
```

backend/app/utils/duplicate_detection.py (2)
10-15: Remove unused import.

`PIL.Image` is imported but never used in this module. OpenCV (`cv2`) handles all image operations.

🔎 Proposed fix

```diff
 import cv2
 import numpy as np
 from typing import List, Dict, Tuple, Optional
 from dataclasses import dataclass
 from collections import defaultdict
-from PIL import Image
```
250-256: O(n²) pairwise comparison may be slow for large image libraries.

The nested loop compares all pairs of images, resulting in O(n²) complexity. For 1,000 images, this means ~500K comparisons; for 10K images, ~50M comparisons.
For large libraries, consider optimizations like:
- Locality-sensitive hashing (LSH) for approximate nearest neighbor search
- Hash bucketing by prefix bits to reduce comparisons
- Early termination when enough groups are found
For now, this is acceptable for moderate-sized libraries, but worth noting for future optimization if performance becomes an issue.
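One of the optimizations suggested above, bucketing hashes by their leading bits, can be sketched like this. It is an illustrative helper, not code from the module; note that near-duplicates differing inside the prefix land in different buckets, which is why real LSH schemes probe multiple bit bands:

```python
from collections import defaultdict


def bucket_by_prefix(hashes: dict, prefix_bits: int = 16) -> dict:
    """Group 64-bit hashes by their top `prefix_bits` bits.

    Only pairs within the same bucket then need a full Hamming
    comparison, cutting the O(n^2) all-pairs loop down to the sum
    of squared bucket sizes. `hashes` maps image id -> 64-bit int.
    """
    buckets = defaultdict(list)
    for image_id, h in hashes.items():
        buckets[h >> (64 - prefix_bits)].append(image_id)
    return buckets
```

For a stricter threshold (small Hamming distance), most true duplicates share the prefix, so the recall loss is often acceptable as a first-pass filter.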
aur/PKGBUILD (2)
48-49: SHA256 checksums are set to SKIP.

Using `SKIP` bypasses integrity verification. While the publish-aur.yml workflow will update these with real checksums when assets are available, consider whether this is acceptable for your security posture. AUR guidelines recommend proper checksums for reproducibility and security.
57-60: Preserve permissions when copying libraries.

Using `cp -r` without `-a` or `--preserve` may lose file permissions and ownership metadata, which could cause runtime issues for shared libraries.

🔎 Proposed fix

```diff
 # Install libraries and resources
 if [ -d "usr/lib" ]; then
-    cp -r usr/lib "${pkgdir}/usr/"
+    cp -a usr/lib "${pkgdir}/usr/"
 fi
```

.github/workflows/publish-aur.yml (2)
33-37: Hardcoded sleep is unreliable for asset availability.

A fixed 60-second wait may be insufficient if release asset uploads are slow, or wasteful if they're fast. Consider polling the release API until assets are available.

🔎 Proposed fix with polling

```diff
 - name: Wait for release assets
   run: |
     echo "Waiting for release assets to be available..."
-    sleep 60
+    VERSION="${{ steps.get_version.outputs.version }}"
+    BASE_URL="https://api.github.com/repos/AOSSIE-Org/PictoPy/releases/tags/v${VERSION}"
+    for i in {1..30}; do
+      ASSETS=$(curl -s "$BASE_URL" | jq '.assets | length')
+      if [ "$ASSETS" -ge 2 ]; then
+        echo "Assets available after $((i * 10)) seconds"
+        break
+      fi
+      echo "Waiting for assets... attempt $i"
+      sleep 10
+    done
```
97-145: Generated .SRCINFO uses leading whitespace workaround.

The heredoc generates content with leading spaces that are later stripped with `sed`. This works but is fragile. Consider using a heredoc without leading indentation, or `<<-EOF` with tabs.
87-94: Build steps lack error handling.

If `npm install` or `cargo build` fails, the script continues silently. Consider adding `set -e` at the start of `build()` or checking exit codes.

🔎 Proposed fix

```diff
 build() {
+    set -e
     cd "${srcdir}/${pkgname}"
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- .github/workflows/build-and-release.yml
- .github/workflows/publish-aur.yml
- README.md
- aur/.SRCINFO
- aur/PKGBUILD
- aur/PKGBUILD-git
- aur/README.md
- aur/pictopy.install
- backend/app/logging/setup_logging.py
- backend/app/routes/images.py
- backend/app/utils/duplicate_detection.py
- frontend/src-tauri/tauri.conf.json
- frontend/src/components/ErrorBoundary/ErrorBoundary.tsx
- frontend/src/components/ErrorBoundary/index.ts
- frontend/src/main.tsx
- frontend/src/pages/SettingsPage/components/FolderManagementCard.tsx
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-13T16:40:22.622Z
Learnt from: Hemil36
Repo: AOSSIE-Org/PictoPy PR: 548
File: backend/app/logging/setup_logging.py:224-243
Timestamp: 2025-10-13T16:40:22.622Z
Learning: In the PictoPy project's centralized logging system (backend/app/logging/setup_logging.py and sync-microservice/app/logging/setup_logging.py), stack traces should not be shown in logs. When rerouting logs through InterceptHandler, exc_info and stack_info should not be preserved.
Applied to files:
backend/app/logging/setup_logging.py
🧬 Code graph analysis (3)
backend/app/routes/images.py (3)
- backend/app/utils/duplicate_detection.py (1): `get_duplicate_groups_with_scores` (290-332)
- backend/app/schemas/images.py (1): `ErrorResponse` (59-62)
- backend/app/database/images.py (1): `db_get_all_images` (123-214)

backend/app/utils/duplicate_detection.py (1)
- backend/app/routes/images.py (1): `DuplicateGroup` (145-149)

backend/app/logging/setup_logging.py (1)
- sync-microservice/app/logging/setup_logging.py (1): `emit` (239-258)
🪛 actionlint (1.7.9)
.github/workflows/build-and-release.yml
408-408: the runner of "softprops/action-gh-release@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🔇 Additional comments (30)
frontend/src/components/ErrorBoundary/index.ts (1)
1-1: LGTM! Standard barrel export pattern that enables clean imports throughout the codebase.
frontend/src/main.tsx (2)
5-5: LGTM! The import correctly uses the named export from the barrel file.

36-40: LGTM! Excellent placement of the ErrorBoundary at the top level, wrapping the entire application. This ensures comprehensive error catching throughout the component tree.
frontend/src/components/ErrorBoundary/ErrorBoundary.tsx (5)
1-12: LGTM! The imports and type definitions are correct and follow React error boundary patterns.

21-28: LGTM! Standard constructor pattern with proper state initialization.

30-44: LGTM! The error handling lifecycle methods correctly follow React's error boundary pattern. Good logging for debugging purposes.
57-121: LGTM! Excellent fallback UI implementation with:
- Clear error messaging and user-friendly design
- Collapsible error details for debugging without overwhelming users
- Both reload and reset recovery options
- Good accessibility with focus rings and semantic HTML
- Responsive design with dark mode support
124-124: LGTM! Standard default export that integrates correctly with the barrel export.
frontend/src/pages/SettingsPage/components/FolderManagementCard.tsx (4)
1-14: LGTM! Clean import additions. All new imports are properly used in the confirmation dialog implementation.

106-106: LGTM! Proper delegation to confirmation handler. The delete button now correctly opens the confirmation dialog instead of immediately deleting the folder.

172-209: LGTM! Excellent confirmation dialog implementation. The dialog provides a clear, accessible, and well-styled confirmation workflow with:
- Prominent warning icon and styling appropriate for destructive actions
- Display of the target folder path for user verification
- Clear explanation of consequences (irreversible, data loss)
- Proper disabled states and loading feedback during deletion
40-65: Delete error handling is already implemented; no changes needed.

Error handling for failed deletions is properly managed through the `useMutationFeedback` hook (`useFolderOperations.tsx`, lines 130-140). When a deletion fails, an error dialog is displayed via the global `showInfoDialog` mechanism in Redux, independent of the local delete confirmation dialog. The dialog closing does not prevent error feedback; instead, errors are communicated through a dedicated notification system that persists even after the dialog closes.

Additionally, the delete button remains disabled during the operation (`disabled={deleteFolderPending}`), preventing unintended repeated clicks.

Likely an incorrect or invalid review comment.
backend/app/routes/images.py (4)
6-6: LGTM! The import is correctly added to bring in the duplicate detection utility function.

134-155: LGTM! The Pydantic models are well-structured and align with the data returned by `get_duplicate_groups_with_scores`. The field types match the expected output from the utility module.

219-231: LGTM! The response transformation correctly maps the utility output to the Pydantic models. Using `**img_info` for unpacking is clean, and the model validation will catch any schema mismatches.

241-250: LGTM! Good error handling with proper logging and a consistent error response format matching the other endpoints in this file.
backend/app/utils/duplicate_detection.py (5)
41-77: LGTM! The pHash implementation is correct:
- Uses a DCT-based approach, which is robust to resizing and minor changes
- Proper error handling with logging
- Returns `None` for unreadable images, allowing graceful degradation
80-91: LGTM! Hamming distance calculation is straightforward and correct using NumPy's vectorized comparison.
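For reference, the Hamming distance between two 64-bit integer hashes can also be computed without NumPy. This is a generic sketch of the metric the reviewer is describing, not the module's exact code:

```python
def hamming_distance(h1: int, h2: int) -> int:
    """Number of differing bits between two integer hashes.

    XOR leaves a 1 exactly where the hashes disagree; counting the
    set bits gives the distance. Two perceptual hashes within a small
    distance (e.g. <= 10) are treated as likely duplicates.
    """
    return bin(h1 ^ h2).count("1")
```

Python 3.10+ could use `(h1 ^ h2).bit_count()` for the same result with less overhead.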
94-118: LGTM! The Laplacian variance method is a well-established technique for blur detection. Error handling correctly returns 0.0 for unreadable images.
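A pure-NumPy stand-in for the Laplacian-variance measure discussed above. The module itself uses OpenCV (likely something equivalent to `cv2.Laplacian(gray, cv2.CV_64F).var()`); this sketch reimplements the same 3x3 kernel for illustration:

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the Laplacian response over a grayscale image.

    The Laplacian responds to intensity changes (edges); blurry
    images have weak edges, so the response variance is low, while
    sharp images score high. Uses a valid-mode 3x3 convolution with
    the standard kernel [[0,1,0],[1,-4,1],[0,1,0]] written as
    shifted sums.
    """
    g = gray.astype(np.float64)
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())
```

A perfectly flat image scores 0.0; a high-contrast pattern scores far higher, matching the "higher = less blur" interpretation.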
121-170: LGTM! The exposure scoring algorithm is well-designed with:
- Brightness deviation from ideal (128)
- Contrast measurement via standard deviation
- Clipping penalty for over/under-exposed pixels
- Reasonable weights for combining factors
The clamping to [0.0, 1.0] at the end ensures valid output range.
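A hedged sketch of an exposure score combining the factors listed above. The weights and thresholds here are illustrative assumptions, not the module's actual values:

```python
import numpy as np


def exposure_score(gray: np.ndarray) -> float:
    """Score exposure quality in [0, 1] from a grayscale image.

    Combines (with illustrative weights):
    - brightness: distance of the mean from the ideal midpoint 128
    - contrast:   standard deviation, capped at a nominal 64
    - clipping:   fraction of near-black / near-white pixels, penalized
    """
    g = gray.astype(np.float64)
    brightness = 1.0 - abs(g.mean() - 128.0) / 128.0
    contrast = min(1.0, g.std() / 64.0)
    clipped = float(((g <= 5) | (g >= 250)).mean())
    score = 0.5 * brightness + 0.3 * contrast + 0.2 * (1.0 - clipped)
    return float(np.clip(score, 0.0, 1.0))
```

A solid-black frame scores near 0 (wrong brightness, no contrast, fully clipped), while a mid-toned gradient scores high, which is the behavior the review describes.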
290-331: LGTM on the API wrapper structure. The function correctly transforms internal data structures to API-friendly dictionaries. Rounding scores to 2-4 decimal places is appropriate for the API response.
README.md (1)
5-34: LGTM! Clear installation documentation. The installation section is well-structured and provides comprehensive guidance for multiple platforms and package managers. The AUR instructions align with the new packaging files introduced in this PR.
frontend/src-tauri/tauri.conf.json (1)
10-21: LGTM! Expanded Linux packaging support.

The changes correctly expand Linux packaging targets:
- `"all"` is valid Tauri syntax for building all available formats
- AppImage with `bundleMediaFramework: true` ensures media codecs are included
- RPM with `release: "1"` follows standard RPM conventions

These changes align with the broader AUR packaging support introduced in this PR.
aur/README.md (1)
1-126: LGTM! Comprehensive AUR packaging guide. This documentation provides excellent coverage for both users and maintainers, including installation methods, dependency management, automated publishing setup, and troubleshooting guidance.
aur/pictopy.install (1)
1-47: LGTM! Standard Arch Linux install hooks. The install script follows Arch packaging conventions correctly:
- Conditionally updates icon cache and desktop database
- Preserves user data on removal with clear guidance
- Provides helpful status messages during install/upgrade/removal
.github/workflows/build-and-release.yml (1)
358-415: Include duplicate detection implementation files in this review.

The PR objectives mention implementing duplicate image detection, but the files provided for review only contain AUR packaging workflow configuration. The actual implementation files (`backend/app/utils/duplicate_detection.py` and the `/duplicates` endpoint in `backend/app/routes/images.py`) exist in the repository but are not included in this review batch. If this PR is intended to introduce the duplicate detection feature, these core implementation files should be reviewed alongside the infrastructure changes. Please provide them for review, or clarify whether they were merged separately.
.github/workflows/publish-aur.yml (2)
46-60: Consider failing the workflow when downloads fail.

Currently, if tarball downloads fail, the workflow continues with `SKIP` checksums. This could publish an AUR package with unverified sources. Consider whether this fallback behavior is intentional, or if the workflow should fail to prevent publishing incomplete metadata.
149-159: Pinned action version is good practice.

Using `KSXGitHub/github-actions-deploy-aur@v3.0.1` with a specific version is good for reproducibility. Note that `force_push: true` will overwrite AUR git history, which is typically acceptable for automated updates but worth being aware of.

aur/PKGBUILD-git (1)
140-163: Hardcoded MIT license fallback is reasonable, but consider maintenance. The embedded license text provides a fallback when the LICENSE file isn't found. This is a pragmatic solution, but ensure the embedded text stays synchronized with the actual project license.
aur/.SRCINFO (1)
1-44: SRCINFO structure looks correct. The .SRCINFO format and structure are valid for AUR. Note that this file will be regenerated by the publish-aur.yml workflow on each release, so manual edits here will be overwritten. The SKIP checksums will be replaced with actual hashes when the workflow runs.
Description
Fixes #905
This PR adds duplicate image detection and "best shot" suggestion functionality. When a user takes multiple photos of the same scene (e.g., 10 photos of a sunset), the system finds the duplicates and suggests the best one based on sharpness and exposure.
Changes Made
- Added `duplicate_detection.py` utility module with perceptual hashing, sharpness scoring, exposure scoring, and Union-Find grouping
- Added `GET /images/duplicates` endpoint that finds groups of similar images, scores each one, and suggests the best shot

Quality Metrics
- Sharpness: higher = less blur (Laplacian variance)
- Exposure: proper brightness and contrast (histogram analysis)
- Overall: 60% sharpness + 40% exposure
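The overall-score weighting stated in the PR (60% sharpness + 40% exposure) reduces to a one-liner; the constant names here are illustrative, and both inputs are assumed to already be normalized to [0, 1]:

```python
# Weights as stated in the PR description.
SHARPNESS_WEIGHT = 0.6
EXPOSURE_WEIGHT = 0.4


def overall_score(normalized_sharpness: float, exposure: float) -> float:
    """Combine two normalized [0, 1] quality metrics into one score.

    The image with the highest overall score within a duplicate
    group is flagged as the 'best shot'.
    """
    return SHARPNESS_WEIGHT * normalized_sharpness + EXPOSURE_WEIGHT * exposure
```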
API Usage
GET /images/duplicates?similarity_threshold=10
Response:
```json
{
  "success": true,
  "message": "Found 3 duplicate groups with 12 total images",
  "data": [
    {
      "group_id": 0,
      "image_count": 4,
      "best_shot_id": "uuid-of-best-image",
      "images": [
        {
          "id": "...",
          "path": "...",
          "sharpness_score": 245.67,
          "exposure_score": 0.82,
          "overall_score": 0.76,
          "is_best_shot": true
        }
      ]
    }
  ]
}
```

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Detects duplicate and similar images with quality scoring to identify best shots
  * Added error boundary component for improved application stability
  * Added confirmation dialog for safer folder deletion
* **Bug Fixes**
  * Fixed backend logging recursion issue
* **Documentation**
  * Added installation instructions for Arch Linux (AUR), AppImage, and other platforms
  * Created AUR packaging guide with automated release workflow
* **Chores**
  * Extended Linux build support with AppImage and RPM formats