Improve Face Clustering Accuracy & Enable Automatic Clustering on Folder Add #914

290Priyansh · 2026-01-02T12:34:28Z

Fixes #915
This PR improves PictoPy’s face clustering pipeline to behave more like a real-world face recognition system.

Key improvements:

Automatic face detection & clustering on every folder addition (no manual triggers)
AI tagging enabled by default for new folders
Tuned clustering thresholds to correctly group the same person across different poses, expressions, and lighting
Optional face alignment preprocessing to improve embedding quality
Forced reclustering on sync to ensure new images are immediately clustered

pictopy.mp4

Summary by CodeRabbit

New Features
- Face alignment preprocessing now enabled by default to improve face detection accuracy.
- AI tagging automatically enabled for new folders.
- Untagged images now processed with full face cluster reclustering for consistency.
- Face clustering parameters optimized for improved accuracy.
Tests
- Added comprehensive test suite for face alignment utilities.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

github-actions · 2026-01-02T12:34:45Z

⚠️ No issue was linked in the PR description.
Please make sure to link an issue (e.g., 'Fixes #issue_number')

coderabbitai · 2026-01-02T12:34:46Z

📝 Walkthrough

Walkthrough

A new face alignment preprocessing utility is introduced with configuration control, enabling rotation-based eye alignment during face cropping. Face detection is updated to use this alignment utility. Folder processing workflows are enhanced to handle untagged images and perform full reclustering. Face clustering default similarity and merge thresholds are adjusted to calibrate the pipeline.

Changes

Cohort / File(s)	Summary
Configuration `backend/app/config/settings.py`	Adds `FACE_ALIGNMENT_ENABLED` global flag (default: `True`) to control face alignment preprocessing behavior
Face Alignment Utilities `backend/app/utils/face_alignment.py`, `backend/tests/test_face_alignment.py`	New utility module provides `estimate_eye_positions()`, `align_face_simple()`, and `simple_face_crop()` functions for alignment-driven cropping with eye-position-based rotation and fallback handling; comprehensive test coverage for edge cases and both enabled/disabled modes
Face Detection Integration `backend/app/models/FaceDetector.py`	Replaces manual face cropping with `align_face_simple()` utility within detection loop, preserving preprocessing and embedding extraction workflow
Clustering Parameter Tuning `backend/app/utils/face_clusters.py`	Adjusts default similarity thresholds in `cluster_util_cluster_all_face_embeddings()` (eps: `0.75` → `0.5`, similarity: `0.85` → `0.65`) and `cluster_util_assign_cluster_to_faces_without_clusterId()` (similarity: `0.8` → `0.65`); lowers effective merge threshold to `0.60`
Folder Processing Workflows `backend/app/routes/folders.py`	Enhances `post_folder_add_sequence()` and `post_sync_folder_sequence()` to process untagged images and perform full reclustering (`force_full_reclustering=True`); enables AI tagging by default on folder addition

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as routes/folders.py
    participant ImgProc as Image Processing
    participant Untagged as Untagged Handler
    participant Cluster as Clustering
    participant Watcher as Sync Watcher
    
    Client->>API: Add/Sync Folder
    activate API
    
    API->>ImgProc: Process Folder Images
    activate ImgProc
    Note over ImgProc: FaceDetector uses<br/>align_face_simple()
    ImgProc-->>API: Images Processed
    deactivate ImgProc
    
    rect rgb(200, 220, 255)
    Note over API,Untagged: New: Handle Untagged Images
    API->>Untagged: Process Untagged Images
    activate Untagged
    Untagged-->>API: Untagged Processed
    deactivate Untagged
    end
    
    rect rgb(200, 220, 255)
    Note over API,Cluster: New: Full Reclustering
    API->>Cluster: Full Reclustering<br/>(force_full_reclustering=True)
    activate Cluster
    Note over Cluster: Uses adjusted<br/>similarity thresholds
    Cluster-->>API: Clustering Complete
    deactivate Cluster
    end
    
    API->>Watcher: Restart Sync Watcher
    activate Watcher
    Watcher-->>API: Watcher Restarted
    deactivate Watcher
    
    API-->>Client: Folder Processing Complete
    deactivate API

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix: improve face clustering accuracy with similarity threshold and p… #771: Directly overlaps with clustering threshold modifications to cluster_util_cluster_all_face_embeddings() and cluster_util_assign_cluster_to_faces_without_clusterId() in face_clusters.py
Gsoc'25 Frontend Revamp #489: Overlaps on modifications to backend/app/models/FaceDetector.py and backend/app/config/settings.py regarding face detection and configuration constants

Suggested labels

backend, medium

Suggested reviewers

rahulharpal1603

Poem

🐰 Whiskers twitch with joy sublime,
Eyes aligned, now it's prime time!
Rotations smooth, clusters sing,
Face preprocessing takes the wing! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main objectives: improving face clustering accuracy through tuned thresholds and enabling automatic clustering on folder addition.
Docstring Coverage	✅ Passed	Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-01-02T12:35:57Z

⚠️ No issue was linked in the PR description.
Please make sure to link an issue (e.g., 'Fixes #issue_number')

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (3)

backend/app/utils/face_alignment.py (2)
17-37: Heuristic eye positions may not work for non-frontal faces.

The function uses fixed proportions (35%/65% horizontal, 35% vertical) to estimate eye positions without actual detection. This approach:

Works reasonably for frontal, upright faces

Fails for profile views, extreme angles, or unusual poses

May misalign faces when heuristics don't match actual eye positions

Since the PR objectives mention handling "different poses, expressions, and lighting conditions," consider whether this heuristic approach is sufficient or if actual eye detection (e.g., using MediaPipe Face Mesh or dlib landmarks) would better meet the goals.

96-98: Consider documenting the 3-degree threshold rationale.

The 3-degree threshold for skipping rotation is reasonable (avoids processing near-horizontal faces), but this magic number could benefit from a named constant or comment explaining why 3 degrees was chosen.
🔎 Suggested improvement
+        # Rotation threshold in degrees - skip rotation for nearly horizontal eyes
+        # to avoid unnecessary processing (3 degrees ≈ 5% vertical deviation)
+        MIN_ROTATION_ANGLE = 3
+        
         # Only apply rotation if angle is significant (> 3 degrees)
-        if abs(angle) < 3:
+        if abs(angle) < MIN_ROTATION_ANGLE:
             return face_crop
backend/app/routes/folders.py (1)

72-73: Add clarifying comment explaining the clustering strategy for AI tagging enablement.

The inconsistency across sequences is intentional: post_folder_add_sequence (line 73) and post_sync_folder_sequence (line 122) force full reclustering for filesystem changes, while post_AI_tagging_enabled_sequence (line 94) uses smart clustering with the 24-hour rule and face count thresholds.

Add an inline comment at line 94 explaining why AI tagging uses smart clustering instead of forced reclustering—this would parallel the existing comments at lines 73 and 122 and match the design pattern throughout the code.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4043751 and e9e8f74.

📒 Files selected for processing (6)

backend/app/config/settings.py
backend/app/models/FaceDetector.py
backend/app/routes/folders.py
backend/app/utils/face_alignment.py
backend/app/utils/face_clusters.py
backend/tests/test_face_alignment.py

🧰 Additional context used

🧬 Code graph analysis (3)

backend/app/models/FaceDetector.py (1)

backend/app/utils/face_alignment.py (1)

align_face_simple (40-122)

backend/app/routes/folders.py (2)

backend/app/utils/images.py (1)

image_util_process_untagged_images (87-101)

backend/app/utils/face_clusters.py (1)

cluster_util_face_clusters_sync (94-162)

backend/tests/test_face_alignment.py (1)

backend/app/utils/face_alignment.py (3)

align_face_simple (40-122)

simple_face_crop (125-149)

estimate_eye_positions (17-37)

🔇 Additional comments (14)

backend/app/utils/face_clusters.py (2)

318-335: Consistent threshold change across functions.

The similarity_threshold reduction from 0.8 to 0.65 is consistent with the changes in cluster_util_cluster_all_face_embeddings. While consistency is good, note the same concern about this being very lenient for face assignment applies here as well.

188-202: I need the original review comment to rewrite. Please provide the review comment content (typically within <review_comment> tags) along with any relevant context about the code being reviewed.

backend/app/config/settings.py (1)

26-28: LGTM! Clean configuration addition.

The FACE_ALIGNMENT_ENABLED flag provides a clear toggle for the face alignment feature. The default value of True aligns with the PR objective of improving face recognition accuracy.

backend/app/models/FaceDetector.py (1)

47-51: Good integration with face alignment utility.

The change cleanly integrates the new align_face_simple utility, replacing manual cropping. The utility includes proper error handling and fallback to simple cropping when alignment fails or is disabled, maintaining robustness.

backend/tests/test_face_alignment.py (8)

16-31: Well-structured test fixtures.

The fixtures provide clear, reusable test data with a sample image containing visual markers (circles for "eyes") that align well with what the alignment utilities expect.

34-48: Good test coverage for simple_face_crop.

The test verifies correct dimensions accounting for padding and validates that the result is a valid 3-channel color image.

50-60: Good boundary condition test.

Testing face crop at image boundaries ensures the function handles edge cases gracefully without crashing.

62-84: Thorough validation of eye position estimation.

The test verifies eye coordinates are reasonable and that the left eye is correctly positioned to the left of the right eye.

100-115: Good test for alignment-enabled behavior.

Verifies that when alignment is enabled, the function returns a valid image with reasonable dimensions.

117-136: Comprehensive edge case testing.

Tests handle tiny images, grayscale inputs, and potential failures gracefully. The flexible assertion (accepting either success or controlled failure) is appropriate for edge cases.

138-161: Good coverage of rotation and padding scenarios.

Tests verify minimal rotation behavior and validate different padding values work correctly.

163-174: Good test for invalid input handling.

Tests that zero-dimension bounding boxes are handled gracefully without crashing.

backend/app/utils/face_alignment.py (2)

40-122: Robust alignment implementation with good fallback logic.

The function:

Properly checks FACE_ALIGNMENT_ENABLED flag

Handles empty crops gracefully

Only rotates when angle exceeds 3 degrees (avoids unnecessary processing)

Uses cv2.BORDER_REPLICATE to handle edge pixels during rotation

Falls back to simple crop on any exception

The error handling and fallback mechanism ensure the pipeline doesn't break even when alignment fails.

125-149: Clean fallback implementation.

The simple_face_crop function provides a straightforward fallback that properly handles boundary conditions with max() and min() operations.

coderabbitai · 2026-01-02T12:38:03Z

backend/app/models/FaceDetector.py

+                # Use face alignment utility for better pose handling
+                from app.utils.face_alignment import align_face_simple
+
+                face_img = align_face_simple(img, bbox, padding=20)


🛠️ Refactor suggestion | 🟠 Major

Move the import to module level.

The import statement is currently inside the loop, which means it will be executed for every face detected. This is inefficient and goes against Python best practices.

🔎 Proposed fix

Move the import to the top of the file with other imports:

from app.models.FaceNet import FaceNet from app.utils.FaceNet import FaceNet_util_preprocess_image, FaceNet_util_get_model_path from app.utils.YOLO import YOLO_util_get_model_path from app.models.YOLO import YOLO from app.database.faces import db_insert_face_embeddings_by_image_id from app.logging.setup_logging import get_logger +from app.utils.face_alignment import align_face_simple

Then remove it from the loop:

- # Use face alignment utility for better pose handling - from app.utils.face_alignment import align_face_simple - face_img = align_face_simple(img, bbox, padding=20)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Use face alignment utility for better pose handling

from app.utils.face_alignment import align_face_simple

face_img = align_face_simple(img, bbox, padding=20)

face_img = align_face_simple(img, bbox, padding=20)

🤖 Prompt for AI Agents

In backend/app/models/FaceDetector.py around lines 47 to 50, the import of align_face_simple is inside the loop causing repeated imports; move "from app.utils.face_alignment import align_face_simple" to the module-level imports at the top of the file with the other imports, then remove the in-loop import so the loop only calls align_face_simple(img, bbox, padding=20).

coderabbitai · 2026-01-02T12:38:03Z

backend/app/routes/folders.py

+        image_util_process_untagged_images()
+        cluster_util_face_clusters_sync(force_full_reclustering=True)  # Force full reclustering for new photos


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check for performance-related configuration options or limits for clustering rg -n -C3 'force_full_reclustering|cluster.*config|clustering.*threshold|max.*faces|performance' --type=py

Repository: AOSSIE-Org/PictoPy

Length of output: 3064

🏁 Script executed:

#!/bin/bash # Find the implementation of cluster_util_face_clusters_sync to understand actual behavior rg -n 'def cluster_util_face_clusters_sync|def face_clusters_sync' --type=py -A 15

Repository: AOSSIE-Org/PictoPy

Length of output: 1290

🏁 Script executed:

#!/bin/bash # Check all usages of force_full_reclustering to understand the pattern rg -n 'force_full_reclustering' --type=py -B 2 -A 2

Repository: AOSSIE-Org/PictoPy

Length of output: 2316

🏁 Script executed:

#!/bin/bash # Get the implementation of cluster_util_cluster_all_face_embeddings rg -n 'def cluster_util_cluster_all_face_embeddings' --type=py -A 20

Repository: AOSSIE-Org/PictoPy

Length of output: 1774

🏁 Script executed:

#!/bin/bash # Get the implementation of cluster_util_is_reclustering_needed to understand the 24-hour rule rg -n 'def cluster_util_is_reclustering_needed' --type=py -A 15

Repository: AOSSIE-Org/PictoPy

Length of output: 1210

🏁 Script executed:

#!/bin/bash # Search for any documentation about performance or the behavior of full reclustering rg -n 'reclustering|cluster.*all.*faces|cluster_all' --type=py | head -30

Repository: AOSSIE-Org/PictoPy

Length of output: 3115

Misleading comment: Full reclustering processes all faces, not just new photos.

The comment "Force full reclustering for new photos" is inaccurate. The cluster_util_cluster_all_face_embeddings() function retrieves and reclusters all faces in the database (via db_get_all_faces_with_cluster_names()), not just newly added ones.

Additionally, forcing full reclustering on every folder add and sync (lines 73 and 122) bypasses the 24-hour throttling mechanism in cluster_util_is_reclustering_needed(). While this ensures new photos are immediately reflected in clusters, it reprocesses the entire face database on each operation, which could impact performance for large libraries.

Consider either:

Updating the comment to reflect the actual behavior

Using the standard clustering logic (without force=True) to respect the 24-hour throttle for folder operations, with explicit full reclustering reserved for manual user requests

🤖 Prompt for AI Agents

In backend/app/routes/folders.py around lines 72-73, the inline comment "Force full reclustering for new photos" is misleading because cluster_util_cluster_all_face_embeddings() reclusters all faces, not just new ones, and forcing a full recluster here bypasses the 24-hour throttle; update the comment to accurately state that this triggers a full reclustering of all face embeddings, and remove the force_full_reclustering=True (use the default throttled call) for folder add/sync operations so the 24-hour throttling in cluster_util_is_reclustering_needed() is respected (reserve explicit full reclustering only for manual user requests or admin-triggered paths such as the manual/full-recluster endpoint; also apply the same change for the similar call at line 122).

coderabbitai · 2026-01-02T12:38:04Z

backend/app/routes/folders.py

        image_util_process_folder_images(folder_data)
        image_util_process_untagged_images()
-        cluster_util_face_clusters_sync()
+        cluster_util_face_clusters_sync(force_full_reclustering=True)  # Force full reclustering for synced photos


⚠️ Potential issue | 🟠 Major

Same performance concern applies to sync operations.

This has the same issue as lines 72-73: full reclustering processes all photos in the database, not just newly synced ones. The comment is similarly misleading, and the performance impact could be significant for large libraries during routine sync operations.

🤖 Prompt for AI Agents

In backend/app/routes/folders.py around line 122, the call cluster_util_face_clusters_sync(force_full_reclustering=True) forces a full recluster of the entire DB during every sync which is misleading and will crush performance; change the behavior to only recluster newly synced or modified photos (e.g., pass the list of new photo IDs or a flag to perform incremental reclustering) or make force_full_reclustering=False and schedule an occasional full recluster in a background job; update the call and its comment to reflect that only incremental reclustering runs during normal syncs and full reclustering is done separately on a controlled schedule.

coderabbitai · 2026-01-02T12:38:04Z

backend/app/routes/folders.py

            root_path=request.folder_path,
            parent_folder_id=parent_folder_id,
-            AI_Tagging=False,
+            AI_Tagging=True,  # Enable AI tagging by default for automatic face detection


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd /root/repo && cat -n backend/app/routes/folders.py | sed -n '180,200p'

Repository: AOSSIE-Org/PictoPy

Length of output: 122

🏁 Script executed:

cd /root/repo && git log --oneline -20 -- backend/app/routes/folders.py | head -5

Repository: AOSSIE-Org/PictoPy

Length of output: 122

🏁 Script executed:

cd /root/repo && rg "AI_Tagging" -B 3 -A 3 --type py

Repository: AOSSIE-Org/PictoPy

Length of output: 122

🏁 Script executed:

cd /root/repo && fd -t f "README\|CHANGELOG\|CHANGES\|RELEASE" -i | head -10

Repository: AOSSIE-Org/PictoPy

Length of output: 122

🏁 Script executed:

cd /root/repo && rg "face.*detect\|automatic.*process" -i --type py | head -20

Repository: AOSSIE-Org/PictoPy

Length of output: 122

🌐 Web query:

automatic face detection user consent privacy best practices photo management

💡 Result:

Short best-practices checklist for automatic face-detection/recognition (photo management) and user consent:

Treat facial templates/recognition output as biometric/sensitive data — require explicit, informed consent where law requires it (describe purpose, retention, sharing, rights to withdraw). [1][2][3]

Offer genuine choice and alternatives — don’t make access to core services conditional on consenting to biometrics; provide non‑biometric options. [1][2]

Minimize data collection & retention — collect only needed images/templates, store the minimal derived template (not raw images when feasible), and set short, documented retention periods with regular purge. [1][2]

Purpose limitation & transparency — publish clear notices (in-app, UI) before detection runs; log processing events and notify users of material purpose changes and request fresh consent. [1][3]

Opt-in UX & clear affirmative action — use an explicit, separate opt-in (no pre‑ticked boxes or implied consent); make withdrawal as easy as giving consent. [1][2]

Local/edge processing when possible — perform detection/matching on-device to reduce central storage and breach risk; if server-side, encrypt templates and apply strong access controls. [2][3]

Data protection & security — apply encryption at rest/in transit, access logging, least-privilege, regular audits, and breach-notification procedures. [2][3]

Accuracy, fairness, and testing — evaluate and document model performance across demographic groups; mitigate bias before deployment and allow human review for high‑risk decisions. [2][4]

Allow user control over photos — provide UI to review, export, delete photos/templates and records of detection events; honor data-subject requests promptly. [1][2]

Risk assessment & governance — perform DPIAs/PIAs for deployments, keep records of processing, appoint data protection lead, and limit third‑party sharing; require contractual safeguards with vendors. [1][2][3]

Special contexts caution — avoid unconsented/systematic monitoring of public spaces, and treat employee/workplace use carefully (power imbalance may invalidate consent). [1][2]

If you want, I can: draft UI consent text, a minimal privacy notice, or a retention-policy template that follows these points.

🏁 Script executed:

pwd && git ls-files | grep -E "folders\.py|README|CHANGELOG" | head -20

Repository: AOSSIE-Org/PictoPy

Length of output: 491

🏁 Script executed:

cat backend/app/routes/folders.py | sed -n '180,200p'

Repository: AOSSIE-Org/PictoPy

Length of output: 1033

🏁 Script executed:

rg "AI_Tagging" -B 3 -A 3 --type py

Repository: AOSSIE-Org/PictoPy

Length of output: 9497

🏁 Script executed:

git log --oneline -p -- backend/app/routes/folders.py | grep -A 10 -B 10 "AI_Tagging" | head -40

Repository: AOSSIE-Org/PictoPy

Length of output: 1703

🏁 Script executed:

rg "add_folder|Folder.*AI" -B 2 -A 2 --type py | head -50

Repository: AOSSIE-Org/PictoPy

Length of output: 3107

🏁 Script executed:

find . -type f \( -name "*.md" -o -name "*.rst" -o -name "*.txt" \) | xargs grep -l "AI\|tagging\|face.*detect" 2>/dev/null | head -10

Repository: AOSSIE-Org/PictoPy

Length of output: 285

🏁 Script executed:

rg "automatic.*face\|AI.*tag\|face.*detect" -i --type md | head -30

Repository: AOSSIE-Org/PictoPy

Length of output: 44

🏁 Script executed:

cat README.md

Repository: AOSSIE-Org/PictoPy

Length of output: 2620

🏁 Script executed:

rg "consent\|privacy\|opt.*in\|opt.*out" -i --type py --type md | head -30

Repository: AOSSIE-Org/PictoPy

Length of output: 44

🏁 Script executed:

find . -path ./node_modules -prune -o -type f -name "*.json" -o -name "*.tsx" -o -name "*.ts" | xargs grep -l "AI\|tagging\|consent" 2>/dev/null | head -10

Repository: AOSSIE-Org/PictoPy

Length of output: 516

🏁 Script executed:

rg "AddFolder|add_folder" -A 5 --type py | grep -i "request\|response" | head -20

Repository: AOSSIE-Org/PictoPy

Length of output: 1365

🏁 Script executed:

cat frontend/src/components/FolderPicker/AITaggingFolderPicker.tsx

Repository: AOSSIE-Org/PictoPy

Length of output: 1464

🏁 Script executed:

cat frontend/src/components/EmptyStates/EmptyAITaggingState.tsx

Repository: AOSSIE-Org/PictoPy

Length of output: 1254

🏁 Script executed:

cat backend/app/schemas/folders.py | grep -A 10 "class AddFolderRequest"

Repository: AOSSIE-Org/PictoPy

Length of output: 357

🏁 Script executed:

rg "AddFolderRequest" -B 2 -A 5 frontend/src

Repository: AOSSIE-Org/PictoPy

Length of output: 1071

🏁 Script executed:

cat docs/backend/backend_python/image-processing.md

Repository: AOSSIE-Org/PictoPy

Length of output: 5247

Remove AI_Tagging=True default or implement explicit user consent and control.

Automatic face detection on all newly added folders violates privacy best practices and contradicts the UI, which states "Enable AI Tagging for a folder from the settings page." The API forces AI_Tagging=True with no user control:

AddFolderRequest schema has no AI_Tagging parameter; users cannot opt-out

EmptyAITaggingState.tsx misleadingly suggests users control this via settings

No upfront disclosure or consent prompt when folders are added

Automatic biometric processing (face detection/clustering) without explicit informed consent conflicts with standard privacy requirements

Either:

Change default to AI_Tagging=False and add an explicit opt-in toggle in the folder creation UI, or

Add an ai_tagging parameter to AddFolderRequest so users can control it, and add a clear consent prompt before processing begins.

Document the change in release notes and clarify the UI messaging to reflect actual behavior.

🤖 Prompt for AI Agents

In backend/app/routes/folders.py around line 189, the route is forcing AI_Tagging=True by default which enables automatic face detection without user control; change behavior to respect user consent by either (A) set AI_Tagging default to False and update folder creation flow/UI to include an explicit "Enable AI Tagging" opt-in toggle, or (B) add an explicit ai_tagging boolean field to the AddFolderRequest schema and use that value (default False) when creating folders; also update API validation, TypeScript client types, the EmptyAITaggingState UI text to reflect actual behavior, add a consent prompt before any biometric processing, and include this change in release notes and documentation.

coderabbitai · 2026-01-02T12:38:04Z

backend/app/utils/face_clusters.py

+    # Use lower threshold if merge_threshold not explicitly provided to combine similar clusters
+    effective_merge_threshold = merge_threshold if merge_threshold is not None else 0.60
    results = _merge_similar_clusters(
        results, merge_threshold=effective_merge_threshold
    )


🛠️ Refactor suggestion | 🟠 Major

The merge threshold of 0.60 is very aggressive.

Clusters with only 60% cosine similarity will be merged together. This is quite low and risks combining distinct individuals, especially since it's lower than the primary similarity_threshold of 0.65. Typically, post-clustering merge thresholds should be higher (more strict) than initial clustering thresholds to avoid over-merging.

Consider increasing the default to at least match or exceed similarity_threshold (e.g., 0.70-0.75).

🤖 Prompt for AI Agents

In backend/app/utils/face_clusters.py around lines 309 to 313, the default effective_merge_threshold is set to 0.60 which is too low and risks over-merging clusters; change the logic so that when merge_threshold is None it defaults to a stricter value (either use the existing similarity_threshold variable if available, or set a higher constant such as 0.70–0.75), and update the inline comment to reflect that post-clustering merges must be equal to or stricter than initial clustering similarity to avoid combining distinct individuals.

coderabbitai · 2026-01-02T12:38:04Z

backend/tests/test_face_alignment.py

+def test_align_face_disabled(sample_image, sample_bbox, monkeypatch):
+    """Test that alignment is skipped when disabled."""
+    # Temporarily disable alignment
+    monkeypatch.setattr(settings, 'FACE_ALIGNMENT_ENABLED', False)
+
+    result =align_face_simple(sample_image, sample_bbox, padding=20)
+
+    # Should return simple crop when disabled
+    expected = simple_face_crop(sample_image, sample_bbox, padding=20)
+
+    assert result.shape == expected.shape
+    np.testing.assert_array_equal(result, expected)
+


⚠️ Potential issue | 🟡 Minor

Fix syntax error in function call.

Line 91 is missing a space between the function name and opening parenthesis.

🔎 Proposed fix

- result =align_face_simple(sample_image, sample_bbox, padding=20) + result = align_face_simple(sample_image, sample_bbox, padding=20)

🤖 Prompt for AI Agents

In backend/tests/test_face_alignment.py around lines 86 to 98, the assignment line is missing spacing around the '=' and should be formatted as "result = align_face_simple(...)" — update the line to include a space after the '=' so the function call is clearly separated from the assignment (result = align_face_simple(sample_image, sample_bbox, padding=20)).

290Priyansh added 2 commits January 2, 2026 17:29

Improve face clustering robustness across pose variations

89bef7c

Remove unused files related to face clustering

e9e8f74

coderabbitai bot reviewed Jan 2, 2026

View reviewed changes

290Priyansh mentioned this pull request Jan 3, 2026

Face clustering fails to group same person unless images are near-duplicates #915

Open

		image_util_process_untagged_images()
		cluster_util_face_clusters_sync(force_full_reclustering=True) # Force full reclustering for new photos

Improve Face Clustering Accuracy & Enable Automatic Clustering on Folder Add #914

Are you sure you want to change the base?

Improve Face Clustering Accuracy & Enable Automatic Clustering on Folder Add #914

Conversation

290Priyansh commented Jan 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

github-actions bot commented Jan 2, 2026

Uh oh!

coderabbitai bot commented Jan 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Jan 2, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

290Priyansh commented Jan 2, 2026 •

edited

Loading

coderabbitai bot commented Jan 2, 2026 •

edited

Loading