Skip to content

Conversation

@290Priyansh
Copy link

@290Priyansh 290Priyansh commented Jan 2, 2026

Fixes #915
This PR improves PictoPy’s face clustering pipeline to behave more like a real-world face recognition system.

Key improvements:

  • Automatic face detection & clustering on every folder addition (no manual triggers)
  • AI tagging enabled by default for new folders
  • Tuned clustering thresholds to correctly group the same person across different poses, expressions, and lighting
  • Optional face alignment preprocessing to improve embedding quality
  • Forced reclustering on sync to ensure new images are immediately clustered
pictopy.mp4

Summary by CodeRabbit

  • New Features

    • Face alignment preprocessing now enabled by default to improve face detection accuracy.
    • AI tagging automatically enabled for new folders.
    • Untagged images now processed with full face cluster reclustering for consistency.
    • Face clustering parameters optimized for improved accuracy.
  • Tests

    • Added comprehensive test suite for face alignment utilities.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 2, 2026

⚠️ No issue was linked in the PR description.
Please make sure to link an issue (e.g., 'Fixes #issue_number')

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 2, 2026

📝 Walkthrough

Walkthrough

A new face alignment preprocessing utility is introduced with configuration control, enabling rotation-based eye alignment during face cropping. Face detection is updated to use this alignment utility. Folder processing workflows are enhanced to handle untagged images and perform full reclustering. Face clustering default similarity and merge thresholds are adjusted to calibrate the pipeline.

Changes

Cohort / File(s) Summary
Configuration
backend/app/config/settings.py
Adds FACE_ALIGNMENT_ENABLED global flag (default: True) to control face alignment preprocessing behavior
Face Alignment Utilities
backend/app/utils/face_alignment.py, backend/tests/test_face_alignment.py
New utility module provides estimate_eye_positions(), align_face_simple(), and simple_face_crop() functions for alignment-driven cropping with eye-position-based rotation and fallback handling; comprehensive test coverage for edge cases and both enabled/disabled modes
Face Detection Integration
backend/app/models/FaceDetector.py
Replaces manual face cropping with align_face_simple() utility within detection loop, preserving preprocessing and embedding extraction workflow
Clustering Parameter Tuning
backend/app/utils/face_clusters.py
Adjusts default similarity thresholds in cluster_util_cluster_all_face_embeddings() (eps: 0.750.5, similarity: 0.850.65) and cluster_util_assign_cluster_to_faces_without_clusterId() (similarity: 0.80.65); lowers effective merge threshold to 0.60
Folder Processing Workflows
backend/app/routes/folders.py
Enhances post_folder_add_sequence() and post_sync_folder_sequence() to process untagged images and perform full reclustering (force_full_reclustering=True); enables AI tagging by default on folder addition

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as routes/folders.py
    participant ImgProc as Image Processing
    participant Untagged as Untagged Handler
    participant Cluster as Clustering
    participant Watcher as Sync Watcher
    
    Client->>API: Add/Sync Folder
    activate API
    
    API->>ImgProc: Process Folder Images
    activate ImgProc
    Note over ImgProc: FaceDetector uses<br/>align_face_simple()
    ImgProc-->>API: Images Processed
    deactivate ImgProc
    
    rect rgb(200, 220, 255)
    Note over API,Untagged: New: Handle Untagged Images
    API->>Untagged: Process Untagged Images
    activate Untagged
    Untagged-->>API: Untagged Processed
    deactivate Untagged
    end
    
    rect rgb(200, 220, 255)
    Note over API,Cluster: New: Full Reclustering
    API->>Cluster: Full Reclustering<br/>(force_full_reclustering=True)
    activate Cluster
    Note over Cluster: Uses adjusted<br/>similarity thresholds
    Cluster-->>API: Clustering Complete
    deactivate Cluster
    end
    
    API->>Watcher: Restart Sync Watcher
    activate Watcher
    Watcher-->>API: Watcher Restarted
    deactivate Watcher
    
    API-->>Client: Folder Processing Complete
    deactivate API
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

backend, medium

Suggested reviewers

  • rahulharpal1603

Poem

🐰 Whiskers twitch with joy sublime,
Eyes aligned, now it's prime time!
Rotations smooth, clusters sing,
Face preprocessing takes the wing!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main objectives: improving face clustering accuracy through tuned thresholds and enabling automatic clustering on folder addition.
Docstring Coverage ✅ Passed Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 2, 2026

⚠️ No issue was linked in the PR description.
Please make sure to link an issue (e.g., 'Fixes #issue_number')

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (3)
backend/app/utils/face_alignment.py (2)

17-37: Heuristic eye positions may not work for non-frontal faces.

The function uses fixed proportions (35%/65% horizontal, 35% vertical) to estimate eye positions without actual detection. This approach:

  • Works reasonably for frontal, upright faces
  • Fails for profile views, extreme angles, or unusual poses
  • May misalign faces when heuristics don't match actual eye positions

Since the PR objectives mention handling "different poses, expressions, and lighting conditions," consider whether this heuristic approach is sufficient or if actual eye detection (e.g., using MediaPipe Face Mesh or dlib landmarks) would better meet the goals.


96-98: Consider documenting the 3-degree threshold rationale.

The 3-degree threshold for skipping rotation is reasonable (avoids processing near-horizontal faces), but this magic number could benefit from a named constant or comment explaining why 3 degrees was chosen.

🔎 Suggested improvement
+        # Rotation threshold in degrees - skip rotation for nearly horizontal eyes
+        # to avoid unnecessary processing (3 degrees ≈ 5% vertical deviation)
+        MIN_ROTATION_ANGLE = 3
+        
         # Only apply rotation if angle is significant (> 3 degrees)
-        if abs(angle) < 3:
+        if abs(angle) < MIN_ROTATION_ANGLE:
             return face_crop
backend/app/routes/folders.py (1)

72-73: Add clarifying comment explaining the clustering strategy for AI tagging enablement.

The inconsistency across sequences is intentional: post_folder_add_sequence (line 73) and post_sync_folder_sequence (line 122) force full reclustering for filesystem changes, while post_AI_tagging_enabled_sequence (line 94) uses smart clustering with the 24-hour rule and face count thresholds.

Add an inline comment at line 94 explaining why AI tagging uses smart clustering instead of forced reclustering—this would parallel the existing comments at lines 73 and 122 and match the design pattern throughout the code.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4043751 and e9e8f74.

📒 Files selected for processing (6)
  • backend/app/config/settings.py
  • backend/app/models/FaceDetector.py
  • backend/app/routes/folders.py
  • backend/app/utils/face_alignment.py
  • backend/app/utils/face_clusters.py
  • backend/tests/test_face_alignment.py
🧰 Additional context used
🧬 Code graph analysis (3)
backend/app/models/FaceDetector.py (1)
backend/app/utils/face_alignment.py (1)
  • align_face_simple (40-122)
backend/app/routes/folders.py (2)
backend/app/utils/images.py (1)
  • image_util_process_untagged_images (87-101)
backend/app/utils/face_clusters.py (1)
  • cluster_util_face_clusters_sync (94-162)
backend/tests/test_face_alignment.py (1)
backend/app/utils/face_alignment.py (3)
  • align_face_simple (40-122)
  • simple_face_crop (125-149)
  • estimate_eye_positions (17-37)
🔇 Additional comments (14)
backend/app/utils/face_clusters.py (2)

318-335: Consistent threshold change across functions.

The similarity_threshold reduction from 0.8 to 0.65 is consistent with the changes in cluster_util_cluster_all_face_embeddings. While consistency is good, note the same concern about this being very lenient for face assignment applies here as well.


188-202: I need the original review comment to rewrite. Please provide the review comment content (typically within <review_comment> tags) along with any relevant context about the code being reviewed.

backend/app/config/settings.py (1)

26-28: LGTM! Clean configuration addition.

The FACE_ALIGNMENT_ENABLED flag provides a clear toggle for the face alignment feature. The default value of True aligns with the PR objective of improving face recognition accuracy.

backend/app/models/FaceDetector.py (1)

47-51: Good integration with face alignment utility.

The change cleanly integrates the new align_face_simple utility, replacing manual cropping. The utility includes proper error handling and fallback to simple cropping when alignment fails or is disabled, maintaining robustness.

backend/tests/test_face_alignment.py (8)

16-31: Well-structured test fixtures.

The fixtures provide clear, reusable test data with a sample image containing visual markers (circles for "eyes") that align well with what the alignment utilities expect.


34-48: Good test coverage for simple_face_crop.

The test verifies correct dimensions accounting for padding and validates that the result is a valid 3-channel color image.


50-60: Good boundary condition test.

Testing face crop at image boundaries ensures the function handles edge cases gracefully without crashing.


62-84: Thorough validation of eye position estimation.

The test verifies eye coordinates are reasonable and that the left eye is correctly positioned to the left of the right eye.


100-115: Good test for alignment-enabled behavior.

Verifies that when alignment is enabled, the function returns a valid image with reasonable dimensions.


117-136: Comprehensive edge case testing.

Tests handle tiny images, grayscale inputs, and potential failures gracefully. The flexible assertion (accepting either success or controlled failure) is appropriate for edge cases.


138-161: Good coverage of rotation and padding scenarios.

Tests verify minimal rotation behavior and validate different padding values work correctly.


163-174: Good test for invalid input handling.

Tests that zero-dimension bounding boxes are handled gracefully without crashing.

backend/app/utils/face_alignment.py (2)

40-122: Robust alignment implementation with good fallback logic.

The function:

  • Properly checks FACE_ALIGNMENT_ENABLED flag
  • Handles empty crops gracefully
  • Only rotates when angle exceeds 3 degrees (avoids unnecessary processing)
  • Uses cv2.BORDER_REPLICATE to handle edge pixels during rotation
  • Falls back to simple crop on any exception

The error handling and fallback mechanism ensure the pipeline doesn't break even when alignment fails.


125-149: Clean fallback implementation.

The simple_face_crop function provides a straightforward fallback that properly handles boundary conditions with max() and min() operations.

Comment on lines +47 to +50
# Use face alignment utility for better pose handling
from app.utils.face_alignment import align_face_simple

face_img = align_face_simple(img, bbox, padding=20)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Move the import to module level.

The import statement is currently inside the loop, which means it will be executed for every face detected. This is inefficient and goes against Python best practices.

🔎 Proposed fix

Move the import to the top of the file with other imports:

 from app.models.FaceNet import FaceNet
 from app.utils.FaceNet import FaceNet_util_preprocess_image, FaceNet_util_get_model_path
 from app.utils.YOLO import YOLO_util_get_model_path
 from app.models.YOLO import YOLO
 from app.database.faces import db_insert_face_embeddings_by_image_id
 from app.logging.setup_logging import get_logger
+from app.utils.face_alignment import align_face_simple

Then remove it from the loop:

-                # Use face alignment utility for better pose handling
-                from app.utils.face_alignment import align_face_simple
-                
                 face_img = align_face_simple(img, bbox, padding=20)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Use face alignment utility for better pose handling
from app.utils.face_alignment import align_face_simple
face_img = align_face_simple(img, bbox, padding=20)
face_img = align_face_simple(img, bbox, padding=20)
🤖 Prompt for AI Agents
In backend/app/models/FaceDetector.py around lines 47 to 50, the import of
align_face_simple is inside the loop causing repeated imports; move "from
app.utils.face_alignment import align_face_simple" to the module-level imports
at the top of the file with the other imports, then remove the in-loop import so
the loop only calls align_face_simple(img, bbox, padding=20).

Comment on lines +72 to +73
image_util_process_untagged_images()
cluster_util_face_clusters_sync(force_full_reclustering=True) # Force full reclustering for new photos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for performance-related configuration options or limits for clustering
rg -n -C3 'force_full_reclustering|cluster.*config|clustering.*threshold|max.*faces|performance' --type=py

Repository: AOSSIE-Org/PictoPy

Length of output: 3064


🏁 Script executed:

#!/bin/bash
# Find the implementation of cluster_util_face_clusters_sync to understand actual behavior
rg -n 'def cluster_util_face_clusters_sync|def face_clusters_sync' --type=py -A 15

Repository: AOSSIE-Org/PictoPy

Length of output: 1290


🏁 Script executed:

#!/bin/bash
# Check all usages of force_full_reclustering to understand the pattern
rg -n 'force_full_reclustering' --type=py -B 2 -A 2

Repository: AOSSIE-Org/PictoPy

Length of output: 2316


🏁 Script executed:

#!/bin/bash
# Get the implementation of cluster_util_cluster_all_face_embeddings
rg -n 'def cluster_util_cluster_all_face_embeddings' --type=py -A 20

Repository: AOSSIE-Org/PictoPy

Length of output: 1774


🏁 Script executed:

#!/bin/bash
# Get the implementation of cluster_util_is_reclustering_needed to understand the 24-hour rule
rg -n 'def cluster_util_is_reclustering_needed' --type=py -A 15

Repository: AOSSIE-Org/PictoPy

Length of output: 1210


🏁 Script executed:

#!/bin/bash
# Search for any documentation about performance or the behavior of full reclustering
rg -n 'reclustering|cluster.*all.*faces|cluster_all' --type=py | head -30

Repository: AOSSIE-Org/PictoPy

Length of output: 3115


Misleading comment: Full reclustering processes all faces, not just new photos.

The comment "Force full reclustering for new photos" is inaccurate. The cluster_util_cluster_all_face_embeddings() function retrieves and reclusters all faces in the database (via db_get_all_faces_with_cluster_names()), not just newly added ones.

Additionally, forcing full reclustering on every folder add and sync (lines 73 and 122) bypasses the 24-hour throttling mechanism in cluster_util_is_reclustering_needed(). While this ensures new photos are immediately reflected in clusters, it reprocesses the entire face database on each operation, which could impact performance for large libraries.

Consider either:

  • Updating the comment to reflect the actual behavior
  • Using the standard clustering logic (without force=True) to respect the 24-hour throttle for folder operations, with explicit full reclustering reserved for manual user requests
🤖 Prompt for AI Agents
In backend/app/routes/folders.py around lines 72-73, the inline comment "Force
full reclustering for new photos" is misleading because
cluster_util_cluster_all_face_embeddings() reclusters all faces, not just new
ones, and forcing a full recluster here bypasses the 24-hour throttle; update
the comment to accurately state that this triggers a full reclustering of all
face embeddings, and remove the force_full_reclustering=True (use the default
throttled call) for folder add/sync operations so the 24-hour throttling in
cluster_util_is_reclustering_needed() is respected (reserve explicit full
reclustering only for manual user requests or admin-triggered paths such as the
manual/full-recluster endpoint; also apply the same change for the similar call
at line 122).

image_util_process_folder_images(folder_data)
image_util_process_untagged_images()
cluster_util_face_clusters_sync()
cluster_util_face_clusters_sync(force_full_reclustering=True) # Force full reclustering for synced photos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Same performance concern applies to sync operations.

This has the same issue as lines 72-73: full reclustering processes all photos in the database, not just newly synced ones. The comment is similarly misleading, and the performance impact could be significant for large libraries during routine sync operations.

🤖 Prompt for AI Agents
In backend/app/routes/folders.py around line 122, the call
cluster_util_face_clusters_sync(force_full_reclustering=True) forces a full
recluster of the entire DB during every sync which is misleading and will crush
performance; change the behavior to only recluster newly synced or modified
photos (e.g., pass the list of new photo IDs or a flag to perform incremental
reclustering) or make force_full_reclustering=False and schedule an occasional
full recluster in a background job; update the call and its comment to reflect
that only incremental reclustering runs during normal syncs and full
reclustering is done separately on a controlled schedule.

root_path=request.folder_path,
parent_folder_id=parent_folder_id,
AI_Tagging=False,
AI_Tagging=True, # Enable AI tagging by default for automatic face detection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd /root/repo && cat -n backend/app/routes/folders.py | sed -n '180,200p'

Repository: AOSSIE-Org/PictoPy

Length of output: 122


🏁 Script executed:

cd /root/repo && git log --oneline -20 -- backend/app/routes/folders.py | head -5

Repository: AOSSIE-Org/PictoPy

Length of output: 122


🏁 Script executed:

cd /root/repo && rg "AI_Tagging" -B 3 -A 3 --type py

Repository: AOSSIE-Org/PictoPy

Length of output: 122


🏁 Script executed:

cd /root/repo && fd -t f "README\|CHANGELOG\|CHANGES\|RELEASE" -i | head -10

Repository: AOSSIE-Org/PictoPy

Length of output: 122


🏁 Script executed:

cd /root/repo && rg "face.*detect\|automatic.*process" -i --type py | head -20

Repository: AOSSIE-Org/PictoPy

Length of output: 122


🌐 Web query:

automatic face detection user consent privacy best practices photo management

💡 Result:

Short best-practices checklist for automatic face-detection/recognition (photo management) and user consent:

  1. Treat facial templates/recognition output as biometric/sensitive data — require explicit, informed consent where law requires it (describe purpose, retention, sharing, rights to withdraw). [1][2][3]

  2. Offer genuine choice and alternatives — don’t make access to core services conditional on consenting to biometrics; provide non‑biometric options. [1][2]

  3. Minimize data collection & retention — collect only needed images/templates, store the minimal derived template (not raw images when feasible), and set short, documented retention periods with regular purge. [1][2]

  4. Purpose limitation & transparency — publish clear notices (in-app, UI) before detection runs; log processing events and notify users of material purpose changes and request fresh consent. [1][3]

  5. Opt-in UX & clear affirmative action — use an explicit, separate opt-in (no pre‑ticked boxes or implied consent); make withdrawal as easy as giving consent. [1][2]

  6. Local/edge processing when possible — perform detection/matching on-device to reduce central storage and breach risk; if server-side, encrypt templates and apply strong access controls. [2][3]

  7. Data protection & security — apply encryption at rest/in transit, access logging, least-privilege, regular audits, and breach-notification procedures. [2][3]

  8. Accuracy, fairness, and testing — evaluate and document model performance across demographic groups; mitigate bias before deployment and allow human review for high‑risk decisions. [2][4]

  9. Allow user control over photos — provide UI to review, export, delete photos/templates and records of detection events; honor data-subject requests promptly. [1][2]

  10. Risk assessment & governance — perform DPIAs/PIAs for deployments, keep records of processing, appoint data protection lead, and limit third‑party sharing; require contractual safeguards with vendors. [1][2][3]

  11. Special contexts caution — avoid unconsented/systematic monitoring of public spaces, and treat employee/workplace use carefully (power imbalance may invalidate consent). [1][2]

If you want, I can: draft UI consent text, a minimal privacy notice, or a retention-policy template that follows these points.


🏁 Script executed:

pwd && git ls-files | grep -E "folders\.py|README|CHANGELOG" | head -20

Repository: AOSSIE-Org/PictoPy

Length of output: 491


🏁 Script executed:

cat backend/app/routes/folders.py | sed -n '180,200p'

Repository: AOSSIE-Org/PictoPy

Length of output: 1033


🏁 Script executed:

rg "AI_Tagging" -B 3 -A 3 --type py

Repository: AOSSIE-Org/PictoPy

Length of output: 9497


🏁 Script executed:

git log --oneline -p -- backend/app/routes/folders.py | grep -A 10 -B 10 "AI_Tagging" | head -40

Repository: AOSSIE-Org/PictoPy

Length of output: 1703


🏁 Script executed:

rg "add_folder|Folder.*AI" -B 2 -A 2 --type py | head -50

Repository: AOSSIE-Org/PictoPy

Length of output: 3107


🏁 Script executed:

find . -type f \( -name "*.md" -o -name "*.rst" -o -name "*.txt" \) | xargs grep -l "AI\|tagging\|face.*detect" 2>/dev/null | head -10

Repository: AOSSIE-Org/PictoPy

Length of output: 285


🏁 Script executed:

rg "automatic.*face\|AI.*tag\|face.*detect" -i --type md | head -30

Repository: AOSSIE-Org/PictoPy

Length of output: 44


🏁 Script executed:

cat README.md

Repository: AOSSIE-Org/PictoPy

Length of output: 2620


🏁 Script executed:

rg "consent\|privacy\|opt.*in\|opt.*out" -i --type py --type md | head -30

Repository: AOSSIE-Org/PictoPy

Length of output: 44


🏁 Script executed:

find . -path ./node_modules -prune -o -type f -name "*.json" -o -name "*.tsx" -o -name "*.ts" | xargs grep -l "AI\|tagging\|consent" 2>/dev/null | head -10

Repository: AOSSIE-Org/PictoPy

Length of output: 516


🏁 Script executed:

rg "AddFolder|add_folder" -A 5 --type py | grep -i "request\|response" | head -20

Repository: AOSSIE-Org/PictoPy

Length of output: 1365


🏁 Script executed:

cat frontend/src/components/FolderPicker/AITaggingFolderPicker.tsx

Repository: AOSSIE-Org/PictoPy

Length of output: 1464


🏁 Script executed:

cat frontend/src/components/EmptyStates/EmptyAITaggingState.tsx

Repository: AOSSIE-Org/PictoPy

Length of output: 1254


🏁 Script executed:

cat backend/app/schemas/folders.py | grep -A 10 "class AddFolderRequest"

Repository: AOSSIE-Org/PictoPy

Length of output: 357


🏁 Script executed:

rg "AddFolderRequest" -B 2 -A 5 frontend/src

Repository: AOSSIE-Org/PictoPy

Length of output: 1071


🏁 Script executed:

cat docs/backend/backend_python/image-processing.md

Repository: AOSSIE-Org/PictoPy

Length of output: 5247


Remove AI_Tagging=True default or implement explicit user consent and control.

Automatic face detection on all newly added folders violates privacy best practices and contradicts the UI, which states "Enable AI Tagging for a folder from the settings page." The API forces AI_Tagging=True with no user control:

  • AddFolderRequest schema has no AI_Tagging parameter; users cannot opt-out
  • EmptyAITaggingState.tsx misleadingly suggests users control this via settings
  • No upfront disclosure or consent prompt when folders are added
  • Automatic biometric processing (face detection/clustering) without explicit informed consent conflicts with standard privacy requirements

Either:

  1. Change default to AI_Tagging=False and add an explicit opt-in toggle in the folder creation UI, or
  2. Add an ai_tagging parameter to AddFolderRequest so users can control it, and add a clear consent prompt before processing begins.

Document the change in release notes and clarify the UI messaging to reflect actual behavior.

🤖 Prompt for AI Agents
In backend/app/routes/folders.py around line 189, the route is forcing
AI_Tagging=True by default which enables automatic face detection without user
control; change behavior to respect user consent by either (A) set AI_Tagging
default to False and update folder creation flow/UI to include an explicit
"Enable AI Tagging" opt-in toggle, or (B) add an explicit ai_tagging boolean
field to the AddFolderRequest schema and use that value (default False) when
creating folders; also update API validation, TypeScript client types, the
EmptyAITaggingState UI text to reflect actual behavior, add a consent prompt
before any biometric processing, and include this change in release notes and
documentation.

Comment on lines +309 to 313
# Use lower threshold if merge_threshold not explicitly provided to combine similar clusters
effective_merge_threshold = merge_threshold if merge_threshold is not None else 0.60
results = _merge_similar_clusters(
results, merge_threshold=effective_merge_threshold
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

The merge threshold of 0.60 is very aggressive.

Clusters with only 60% cosine similarity will be merged together. This is quite low and risks combining distinct individuals, especially since it's lower than the primary similarity_threshold of 0.65. Typically, post-clustering merge thresholds should be higher (more strict) than initial clustering thresholds to avoid over-merging.

Consider increasing the default to at least match or exceed similarity_threshold (e.g., 0.70-0.75).

🤖 Prompt for AI Agents
In backend/app/utils/face_clusters.py around lines 309 to 313, the default
effective_merge_threshold is set to 0.60 which is too low and risks over-merging
clusters; change the logic so that when merge_threshold is None it defaults to a
stricter value (either use the existing similarity_threshold variable if
available, or set a higher constant such as 0.70–0.75), and update the inline
comment to reflect that post-clustering merges must be equal to or stricter than
initial clustering similarity to avoid combining distinct individuals.

Comment on lines +86 to +98
def test_align_face_disabled(sample_image, sample_bbox, monkeypatch):
"""Test that alignment is skipped when disabled."""
# Temporarily disable alignment
monkeypatch.setattr(settings, 'FACE_ALIGNMENT_ENABLED', False)

result =align_face_simple(sample_image, sample_bbox, padding=20)

# Should return simple crop when disabled
expected = simple_face_crop(sample_image, sample_bbox, padding=20)

assert result.shape == expected.shape
np.testing.assert_array_equal(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix syntax error in function call.

Line 91 is missing a space between the function name and opening parenthesis.

🔎 Proposed fix
-    result =align_face_simple(sample_image, sample_bbox, padding=20)
+    result = align_face_simple(sample_image, sample_bbox, padding=20)
🤖 Prompt for AI Agents
In backend/tests/test_face_alignment.py around lines 86 to 98, the assignment
line is missing spacing around the '=' and should be formatted as "result =
align_face_simple(...)" — update the line to include a space after the '=' so
the function call is clearly separated from the assignment (result =
align_face_simple(sample_image, sample_bbox, padding=20)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Face clustering fails to group same person unless images are near-duplicates

1 participant