Add pipeline summary metrics to face clustering logs #878

@Sainava

Description

Summary

The face clustering pipeline in backend/app/utils/face_clusters.py performs several critical filtering and decision steps, but lacks high-level summary metrics that would help developers understand clustering outcomes during debugging, testing, and development.

This issue proposes adding lightweight summary logging at key decision boundaries to improve observability without changing any clustering behavior.


Current State: What Already Exists

The clustering pipeline already has good logging for individual operations:

  • Line 239: "Total valid faces to cluster: X"
  • Line 233: "Filtered out X invalid embeddings"
  • Line 258: "Applied similarity threshold: X"
  • Line 271: "DBSCAN found X clusters"
  • Line 498: Individual cluster merge notifications

These logs provide step-by-step visibility into the pipeline's execution.


What's Missing: Summary Metrics

While individual steps are logged, aggregate outcomes are not. This makes it difficult to answer:

  • "How many faces were excluded as DBSCAN noise?"
  • "Did most faces get clustered or rejected?"
  • "How much did post-merge clustering reduce the cluster count?"
  • "Did the pipeline behave as expected end-to-end?"

Specifically missing:

  • DBSCAN noise count - number of faces marked as outliers (label == -1)
  • Pre/post-merge comparison - cluster count before and after similarity merging
  • Final pipeline summary - total faces processed and final cluster count


Proposed Additions

Add three high-level summary logs:

1. After DBSCAN clustering (~line 280)

noise_count = sum(1 for label in cluster_labels if label == -1)
cluster_count = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)  # exclude the -1 noise label
logger.info(f"DBSCAN results: {cluster_count} clusters, {noise_count} noise faces")

2. Post-merge summary (~line 313)

pre_merge_count = len(set(r.cluster_uuid for r in results))  # before merge
# ... similarity merge runs here ...
post_merge_count = len(set(r.cluster_uuid for r in results))
logger.info(f"Post-merge: {pre_merge_count} → {post_merge_count} clusters")

3. Final pipeline summary (end of function)

logger.info(f"Clustering complete: {len(results)} faces assigned across {post_merge_count} clusters")

Expected Output Example

[INFO] Total valid faces to cluster: 412
[INFO] Filtered out 27 invalid embeddings
[INFO] Applied similarity threshold: 0.85 (max_distance: 0.150)
[INFO] DBSCAN results: 38 clusters, 91 noise faces
[INFO] Post-merge: 38 → 31 clusters
[INFO] Clustering complete: 321 faces assigned across 31 clusters

Why This Matters

This is not an ML accuracy issue. The clustering logic may be working perfectly, but without summary diagnostics:

  • Developers cannot easily verify expected pipeline behavior
  • Debugging requires breakpoints or manual print statements
  • Clustering outcomes appear "opaque" when investigating issues
  • New contributors have difficulty understanding system behavior

Open Questions

Before implementing, I wanted to confirm:

  • Would this level of clustering observability be useful?
  • Are there other metrics that would be valuable to surface?
  • Any preferences on log level (INFO vs DEBUG)?

Happy to adjust based on maintainer feedback.
