Skip to content

Conversation

@beastoin
Copy link
Collaborator

@beastoin beastoin commented Jan 21, 2026

Fixes #4253

Previously, any speaker sample audio could be stored without verification, relying solely on the accuracy of the private cloud sync and the segment-level timestamps. This could degrade speaker identification accuracy, especially when the latency of the STT service is high.

Now samples must pass 3 checks: minimum 5 words transcribed, single speaker dominance ≥70% via diarization (catches crosstalk that segment boundaries miss), and ≥60% text similarity using character trigram Jaccard (chosen for being language-agnostic across CJK/Cyrillic/Arabic without tokenizers or NLP deps, and robust to minor transcription variations).


This pr was drafted by AI on behalf of @beastoin

Add technical implementation plan (PRD.MD) and progress tracking
checklist (progress.txt) for the speech sample transcripts feature.

This feature will display transcripts of speech samples in the
Settings > People page, leveraging the existing Deepgram transcription
from speaker sample verification.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Product Requirements Document (PRD.MD) and an implementation checklist (progress.txt) for displaying speech sample transcripts in the People Settings. The PRD details data model changes, backend logic for managing speech samples and transcripts, and frontend UI updates. The plan outlines a lazy migration strategy for existing v1 samples. My primary feedback concerns a critical data integrity risk in the proposed add_person_speech_sample function, where the use of firestore.ArrayUnion with parallel arrays could lead to desynchronization. A more robust atomic update mechanism is recommended to ensure consistency.

@beastoin beastoin marked this pull request as draft January 21, 2026 03:37
beastoin and others added 10 commits January 21, 2026 04:42
Create backend/utils/speaker_sample_migration.py with:
- verify_and_transcribe_sample(): Transcribe audio and verify quality
- migrate_person_samples_v1_to_v2(): Migrate samples from v1 to v2 format
- download_sample_audio(): Download speech sample from GCS
- delete_sample_from_storage(): Delete speech sample from GCS

This centralizes the verification logic from speaker_identification.py
for reuse in lazy migration. Part of speech sample transcripts feature.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Created backend/utils/speaker_sample_migration.py with all four required
functions: verify_and_transcribe_sample, migrate_person_samples_v1_to_v2,
download_sample_audio, and delete_sample_from_storage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…son model

Add new fields to support storing transcripts alongside speech samples:
- speech_sample_transcripts: Optional[List[str]] for parallel transcript array
- speech_samples_version: int defaulting to 1 for migration tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update add_person_speech_sample() to accept transcript parameter and
store it in parallel array. Update remove_person_speech_sample() to
remove by index to keep samples and transcripts arrays in sync.

Add new functions for migration support:
- set_person_speech_sample_transcript()
- update_person_speech_samples_after_migration()
- clear_person_speaker_embedding()
- update_person_speech_samples_version()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace inline _verify_sample_quality function with the centralized
verify_and_transcribe_sample from speaker_sample_migration module.
Now passes transcript to add_person_speech_sample() to store transcripts
alongside speech samples.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Make get_all_people() and get_single_person() async to support lazy
migration of v1 speech samples to v2 with transcripts when fetching
people data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n model

Update fromJson() and toJson() methods to properly parse/serialize
speech_sample_transcripts and speech_samples_version fields that were
already defined but not being serialized.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Show transcript text in italic below each speech sample in the Settings > People page.
Handles null/missing transcripts gracefully by only displaying when available.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All existing backend unit tests pass:
- 22 transcript_segment tests
- 27 text_similarity tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Flutter tests fail due to missing provider/path_provider dependencies
in the test environment. Verified same failures occur on main branch,
confirming this is not caused by PR changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@beastoin beastoin changed the title docs: PRD and checklist for speech sample transcripts feat: display speech sample transcripts in People settings Jan 21, 2026
@beastoin
Copy link
Collaborator Author

Required fixes before merge:

  • Remove migration from GET; move v1→v2 migration into speaker extraction when adding a sample (version==1).
  • Add a simple in‑process lock per uid/person_id to avoid double migration on the same instance.
  • Fix potential array drift in add_person_speech_sample: ArrayUnion + transcript append from a stale snapshot can misalign; use a Firestore transaction or serialize updates.

Optional: drop PRD.MD/progress.txt and remove the unused import in speaker_sample_migration.py.

Ping me when updated.


This comment was drafted by AI on behalf of @beastoin

… transaction

Per beastoin's review on PR #4322:
- Move v1→v2 migration from GET endpoints to speaker extraction flow
- Add in-process asyncio lock per uid/person_id to prevent double migration
- Use Firestore transaction in add_person_speech_sample for atomic array updates
- Remove unused google.cloud.storage import
- Delete PRD.MD and progress.txt files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@beastoin
Copy link
Collaborator Author

All three fixes implemented in latest commit:

  • Moved v1→v2 migration from GET endpoints to extract_speaker_samples() - triggers when adding sample to v1 person
  • Added asyncio lock dictionary in speaker_sample_migration.py with per uid/person_id locking and double-check inside lock
  • Refactored add_person_speech_sample to use Firestore transaction for atomic read-modify-write
  • Removed PRD.MD, progress.txt, and unused storage import

Ready for re-review.


This comment was drafted by AI on behalf of @beastoin

@beastoin
Copy link
Collaborator Author

beastoin commented Jan 22, 2026

@beastoin Looks good overall after the fixes—only one thing left: in extract_speaker_samples the migration runs after the early return when sample_count >= max_samples, so v1 users at the limit never migrate; move migration before that guard and re-check count afterward. Can you adjust that and ping me for a final check?


This comment was drafted by AI on behalf of @beastoin

Move migration to run before the early return guard so v1 users at the
sample limit still get migrated. Migration may drop invalid samples,
freeing up space for new ones.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@beastoin
Copy link
Collaborator Author

Fixed—migration now runs before the sample count check. V1 users at the limit will get migrated (potentially freeing space if invalid samples are dropped), then count is re-checked.

Ready for final review.


This comment was drafted by AI on behalf of @beastoin

@beastoin
Copy link
Collaborator Author

beastoin commented Jan 22, 2026

@beastoin Quick optional cleanup for a follow‑up PR: split backend/utils/speaker_sample_migration.py into two modules for clarity. Put verification + GCS helpers in backend/utils/speaker_sample.py (verify_and_transcribe_sample, download_sample_audio, delete_sample_from_storage) and keep migration + locking in backend/utils/speaker_sample_migration.py. Not required for this merge, but it’ll make reuse cleaner.


This comment was drafted by AI on behalf of @beastoin

Per reviewer feedback, extract verification + GCS helpers into
speaker_sample.py for cleaner reuse:
- speaker_sample.py: verify_and_transcribe_sample, download_sample_audio,
  delete_sample_from_storage
- speaker_sample_migration.py: migration logic + locking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@beastoin
Copy link
Collaborator Author

Done—split into two modules:

  • speaker_sample.py: verification + GCS helpers (verify_and_transcribe_sample, download_sample_audio, delete_sample_from_storage)
  • speaker_sample_migration.py: migration logic + locking

Ready for final review.


This comment was drafted by AI on behalf of @beastoin

@beastoin
Copy link
Collaborator Author

beastoin commented Jan 22, 2026

@beastoin Optional cleanup request: move all storage_client + speech_profiles_bucket usage into utils/other/storage.py. Add two generic helpers there: download_blob_bytes(bucket, path) and delete_blob(bucket, path). Then add speech-profile wrappers there: download_speech_profile_bytes(path) and delete_speech_profile_blob(path). Update backend/utils/speaker_sample.py to call only the speech-profile helpers (no direct bucket/client usage).


This comment was drafted by AI on behalf of @beastoin

Per reviewer feedback:
- Add generic helpers: download_blob_bytes, delete_blob
- Add speech-profile wrappers: download_speech_profile_bytes, delete_speech_profile_blob
- Update speaker_sample.py to use only the wrappers (no direct bucket/client usage)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@beastoin
Copy link
Collaborator Author

Done—moved GCS helpers to utils/other/storage.py:

  • Generic: download_blob_bytes(bucket, path), delete_blob(bucket, path)
  • Speech-profile wrappers: download_speech_profile_bytes(path), delete_speech_profile_blob(path)

speaker_sample.py now uses only the speech-profile wrappers with no direct bucket/client usage.

Ready for final review.


This comment was drafted by AI on behalf of @beastoin

@beastoin beastoin marked this pull request as ready for review January 22, 2026 07:40
@beastoin beastoin merged commit 0077ba6 into e8w2h_speaker_identification Jan 22, 2026
1 check passed
@beastoin beastoin deleted the speech-sample-transcripts branch January 22, 2026 07:40
beastoin added a commit that referenced this pull request Jan 22, 2026
… transaction

Per beastoin's review on PR #4322:
- Move v1→v2 migration from GET endpoints to speaker extraction flow
- Add in-process asyncio lock per uid/person_id to prevent double migration
- Use Firestore transaction in add_person_speech_sample for atomic array updates
- Remove unused google.cloud.storage import
- Delete PRD.MD and progress.txt files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
beastoin added a commit that referenced this pull request Jan 22, 2026
Fixes #4253.

Adds transcript capture/storage and People settings UI for speech
samples (from #4322), and enforces stricter verification before saving
samples (min 5 words, ≥70% single-speaker dominance via diarization, and
≥60% trigram Jaccard similarity) to avoid low‑quality or mixed‑speaker
data.

**deploy steps**
- [ ] deploy backend(s)

---
_This pr was drafted by AI on behalf of @beastoin_
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants