feat: display speech sample transcripts in People settings #4322
Conversation
Add technical implementation plan (PRD.MD) and progress tracking checklist (progress.txt) for the speech sample transcripts feature. This feature will display transcripts of speech samples in the Settings > People page, leveraging the existing Deepgram transcription from speaker sample verification. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Code Review
This pull request introduces a Product Requirements Document (PRD.MD) and an implementation checklist (progress.txt) for displaying speech sample transcripts in the People Settings. The PRD details data model changes, backend logic for managing speech samples and transcripts, and frontend UI updates. The plan outlines a lazy migration strategy for existing v1 samples. My primary feedback concerns a critical data integrity risk in the proposed add_person_speech_sample function, where the use of firestore.ArrayUnion with parallel arrays could lead to desynchronization. A more robust atomic update mechanism is recommended to ensure consistency.
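For illustration, the atomic update the review is pointing toward could look roughly like this: read both arrays inside a Firestore transaction and write them back together, rather than issuing two independent ArrayUnion updates. This is a minimal sketch, not the PR's code; the `users/{uid}/people/{person_id}` document path and the field names are assumptions taken from the PRD summary.

```python
from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def _append_sample(transaction, person_ref, sample_url, transcript):
    # Read the current state inside the transaction so concurrent writers retry.
    snapshot = person_ref.get(transaction=transaction)
    data = snapshot.to_dict() or {}
    samples = data.get("speech_samples", [])
    transcripts = data.get("speech_sample_transcripts", [])
    # Pad the transcript array so the two parallel arrays stay index-aligned,
    # even for legacy documents that never had transcripts.
    transcripts += [None] * (len(samples) - len(transcripts))
    samples.append(sample_url)
    transcripts.append(transcript)
    transaction.update(person_ref, {
        "speech_samples": samples,
        "speech_sample_transcripts": transcripts,
    })

def add_person_speech_sample(uid: str, person_id: str, sample_url: str, transcript: str | None = None):
    person_ref = (
        db.collection("users").document(uid)
        .collection("people").document(person_id)
    )
    _append_sample(db.transaction(), person_ref, sample_url, transcript)
```

Because the read and both writes happen in one transaction, the samples and transcripts arrays cannot drift out of sync the way two separate ArrayUnion calls could.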
Create backend/utils/speaker_sample_migration.py with:
- verify_and_transcribe_sample(): Transcribe audio and verify quality
- migrate_person_samples_v1_to_v2(): Migrate samples from v1 to v2 format
- download_sample_audio(): Download speech sample from GCS
- delete_sample_from_storage(): Delete speech sample from GCS

This centralizes the verification logic from speaker_identification.py for reuse in lazy migration. Part of speech sample transcripts feature.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Created backend/utils/speaker_sample_migration.py with all four required functions: verify_and_transcribe_sample, migrate_person_samples_v1_to_v2, download_sample_audio, and delete_sample_from_storage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…son model

Add new fields to support storing transcripts alongside speech samples:
- speech_sample_transcripts: Optional[List[str]] for parallel transcript array
- speech_samples_version: int defaulting to 1 for migration tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update add_person_speech_sample() to accept a transcript parameter and store it in a parallel array. Update remove_person_speech_sample() to remove by index so the samples and transcripts arrays stay in sync.

Add new functions for migration support:
- set_person_speech_sample_transcript()
- update_person_speech_samples_after_migration()
- clear_person_speaker_embedding()
- update_person_speech_samples_version()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
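Removal by index cannot be expressed with firestore.ArrayRemove (which matches by value and also strips duplicates), so keeping the two arrays aligned implies a read-modify-write. A hedged sketch of what remove_person_speech_sample might look like, consistent with the atomic-update approach recommended above; the document path and field names are assumptions:

```python
from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def _remove_sample_at(transaction, person_ref, index):
    snapshot = person_ref.get(transaction=transaction)
    data = snapshot.to_dict() or {}
    samples = data.get("speech_samples", [])
    transcripts = data.get("speech_sample_transcripts", [])
    if not 0 <= index < len(samples):
        return  # Nothing to remove at this index.
    samples.pop(index)
    if index < len(transcripts):
        # Drop the matching transcript so the parallel arrays stay aligned.
        transcripts.pop(index)
    transaction.update(person_ref, {
        "speech_samples": samples,
        "speech_sample_transcripts": transcripts,
    })

def remove_person_speech_sample(uid: str, person_id: str, index: int):
    person_ref = (
        db.collection("users").document(uid)
        .collection("people").document(person_id)
    )
    _remove_sample_at(db.transaction(), person_ref, index)
```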
Replace inline _verify_sample_quality function with the centralized verify_and_transcribe_sample from speaker_sample_migration module. Now passes transcript to add_person_speech_sample() to store transcripts alongside speech samples. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Make get_all_people() and get_single_person() async to support lazy migration of v1 speech samples to v2 with transcripts when fetching people data. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n model

Update fromJson() and toJson() methods to properly parse/serialize speech_sample_transcripts and speech_samples_version fields that were already defined but not being serialized.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Show transcript text in italic below each speech sample in the Settings > People page. Handles null/missing transcripts gracefully by only displaying when available. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All existing backend unit tests pass:
- 22 transcript_segment tests
- 27 text_similarity tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Flutter tests fail due to missing provider/path_provider dependencies in the test environment. Verified same failures occur on main branch, confirming this is not caused by PR changes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required fixes before merge:
- Move the v1→v2 migration out of the GET endpoints and into the speaker extraction flow
- Guard the migration with a per-uid/person_id lock so it cannot run twice concurrently
- Make add_person_speech_sample update the samples and transcripts arrays atomically (e.g. a Firestore transaction) instead of parallel ArrayUnion calls

Optional: drop PRD.MD/progress.txt and remove the unused import in speaker_sample_migration.py. Ping me when updated.

This comment was drafted by AI on behalf of @beastoin
… transaction

Per beastoin's review on PR #4322:
- Move v1→v2 migration from GET endpoints to speaker extraction flow
- Add in-process asyncio lock per uid/person_id to prevent double migration
- Use Firestore transaction in add_person_speech_sample for atomic array updates
- Remove unused google.cloud.storage import
- Delete PRD.MD and progress.txt files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
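The per-uid/person_id lock described here is an in-process guard. A minimal sketch of one way to key asyncio locks like that; helper names such as get_person and the awaited migration call are assumptions, not the PR's exact code:

```python
import asyncio
from collections import defaultdict

# One lock per (uid, person_id): concurrent requests for the same person wait
# for each other, while different people can migrate in parallel. This is
# purely in-process and does not coordinate across multiple backend instances.
_migration_locks = defaultdict(asyncio.Lock)


async def migrate_person_if_needed(uid: str, person_id: str) -> None:
    async with _migration_locks[(uid, person_id)]:
        person = get_person(uid, person_id)  # assumed Firestore helper
        if person.get("speech_samples_version", 1) >= 2:
            return  # Already v2, or migrated while we waited on the lock.
        await migrate_person_samples_v1_to_v2(uid, person)  # from speaker_sample_migration
```

Re-checking speech_samples_version inside the lock is what actually prevents the double migration: the second waiter sees the bumped version and returns without doing any work.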
All three fixes implemented in the latest commit.

Ready for re-review.

This comment was drafted by AI on behalf of @beastoin
@beastoin Looks good overall after the fixes; only one thing left: in the speaker extraction flow, the migration still runs after the early-return sample-limit guard, so v1 users already at the sample limit never get migrated.

This comment was drafted by AI on behalf of @beastoin
Move migration to run before the early return guard so v1 users at the sample limit still get migrated. Migration may drop invalid samples, freeing up space for new ones. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
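A minimal sketch of the ordering this commit describes: migrate first, then apply the limit check against the post-migration state. The MAX_SPEECH_SAMPLES constant and the helper names are illustrative assumptions, not the project's actual values.

```python
MAX_SPEECH_SAMPLES = 5  # illustrative limit, not the real config value


async def add_speech_sample(uid: str, person_id: str, audio_bytes: bytes) -> None:
    person = get_person(uid, person_id)  # assumed Firestore helper

    # Run the v1 -> v2 migration BEFORE the early-return guard: migration may
    # drop invalid samples, so a v1 user who looks "full" can end up with room.
    if person.get("speech_samples_version", 1) < 2:
        await migrate_person_if_needed(uid, person_id)
        person = get_person(uid, person_id)  # re-read the post-migration state

    if len(person.get("speech_samples", [])) >= MAX_SPEECH_SAMPLES:
        raise ValueError("speech sample limit reached")

    # ... verify, transcribe, and store the new sample ...
```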
Fixed: migration now runs before the sample count check. V1 users at the limit will get migrated (potentially freeing space if invalid samples are dropped), then the count is re-checked. Ready for final review.

This comment was drafted by AI on behalf of @beastoin
@beastoin Quick optional cleanup for a follow-up PR: split the verification and GCS helpers out of speaker_sample_migration.py into their own module so they can be reused outside the migration path.

This comment was drafted by AI on behalf of @beastoin
Per reviewer feedback, extract verification + GCS helpers into speaker_sample.py for cleaner reuse:
- speaker_sample.py: verify_and_transcribe_sample, download_sample_audio, delete_sample_from_storage
- speaker_sample_migration.py: migration logic + locking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Done: split into two modules:
- speaker_sample.py: verify_and_transcribe_sample, download_sample_audio, delete_sample_from_storage
- speaker_sample_migration.py: migration logic + locking

Ready for final review.

This comment was drafted by AI on behalf of @beastoin
@beastoin Optional cleanup request: move all direct GCS bucket/client usage out of speaker_sample.py and behind shared storage helpers.

This comment was drafted by AI on behalf of @beastoin
Per reviewer feedback:
- Add generic helpers: download_blob_bytes, delete_blob
- Add speech-profile wrappers: download_speech_profile_bytes, delete_speech_profile_blob
- Update speaker_sample.py to use only the wrappers (no direct bucket/client usage)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
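A hedged sketch of what those storage helpers could look like with the standard google-cloud-storage client; the bucket name and the {uid}/{filename} path layout are illustrative assumptions:

```python
from google.cloud import storage
from google.cloud.exceptions import NotFound

_client = storage.Client()
SPEECH_PROFILES_BUCKET = "speech-profiles"  # illustrative bucket name


def download_blob_bytes(bucket_name: str, blob_path: str) -> bytes:
    """Generic helper: download any blob's contents as bytes."""
    return _client.bucket(bucket_name).blob(blob_path).download_as_bytes()


def delete_blob(bucket_name: str, blob_path: str) -> None:
    """Generic helper: delete a blob, tolerating one that is already gone."""
    try:
        _client.bucket(bucket_name).blob(blob_path).delete()
    except NotFound:
        pass


def download_speech_profile_bytes(uid: str, filename: str) -> bytes:
    """Speech-profile wrapper so callers never touch the bucket/client directly."""
    return download_blob_bytes(SPEECH_PROFILES_BUCKET, f"{uid}/{filename}")


def delete_speech_profile_blob(uid: str, filename: str) -> None:
    delete_blob(SPEECH_PROFILES_BUCKET, f"{uid}/{filename}")
```

With helpers like these, speaker_sample.py can call only the two wrappers, which satisfies the "no direct bucket/client usage" constraint from the commit.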
Done: moved the GCS access behind shared storage helpers (generic download_blob_bytes/delete_blob plus the speech-profile wrappers); speaker_sample.py no longer touches the bucket or client directly.

Ready for final review.

This comment was drafted by AI on behalf of @beastoin
Fixes #4253. Adds transcript capture/storage and People settings UI for speech samples (from #4322), and enforces stricter verification before saving samples (min 5 words, ≥70% single-speaker dominance via diarization, and ≥60% trigram Jaccard similarity) to avoid low-quality or mixed-speaker data.

**deploy steps**
- [ ] deploy backend(s)

---

_This PR was drafted by AI on behalf of @beastoin_
Fixes #4253
Previously, any speaker sample audio could be stored without verification, relying solely on the accuracy of the private cloud sync and the segment-level timestamps. This could degrade speaker identification accuracy, especially when the latency of the STT service is high.
Now samples must pass three checks: a minimum of 5 words transcribed, single-speaker dominance ≥70% via diarization (catches crosstalk that segment boundaries miss), and ≥60% text similarity using character trigram Jaccard (chosen for being language-agnostic across CJK/Cyrillic/Arabic without tokenizers or NLP deps, and robust to minor transcription variations).
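For reference, a minimal sketch of a character trigram Jaccard check along these lines; the 0.6 threshold comes from the description above, while the exact text normalization is an assumption:

```python
def _trigrams(text: str) -> set[str]:
    # Lowercase and drop whitespace so the comparison is purely character-based,
    # which keeps it language-agnostic (CJK/Cyrillic/Arabic work without tokenizers).
    normalized = "".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_jaccard(expected: str, transcribed: str) -> float:
    a, b = _trigrams(expected), _trigrams(transcribed)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# A sample passes the similarity check when the score reaches the threshold.
SIMILARITY_THRESHOLD = 0.6
print(trigram_jaccard("the quick brown fox", "the quick brown dog"))  # ~0.65 -> passes
print(trigram_jaccard("the quick brown fox", "completely unrelated"))  # 0.0 -> rejected
```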
This PR was drafted by AI on behalf of @beastoin