feat: display speech sample transcripts in People settings #4322
Conversation
Add technical implementation plan (PRD.MD) and progress tracking checklist (progress.txt) for the speech sample transcripts feature. This feature will display transcripts of speech samples in the Settings > People page, leveraging the existing Deepgram transcription from speaker sample verification. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Code Review
This pull request introduces a Product Requirements Document (PRD.MD) and an implementation checklist (progress.txt) for displaying speech sample transcripts in the People Settings. The PRD details data model changes, backend logic for managing speech samples and transcripts, and frontend UI updates. The plan outlines a lazy migration strategy for existing v1 samples. My primary feedback concerns a critical data integrity risk in the proposed add_person_speech_sample function, where the use of firestore.ArrayUnion with parallel arrays could lead to desynchronization. A more robust atomic update mechanism is recommended to ensure consistency.
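For illustration, the atomic update the review is pointing toward could look roughly like this: read both arrays inside a Firestore transaction and write them back together, rather than issuing two independent ArrayUnion updates. This is a minimal sketch, not the PR's code; the `users/{uid}/people/{person_id}` document path and the field names are assumptions taken from the PRD summary.

```python
from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def _append_sample(transaction, person_ref, sample_url, transcript):
    # Read the current state inside the transaction so concurrent writers retry.
    snapshot = person_ref.get(transaction=transaction)
    data = snapshot.to_dict() or {}
    samples = data.get("speech_samples", [])
    transcripts = data.get("speech_sample_transcripts", [])
    # Pad the transcript array so the two parallel arrays stay index-aligned,
    # even for legacy documents that never had transcripts.
    transcripts += [None] * (len(samples) - len(transcripts))
    samples.append(sample_url)
    transcripts.append(transcript)
    transaction.update(person_ref, {
        "speech_samples": samples,
        "speech_sample_transcripts": transcripts,
    })

def add_person_speech_sample(uid: str, person_id: str, sample_url: str, transcript: str | None = None):
    person_ref = (
        db.collection("users").document(uid)
        .collection("people").document(person_id)
    )
    _append_sample(db.transaction(), person_ref, sample_url, transcript)
```

Because the read and both writes happen in one transaction, the samples and transcripts arrays cannot drift out of sync the way two separate ArrayUnion calls could.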
Create backend/utils/speaker_sample_migration.py with:
- verify_and_transcribe_sample(): Transcribe audio and verify quality
- migrate_person_samples_v1_to_v2(): Migrate samples from v1 to v2 format
- download_sample_audio(): Download speech sample from GCS
- delete_sample_from_storage(): Delete speech sample from GCS

This centralizes the verification logic from speaker_identification.py for reuse in lazy migration. Part of speech sample transcripts feature.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Created backend/utils/speaker_sample_migration.py with all four required functions: verify_and_transcribe_sample, migrate_person_samples_v1_to_v2, download_sample_audio, and delete_sample_from_storage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…son model

Add new fields to support storing transcripts alongside speech samples:
- speech_sample_transcripts: Optional[List[str]] for parallel transcript array
- speech_samples_version: int defaulting to 1 for migration tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update add_person_speech_sample() to accept a transcript parameter and store it in a parallel array. Update remove_person_speech_sample() to remove by index so the samples and transcripts arrays stay in sync.

Add new functions for migration support:
- set_person_speech_sample_transcript()
- update_person_speech_samples_after_migration()
- clear_person_speaker_embedding()
- update_person_speech_samples_version()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
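Removal by index cannot be expressed with firestore.ArrayRemove (which matches by value and also strips duplicates), so keeping the two arrays aligned implies a read-modify-write. A hedged sketch of what remove_person_speech_sample might look like, consistent with the atomic-update approach recommended above; the document path and field names are assumptions:

```python
from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def _remove_sample_at(transaction, person_ref, index):
    snapshot = person_ref.get(transaction=transaction)
    data = snapshot.to_dict() or {}
    samples = data.get("speech_samples", [])
    transcripts = data.get("speech_sample_transcripts", [])
    if not 0 <= index < len(samples):
        return  # Nothing to remove at this index.
    samples.pop(index)
    if index < len(transcripts):
        # Drop the matching transcript so the parallel arrays stay aligned.
        transcripts.pop(index)
    transaction.update(person_ref, {
        "speech_samples": samples,
        "speech_sample_transcripts": transcripts,
    })

def remove_person_speech_sample(uid: str, person_id: str, index: int):
    person_ref = (
        db.collection("users").document(uid)
        .collection("people").document(person_id)
    )
    _remove_sample_at(db.transaction(), person_ref, index)
```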
Replace inline _verify_sample_quality function with the centralized verify_and_transcribe_sample from speaker_sample_migration module. Now passes transcript to add_person_speech_sample() to store transcripts alongside speech samples. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Make get_all_people() and get_single_person() async to support lazy migration of v1 speech samples to v2 with transcripts when fetching people data. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n model

Update fromJson() and toJson() methods to properly parse/serialize speech_sample_transcripts and speech_samples_version fields that were already defined but not being serialized.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Show transcript text in italic below each speech sample in the Settings > People page. Handles null/missing transcripts gracefully by only displaying when available. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All existing backend unit tests pass:
- 22 transcript_segment tests
- 27 text_similarity tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Flutter tests fail due to missing provider/path_provider dependencies in the test environment. Verified same failures occur on main branch, confirming this is not caused by PR changes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required fixes before merge:
- Move the v1→v2 migration out of the GET endpoints and into the speaker extraction flow
- Guard the migration with a per-uid/person_id lock so it cannot run twice concurrently
- Make add_person_speech_sample update the samples and transcripts arrays atomically (e.g. a Firestore transaction) instead of parallel ArrayUnion calls

Optional: drop PRD.MD/progress.txt and remove the unused import in speaker_sample_migration.py. Ping me when updated.

This comment was drafted by AI on behalf of @beastoin
… transaction

Per beastoin's review on PR #4322:
- Move v1→v2 migration from GET endpoints to speaker extraction flow
- Add in-process asyncio lock per uid/person_id to prevent double migration
- Use Firestore transaction in add_person_speech_sample for atomic array updates
- Remove unused google.cloud.storage import
- Delete PRD.MD and progress.txt files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
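The per-uid/person_id lock described here is an in-process guard. A minimal sketch of one way to key asyncio locks like that; helper names such as get_person and the awaited migration call are assumptions, not the PR's exact code:

```python
import asyncio
from collections import defaultdict

# One lock per (uid, person_id): concurrent requests for the same person wait
# for each other, while different people can migrate in parallel. This is
# purely in-process and does not coordinate across multiple backend instances.
_migration_locks = defaultdict(asyncio.Lock)


async def migrate_person_if_needed(uid: str, person_id: str) -> None:
    async with _migration_locks[(uid, person_id)]:
        person = get_person(uid, person_id)  # assumed Firestore helper
        if person.get("speech_samples_version", 1) >= 2:
            return  # Already v2, or migrated while we waited on the lock.
        await migrate_person_samples_v1_to_v2(uid, person)  # from speaker_sample_migration
```

Re-checking speech_samples_version inside the lock is what actually prevents the double migration: the second waiter sees the bumped version and returns without doing any work.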
All three fixes implemented in the latest commit.

Ready for re-review.

This comment was drafted by AI on behalf of @beastoin
@beastoin Looks good overall after the fixes; only one thing left: in the speaker extraction flow, the migration still runs after the early-return sample-limit guard, so v1 users already at the sample limit never get migrated.

This comment was drafted by AI on behalf of @beastoin
Move migration to run before the early return guard so v1 users at the sample limit still get migrated. Migration may drop invalid samples, freeing up space for new ones. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
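A minimal sketch of the ordering this commit describes: migrate first, then apply the limit check against the post-migration state. The MAX_SPEECH_SAMPLES constant and the helper names are illustrative assumptions, not the project's actual values.

```python
MAX_SPEECH_SAMPLES = 5  # illustrative limit, not the real config value


async def add_speech_sample(uid: str, person_id: str, audio_bytes: bytes) -> None:
    person = get_person(uid, person_id)  # assumed Firestore helper

    # Run the v1 -> v2 migration BEFORE the early-return guard: migration may
    # drop invalid samples, so a v1 user who looks "full" can end up with room.
    if person.get("speech_samples_version", 1) < 2:
        await migrate_person_if_needed(uid, person_id)
        person = get_person(uid, person_id)  # re-read the post-migration state

    if len(person.get("speech_samples", [])) >= MAX_SPEECH_SAMPLES:
        raise ValueError("speech sample limit reached")

    # ... verify, transcribe, and store the new sample ...
```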
Fixed: migration now runs before the sample count check. V1 users at the limit will get migrated (potentially freeing space if invalid samples are dropped), then the count is re-checked. Ready for final review.

This comment was drafted by AI on behalf of @beastoin
@beastoin Quick optional cleanup for a follow-up PR: split the verification and GCS helpers out of speaker_sample_migration.py into their own module so they can be reused outside the migration path.

This comment was drafted by AI on behalf of @beastoin
Per reviewer feedback, extract verification + GCS helpers into speaker_sample.py for cleaner reuse:
- speaker_sample.py: verify_and_transcribe_sample, download_sample_audio, delete_sample_from_storage
- speaker_sample_migration.py: migration logic + locking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Done: split into two modules:
- speaker_sample.py: verify_and_transcribe_sample, download_sample_audio, delete_sample_from_storage
- speaker_sample_migration.py: migration logic + locking

Ready for final review.

This comment was drafted by AI on behalf of @beastoin
@beastoin Optional cleanup request: move all direct GCS bucket/client usage out of speaker_sample.py and behind shared storage helpers.

This comment was drafted by AI on behalf of @beastoin
Per reviewer feedback:
- Add generic helpers: download_blob_bytes, delete_blob
- Add speech-profile wrappers: download_speech_profile_bytes, delete_speech_profile_blob
- Update speaker_sample.py to use only the wrappers (no direct bucket/client usage)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
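A hedged sketch of what those storage helpers could look like with the standard google-cloud-storage client; the bucket name and the {uid}/{filename} path layout are illustrative assumptions:

```python
from google.cloud import storage
from google.cloud.exceptions import NotFound

_client = storage.Client()
SPEECH_PROFILES_BUCKET = "speech-profiles"  # illustrative bucket name


def download_blob_bytes(bucket_name: str, blob_path: str) -> bytes:
    """Generic helper: download any blob's contents as bytes."""
    return _client.bucket(bucket_name).blob(blob_path).download_as_bytes()


def delete_blob(bucket_name: str, blob_path: str) -> None:
    """Generic helper: delete a blob, tolerating one that is already gone."""
    try:
        _client.bucket(bucket_name).blob(blob_path).delete()
    except NotFound:
        pass


def download_speech_profile_bytes(uid: str, filename: str) -> bytes:
    """Speech-profile wrapper so callers never touch the bucket/client directly."""
    return download_blob_bytes(SPEECH_PROFILES_BUCKET, f"{uid}/{filename}")


def delete_speech_profile_blob(uid: str, filename: str) -> None:
    delete_blob(SPEECH_PROFILES_BUCKET, f"{uid}/{filename}")
```

With helpers like these, speaker_sample.py can call only the two wrappers, which satisfies the "no direct bucket/client usage" constraint from the commit.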
Done: moved the GCS access behind shared storage helpers (generic download_blob_bytes/delete_blob plus the speech-profile wrappers); speaker_sample.py no longer touches the bucket or client directly.

Ready for final review.

This comment was drafted by AI on behalf of @beastoin
Fixes #4253. Adds transcript capture/storage and People settings UI for speech samples (from #4322), and enforces stricter verification before saving samples (min 5 words, ≥70% single-speaker dominance via diarization, and ≥60% trigram Jaccard similarity) to avoid low-quality or mixed-speaker data.

**deploy steps**
- [ ] deploy backend(s)

---

_This PR was drafted by AI on behalf of @beastoin_
Fixes #4253
Previously, any speaker sample audio could be stored without verification, relying solely on the accuracy of the private cloud sync and the segment-level timestamps. This could degrade speaker identification accuracy, especially when the latency of the STT service is high.
Now samples must pass three checks: a minimum of 5 words transcribed, single-speaker dominance ≥70% via diarization (catches crosstalk that segment boundaries miss), and ≥60% text similarity using character trigram Jaccard (chosen for being language-agnostic across CJK/Cyrillic/Arabic without tokenizers or NLP deps, and robust to minor transcription variations).
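For reference, a minimal sketch of a character trigram Jaccard check along these lines; the 0.6 threshold comes from the description above, while the exact text normalization is an assumption:

```python
def _trigrams(text: str) -> set[str]:
    # Lowercase and drop whitespace so the comparison is purely character-based,
    # which keeps it language-agnostic (CJK/Cyrillic/Arabic work without tokenizers).
    normalized = "".join(text.lower().split())
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_jaccard(expected: str, transcribed: str) -> float:
    a, b = _trigrams(expected), _trigrams(transcribed)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# A sample passes the similarity check when the score reaches the threshold.
SIMILARITY_THRESHOLD = 0.6
print(trigram_jaccard("the quick brown fox", "the quick brown dog"))  # ~0.65 -> passes
print(trigram_jaccard("the quick brown fox", "completely unrelated"))  # 0.0 -> rejected
```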
This PR was drafted by AI on behalf of @beastoin