-
Notifications
You must be signed in to change notification settings - Fork 1.3k
add speaker sample quality verification before storage #4291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully refactors the compute_text_similarity function into a separate text_utils.py module, which resolves import dependency issues and improves testability. The changes include comprehensive new unit tests for the similarity function and its integration into the speaker sample quality verification process. My review focuses on improving logging practices and resource management in the new and modified functions. I've suggested replacing print statements with the standard logging module for better observability in production and ensuring BytesIO resources are properly handled using a with statement.
|
Required fixes before merge:
Let me know when these are fixed and I’ll re-review. This comment was drafted by AI on behalf of @beastoin |
|
Done. Addressed both issues:
Tests pass. Ready for re-review. This comment was drafted by AI on behalf of @beastoin |
|
Required fixes before merge:
Please fix these and I’ll re-review. This comment was drafted by AI on behalf of @beastoin |
|
Done. Fixed both issues:
Tests pass. Ready for re-review. This comment was drafted by AI on behalf of @beastoin |
|
wait, let me test it first. |
|
Sorry for missing the tests earlier. Added 21 unit tests to make future maintenance easier:
All 70 tests pass. Ready for re-review. Drafted by AI for @beastoin |
Move compute_text_similarity to a standalone module without database dependencies so unit tests can import the real function instead of duplicating it locally. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add technical implementation plan (PRD.MD) and progress tracking checklist (progress.txt) for the speech sample transcripts feature. This feature will display transcripts of speech samples in the Settings > People page, leveraging the existing Deepgram transcription from speaker sample verification. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Create backend/utils/speaker_sample_migration.py with: - verify_and_transcribe_sample(): Transcribe audio and verify quality - migrate_person_samples_v1_to_v2(): Migrate samples from v1 to v2 format - download_sample_audio(): Download speech sample from GCS - delete_sample_from_storage(): Delete speech sample from GCS This centralizes the verification logic from speaker_identification.py for reuse in lazy migration. Part of speech sample transcripts feature. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Created backend/utils/speaker_sample_migration.py with all four required functions: verify_and_transcribe_sample, migrate_person_samples_v1_to_v2, download_sample_audio, and delete_sample_from_storage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…son model Add new fields to support storing transcripts alongside speech samples: - speech_sample_transcripts: Optional[List[str]] for parallel transcript array - speech_samples_version: int defaulting to 1 for migration tracking Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update add_person_speech_sample() to accept transcript parameter and store it in parallel array. Update remove_person_speech_sample() to remove by index to keep samples and transcripts arrays in sync. Add new functions for migration support: - set_person_speech_sample_transcript() - update_person_speech_samples_after_migration() - clear_person_speaker_embedding() - update_person_speech_samples_version() Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace inline _verify_sample_quality function with the centralized verify_and_transcribe_sample from speaker_sample_migration module. Now passes transcript to add_person_speech_sample() to store transcripts alongside speech samples. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Make get_all_people() and get_single_person() async to support lazy migration of v1 speech samples to v2 with transcripts when fetching people data. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n model Update fromJson() and toJson() methods to properly parse/serialize speech_sample_transcripts and speech_samples_version fields that were already defined but not being serialized. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Show transcript text in italic below each speech sample in the Settings > People page. Handles null/missing transcripts gracefully by only displaying when available. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All existing backend unit tests pass: - 22 transcript_segment tests - 27 text_similarity tests Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Flutter tests fail due to missing provider/path_provider dependencies in the test environment. Verified same failures occur on main branch, confirming this is not caused by PR changes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… transaction Per beastoin's review on PR #4322: - Move v1→v2 migration from GET endpoints to speaker extraction flow - Add in-process asyncio lock per uid/person_id to prevent double migration - Use Firestore transaction in add_person_speech_sample for atomic array updates - Remove unused google.cloud.storage import - Delete PRD.MD and progress.txt files Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move migration to run before the early return guard so v1 users at the sample limit still get migrated. Migration may drop invalid samples, freeing up space for new ones. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Per reviewer feedback, extract verification + GCS helpers into speaker_sample.py for cleaner reuse: - speaker_sample.py: verify_and_transcribe_sample, download_sample_audio, delete_sample_from_storage - speaker_sample_migration.py: migration logic + locking Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Per reviewer feedback: - Add generic helpers: download_blob_bytes, delete_blob - Add speech-profile wrappers: download_speech_profile_bytes, delete_speech_profile_blob - Update speaker_sample.py to use only the wrappers (no direct bucket/client usage) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Per reviewer feedback: - Deepgram now raises RuntimeError on transcription failure instead of returning empty list, allowing callers to distinguish API failures from low-quality samples - Migration skips samples with transient failures (keeps them as v1) instead of deleting them - Transcript array is padded with None when adding transcript to existing v1 data to maintain alignment with samples array Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Per reviewer feedback: - Defer blob deletions until after confirming no transient failures, preventing orphaned paths in Firestore on early return - Use empty strings instead of None for transcript padding to avoid breaking Dart's List<String>.from and isNotEmpty checks Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- test_speaker_sample.py: 15 tests covering verification logic, boundary cases, transient failures, and edge cases - test_speaker_sample_migration.py: 5 tests for v1→v2 migration, transient failure handling, and deferred deletions - test_users_add_sample_transaction.py: 1 test for transcript array padding with empty strings - test.sh: add ENCRYPTION_SECRET and new test commands Total: 70 tests passing Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
c97d2a8 to
066c672
Compare



Fixes #4253.
Adds transcript capture/storage and People settings UI for speech samples (from #4322), and enforces stricter verification before saving samples (min 5 words, ≥70% single-speaker dominance via diarization, and ≥60% trigram Jaccard similarity) to avoid low‑quality or mixed‑speaker data.
deploy steps
This pr was drafted by AI on behalf of @beastoin