
@beastoin (Collaborator) commented Jan 19, 2026

Fixes #4253.

Adds transcript capture/storage and People settings UI for speech samples (from #4322), and enforces stricter verification before saving samples (min 5 words, ≥70% single-speaker dominance via diarization, and ≥60% trigram Jaccard similarity) to avoid low‑quality or mixed‑speaker data.
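The three verification gates can be sketched in Python as below. Several details are assumptions, not taken from the PR. Diarization output is modeled as a list of `(speaker_id, word)` pairs. Trigrams are word trigrams over lowercased, whitespace-split text, although the PR's `compute_text_similarity` in `utils/text_utils.py` may tokenize differently. The transcript is compared against a caller-supplied reference text. `trigram_jaccard` and `passes_quality_gates` are hypothetical names.

```python
from collections import Counter
from typing import List, Set, Tuple

MIN_WORDS = 5          # at least 5 words
MIN_DOMINANCE = 0.70   # top speaker must own >= 70% of the words
MIN_SIMILARITY = 0.60  # trigram Jaccard similarity >= 60%


def _word_trigrams(text: str) -> Set[Tuple[str, ...]]:
    """Set of consecutive word triples (assumed tokenization: lowercase + whitespace split)."""
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def trigram_jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| of the two trigram sets."""
    ta, tb = _word_trigrams(a), _word_trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def passes_quality_gates(diarized_words: List[Tuple[int, str]], reference_text: str) -> bool:
    """Hypothetical gate over (speaker_id, word) pairs from a diarized transcript."""
    if len(diarized_words) < MIN_WORDS:
        return False
    # Share of words spoken by the most frequent speaker.
    counts = Counter(speaker for speaker, _ in diarized_words)
    dominance = counts.most_common(1)[0][1] / len(diarized_words)
    if dominance < MIN_DOMINANCE:
        return False
    transcript = " ".join(word for _, word in diarized_words)
    return trigram_jaccard(transcript, reference_text) >= MIN_SIMILARITY
```

With word trigrams, a reference shorter than three words produces no trigrams, so this sketch scores it 0.0.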

deploy steps


This PR was drafted by AI on behalf of @beastoin

@beastoin changed the title from "extract text_similarity to utils/text_utils.py for testability" to "add speaker sample quality verification before storage" on Jan 19, 2026
@gemini-code-assist (Contributor, bot) left a comment

Code Review

This pull request successfully refactors the compute_text_similarity function into a separate text_utils.py module, which resolves import dependency issues and improves testability. The changes include comprehensive new unit tests for the similarity function and its integration into the speaker sample quality verification process. My review focuses on improving logging practices and resource management in the new and modified functions. I've suggested replacing print statements with the standard logging module for better observability in production and ensuring BytesIO resources are properly handled using a with statement.
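Both review suggestions can be illustrated with a small sketch. It uses a module-level `logging` logger instead of `print`, and it uses `io.BytesIO` as a context manager so the buffer is closed deterministically. The helper `pcm_to_wav_bytes` and its 16-bit mono PCM input are hypothetical and are not the PR's code.

```python
import io
import logging
import wave

logger = logging.getLogger(__name__)


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container (hypothetical helper)."""
    with io.BytesIO() as buf:
        # wave does not close a file object it did not open, so buf stays usable here.
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(sample_rate)
            wav.writeframes(pcm)
        data = buf.getvalue()  # read before the outer `with` closes the buffer
    logger.info("Encoded %d PCM bytes as %d WAV bytes", len(pcm), len(data))
    return data
```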

@beastoin (Collaborator, Author) commented Jan 22, 2026

Required fixes before merge:

  • Prevent migration from deleting samples on transient Deepgram failures by distinguishing “transcription failed” vs “low-quality” (touchpoints: backend/utils/speaker_sample_migration.py:75-93, backend/utils/speaker_sample.py:22-67, backend/utils/stt/pre_recorded.py:106-177).
  • Ensure transcript arrays stay aligned with samples when adding a transcript to existing v1 data (backend/database/users.py:103-127).

Let me know when these are fixed and I’ll re-review.


This comment was drafted by AI on behalf of @beastoin

@beastoin (Collaborator, Author) commented Jan 22, 2026

Done. Addressed both issues:

  • Transient failure handling: Deepgram transcription now raises RuntimeError on API failure, so the migration skips those samples instead of deleting them
  • Transcript array alignment: the transcript array is padded with None for existing v1 samples

Tests pass. Ready for re-review.


This comment was drafted by AI on behalf of @beastoin
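The distinction discussed here, a failed transcription versus a low-quality sample, can be sketched as follows. The sketch relies on the behavior described in the comment above: the transcription call raises `RuntimeError` on API failure. `SampleOutcome`, `classify_sample`, and the injected `transcribe` and `is_good` callables are hypothetical names and are not the PR's API.

```python
from enum import Enum
from typing import Callable, Tuple


class SampleOutcome(Enum):
    VALID = "valid"          # keep the sample and store its transcript
    LOW_QUALITY = "low"      # verification ran and failed: safe to delete
    TRANSIENT = "transient"  # API failed: keep the sample as v1 and retry later


def classify_sample(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    is_good: Callable[[str], bool],
) -> Tuple[SampleOutcome, str]:
    try:
        transcript = transcribe(audio)
    except RuntimeError:
        # Nothing is known about the sample's quality, so it must never be deleted.
        return SampleOutcome.TRANSIENT, ""
    if not is_good(transcript):
        return SampleOutcome.LOW_QUALITY, transcript
    return SampleOutcome.VALID, transcript
```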

@beastoin (Collaborator, Author)

Required fixes before merge:

  • Migration can delete some samples, then hit a transient failure later and return early without updating Firestore, leaving deleted blob paths in speech_samples. Defer deletions until after you know there are no transient failures, or record deletions and still update the document in the retry path (touchpoints: backend/utils/speaker_sample_migration.py:76-105).
  • Padding transcripts with None will break the client parse/UI (Dart List<String>.from and isNotEmpty assume non-null strings). Either pad with empty strings or make the client tolerate nulls (touchpoints: backend/database/users.py:122-131, app/lib/backend/schema/person.dart:55-58, app/lib/pages/settings/people.dart:307-313).

Please fix these and I’ll re-review.


This comment was drafted by AI on behalf of @beastoin
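The first option in the second point, padding with empty strings, can be sketched on a plain dict that stands in for the Firestore person document. This is an assumption for illustration: the PR performs the update inside a Firestore transaction in `add_person_speech_sample`, and that code is not reproduced here. `append_sample_with_transcript` is a hypothetical name.

```python
from typing import Any, Dict, List


def append_sample_with_transcript(person: Dict[str, Any], path: str, transcript: str) -> None:
    samples: List[str] = list(person.get("speech_samples") or [])
    transcripts: List[str] = list(person.get("speech_sample_transcripts") or [])
    # Defensive: never keep more transcripts than samples.
    transcripts = transcripts[: len(samples)]
    # Older v1 samples have no transcript; pad with "" (not None) so the
    # Dart client's List<String>.from(...) and isNotEmpty checks keep working.
    transcripts.extend([""] * (len(samples) - len(transcripts)))
    samples.append(path)
    transcripts.append(transcript)
    person["speech_samples"] = samples
    person["speech_sample_transcripts"] = transcripts
```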

@beastoin (Collaborator, Author)

Done. Fixed both issues:

  • Deferred blob deletions until after confirming no transient failures
  • Changed transcript padding from None to empty strings for Dart compatibility

Tests pass. Ready for re-review.


This comment was drafted by AI on behalf of @beastoin
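The deferred-deletion flow can be sketched as follows. Every sample is verified first, and rejected blobs are only collected at that stage. Nothing is deleted until the loop finishes without a transient failure, so an early return leaves the v1 data untouched. `migrate_samples` and the injected `verify`, `write_doc`, and `delete_blob` callables are hypothetical stand-ins for the PR's helpers. Writing the document before deleting the blobs is a design choice made for this sketch.

```python
from typing import Callable, Dict, List, Tuple


def migrate_samples(
    samples: List[str],
    verify: Callable[[str], Tuple[bool, str]],
    write_doc: Callable[[Dict[str, object]], None],
    delete_blob: Callable[[str], None],
) -> bool:
    """Return True if migrated, False if a transient failure means retry later."""
    kept: List[str] = []
    transcripts: List[str] = []
    to_delete: List[str] = []
    for path in samples:
        try:
            keep, transcript = verify(path)  # assumed to raise RuntimeError on API failure
        except RuntimeError:
            # Nothing has been deleted yet, so returning early leaves v1 data intact.
            return False
        if keep:
            kept.append(path)
            transcripts.append(transcript)
        else:
            to_delete.append(path)  # defer: never delete inside the loop
    # All samples verified: persist the new arrays first...
    write_doc({
        "speech_samples": kept,
        "speech_sample_transcripts": transcripts,
        "speech_samples_version": 2,
    })
    # ...then remove rejected blobs, so the document never lists a deleted path.
    for path in to_delete:
        delete_blob(path)
    return True
```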

@beastoin (Collaborator, Author)

@beastoin Re-reviewed the latest changes — the deferred deletion flow and empty-string padding address the issues cleanly, and I don’t see any new regressions. Please go ahead and merge when ready.


This comment was drafted by AI on behalf of @beastoin

@beastoin (Collaborator, Author)

wait, let me test it first.

@beastoin (Collaborator, Author)

mobile app UI(s):

[Screenshot 2026-01-22 at 16 19 11]

adding a new person:

[Screenshot 2026-01-22 at 16 18 17]

automatic migration on the old person:

[Screenshot 2026-01-22 at 16 18 08]

looks good.

@beastoin (Collaborator, Author) commented Jan 22, 2026

Sorry for missing the tests earlier. Added 21 unit tests to make future maintenance easier:

  • test_speaker_sample.py: 15 tests for verification logic and boundary cases
  • test_speaker_sample_migration.py: 5 tests for migration and deferred deletions
  • test_users_add_sample_transaction.py: 1 test for transcript padding

All 70 tests pass. Ready for re-review.


Drafted by AI for @beastoin

@beastoin (Collaborator, Author) commented Jan 22, 2026

@beastoin Re‑reviewed after the new tests; coverage looks good. Please merge when ready.


By AI for @beastoin

beastoin and others added 17 commits January 22, 2026 16:57
Move compute_text_similarity to a standalone module without database
dependencies so unit tests can import the real function instead of
duplicating it locally.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add technical implementation plan (PRD.MD) and progress tracking
checklist (progress.txt) for the speech sample transcripts feature.

This feature will display transcripts of speech samples in the
Settings > People page, leveraging the existing Deepgram transcription
from speaker sample verification.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Create backend/utils/speaker_sample_migration.py with:
- verify_and_transcribe_sample(): Transcribe audio and verify quality
- migrate_person_samples_v1_to_v2(): Migrate samples from v1 to v2 format
- download_sample_audio(): Download speech sample from GCS
- delete_sample_from_storage(): Delete speech sample from GCS

This centralizes the verification logic from speaker_identification.py
for reuse in lazy migration. Part of speech sample transcripts feature.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Created backend/utils/speaker_sample_migration.py with all four required
functions: verify_and_transcribe_sample, migrate_person_samples_v1_to_v2,
download_sample_audio, and delete_sample_from_storage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…son model

Add new fields to support storing transcripts alongside speech samples:
- speech_sample_transcripts: Optional[List[str]] for parallel transcript array
- speech_samples_version: int defaulting to 1 for migration tracking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
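The two new fields might look like this on the backend model. This sketch uses a standard-library dataclass so it runs without dependencies. The PR's actual model class and any fields other than the two named in the commit above may differ, and `PersonSketch` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonSketch:
    id: str
    name: str
    speech_samples: List[str] = field(default_factory=list)
    # Parallel array: speech_sample_transcripts[i] belongs to speech_samples[i].
    speech_sample_transcripts: Optional[List[str]] = None
    # 1 = legacy samples without transcripts; 2 = migrated (verified and transcribed).
    speech_samples_version: int = 1
```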
Update add_person_speech_sample() to accept transcript parameter and
store it in parallel array. Update remove_person_speech_sample() to
remove by index to keep samples and transcripts arrays in sync.

Add new functions for migration support:
- set_person_speech_sample_transcript()
- update_person_speech_samples_after_migration()
- clear_person_speaker_embedding()
- update_person_speech_samples_version()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
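Removing the element at the same index from both parallel arrays keeps them in lockstep. The sketch below runs on a plain dict, which is an assumption; the real `remove_person_speech_sample()` updates a Firestore document. `remove_sample_at` is a hypothetical name.

```python
from typing import Any, Dict


def remove_sample_at(person: Dict[str, Any], index: int) -> str:
    """Remove sample `index` and its transcript; return the removed blob path."""
    samples = list(person.get("speech_samples") or [])
    if not 0 <= index < len(samples):
        raise IndexError(f"no speech sample at index {index}")
    removed = samples.pop(index)
    transcripts = list(person.get("speech_sample_transcripts") or [])
    if index < len(transcripts):
        # v1 data may have fewer transcripts than samples; only pop when present.
        transcripts.pop(index)
    person["speech_samples"] = samples
    person["speech_sample_transcripts"] = transcripts
    return removed
```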
Replace inline _verify_sample_quality function with the centralized
verify_and_transcribe_sample from speaker_sample_migration module.
Now passes transcript to add_person_speech_sample() to store transcripts
alongside speech samples.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Make get_all_people() and get_single_person() async to support lazy
migration of v1 speech samples to v2 with transcripts when fetching
people data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n model

Update fromJson() and toJson() methods to properly parse/serialize
speech_sample_transcripts and speech_samples_version fields that were
already defined but not being serialized.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Show transcript text in italic below each speech sample in the Settings > People page.
Handles null/missing transcripts gracefully by only displaying when available.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All existing backend unit tests pass:
- 22 transcript_segment tests
- 27 text_similarity tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Flutter tests fail due to missing provider/path_provider dependencies
in the test environment. Verified same failures occur on main branch,
confirming this is not caused by PR changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… transaction

Per beastoin's review on PR #4322:
- Move v1→v2 migration from GET endpoints to speaker extraction flow
- Add in-process asyncio lock per uid/person_id to prevent double migration
- Use Firestore transaction in add_person_speech_sample for atomic array updates
- Remove unused google.cloud.storage import
- Delete PRD.MD and progress.txt files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
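An in-process lock per uid/person_id can be sketched with `asyncio.Lock`, with the "already migrated" check repeated after the lock is acquired. `_migration_locks`, `_migrated`, and `migrate_once` are hypothetical names, and `asyncio.sleep(0)` stands in for the real migration work.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

# One lock per (uid, person_id), created lazily on first access.
_migration_locks: DefaultDict[Tuple[str, str], asyncio.Lock] = defaultdict(asyncio.Lock)
_migrated: Set[Tuple[str, str]] = set()


async def migrate_once(uid: str, person_id: str) -> bool:
    """Return True if this call performed the migration, False if it was already done."""
    key = (uid, person_id)
    async with _migration_locks[key]:
        if key in _migrated:  # re-check under the lock
            return False
        await asyncio.sleep(0)  # stand-in for download / transcribe / update work
        _migrated.add(key)
        return True
```

Because the lock lives in process memory, it does not coordinate separate server processes or instances. The Firestore transaction mentioned in the same commit is what keeps the array updates atomic.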
Move migration to run before the early return guard so v1 users at the
sample limit still get migrated. Migration may drop invalid samples,
freeing up space for new ones.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Per reviewer feedback, extract verification + GCS helpers into
speaker_sample.py for cleaner reuse:
- speaker_sample.py: verify_and_transcribe_sample, download_sample_audio,
  delete_sample_from_storage
- speaker_sample_migration.py: migration logic + locking

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Per reviewer feedback:
- Add generic helpers: download_blob_bytes, delete_blob
- Add speech-profile wrappers: download_speech_profile_bytes, delete_speech_profile_blob
- Update speaker_sample.py to use only the wrappers (no direct bucket/client usage)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Per reviewer feedback:
- Deepgram now raises RuntimeError on transcription failure instead of
  returning empty list, allowing callers to distinguish API failures
  from low-quality samples
- Migration skips samples with transient failures (keeps them as v1)
  instead of deleting them
- Transcript array is padded with None when adding transcript to
  existing v1 data to maintain alignment with samples array

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
beastoin and others added 2 commits January 22, 2026 17:00
Per reviewer feedback:
- Defer blob deletions until after confirming no transient failures,
  preventing orphaned paths in Firestore on early return
- Use empty strings instead of None for transcript padding to avoid
  breaking Dart's List<String>.from and isNotEmpty checks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- test_speaker_sample.py: 15 tests covering verification logic,
  boundary cases, transient failures, and edge cases
- test_speaker_sample_migration.py: 5 tests for v1→v2 migration,
  transient failure handling, and deferred deletions
- test_users_add_sample_transaction.py: 1 test for transcript
  array padding with empty strings
- test.sh: add ENCRYPTION_SECRET and new test commands

Total: 70 tests passing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@beastoin force-pushed the e8w2h_speaker_identification branch from c97d2a8 to 066c672 on January 22, 2026 10:00
@beastoin marked this pull request as ready for review January 22, 2026 10:01
@beastoin merged commit 19fa8c8 into main Jan 22, 2026
1 check passed
@beastoin deleted the e8w2h_speaker_identification branch January 22, 2026 10:08

Development

Successfully merging this pull request may close these issues.

More accurate and reliable extraction/identification of people's speech profiles

2 participants