Speaker sample text check drops trimmed samples #4340

@beastoin

Description

Valid speaker samples are being dropped with text_mismatch because the sample audio is a trimmed window that can span multiple same-speaker segments while the expected text only reflects a single segment. This blocks sample storage and speaker embeddings for affected users.

Current Behavior

  • verify_and_transcribe_sample compares the transcript against the segment text using a symmetric similarity score, which penalizes any length difference between the two.
  • Trimmed samples that should be valid fail when expected text is longer or spans merged segments.
  • Valid samples are dropped due to text_mismatch.
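
To illustrate the failure mode, here is a minimal sketch of how a symmetric score behaves when the transcript is a correct but shorter slice of the expected text. The issue does not say which similarity function the backend uses, so difflib.SequenceMatcher is only a stand-in, and symmetric_similarity and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def symmetric_similarity(a: str, b: str) -> float:
    """Stand-in for the existing similarity helper (an assumption, not the real code)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical sample: the transcript is an exact prefix of the expected text.
expected = "let's meet at noon tomorrow and then grab lunch near the office"
transcript = "let's meet at noon tomorrow"

score = symmetric_similarity(transcript, expected)

# Every transcribed character is correct, yet the score is dragged down
# by the characters that appear only in the longer expected text.
print(f"symmetric similarity = {score:.2f}")
```

Because the ratio is normalized by the combined length of both strings, a perfectly transcribed sample can still land below a typical acceptance threshold whenever the two texts differ in length.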

Expected Behavior

Use a language-agnostic containment check: a sample passes when its transcript is contained in the expected text, regardless of how much additional text the expected side holds.

Affected Areas

| File | Line | Description |
|------|------|-------------|
| backend/utils/speaker_sample.py | 66 | Text mismatch check uses symmetric similarity |
| backend/utils/text_utils.py | 1 | Only similarity helper exists (no containment helper) |

Solution

containment = compute_text_containment(transcript, expected_text)
if containment < MIN_CONTAINMENT:
    return transcript, False, f"text_mismatch: containment={containment:.2f}"
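
The snippet above relies on a compute_text_containment helper that the issue names but does not define. One possible shape for it, offered only as a sketch: score the fraction of the transcript's character bigrams that also occur in the expected text. Character n-grams avoid word tokenization, so the check also works for scripts without spaces. The bigram approach, the normalization steps, the private helper names, and the MIN_CONTAINMENT value are all assumptions rather than the project's actual implementation.

```python
import unicodedata
from collections import Counter

MIN_CONTAINMENT = 0.9  # hypothetical threshold; the issue does not state a value


def _normalize(text: str) -> str:
    """Fold case and width, then keep only letters and digits from any script."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in text if ch.isalnum())


def _ngrams(text: str, n: int) -> Counter:
    """Multiset of character n-grams; text shorter than n counts as one gram."""
    if not text:
        return Counter()
    if len(text) < n:
        return Counter([text])
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def compute_text_containment(transcript: str, expected_text: str, n: int = 2) -> float:
    """Fraction of the transcript's n-grams that also occur in expected_text.

    Asymmetric on purpose: extra material in expected_text is not penalized.
    """
    t_grams = _ngrams(_normalize(transcript), n)
    e_grams = _ngrams(_normalize(expected_text), n)
    total = sum(t_grams.values())
    if total == 0:
        return 0.0  # an empty transcript cannot be validated
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((t_grams & e_grams).values())
    return overlap / total


expected_text = "let's meet at noon tomorrow and then grab lunch near the office"
contained = compute_text_containment("Let's meet at noon tomorrow", expected_text)
unrelated = compute_text_containment("completely different words here", expected_text)
print(contained >= MIN_CONTAINMENT, unrelated >= MIN_CONTAINMENT)
```

A bag of n-grams ignores ordering, so this is a looser test than strict substring matching; that tolerance helps with small transcription differences, but the threshold would need tuning against real samples.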

Files to Modify

  • backend/utils/text_utils.py
  • backend/utils/speaker_sample.py
  • backend/tests/unit/test_speaker_sample.py
  • backend/tests/unit/test_text_containment.py
  • backend/test.sh

Impact

Low: adds a containment check and tests; existing quality checks remain.


by AI for @beastoin
