Speaker sample text check drops trimmed samples #4340

Valid speaker samples are being dropped with `text_mismatch` because the sample audio is a trimmed window that can span multiple same-speaker segments while the expected text only reflects a single segment. This blocks sample storage and speaker embeddings for affected users.

### Current Behavior
- `verify_and_transcribe_sample` compares the transcript against the segment text using a symmetric similarity score.
- Trimmed samples that should be valid fail when the expected text is longer than the transcript or spans merged segments.
- As a result, valid samples are dropped with `text_mismatch`.
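To illustrate why a symmetric score penalizes trimmed samples, here is a sketch that uses Python's `difflib.SequenceMatcher.ratio()` as a stand-in. The issue does not say which similarity metric `verify_and_transcribe_sample` actually uses, and the helper name `symmetric_similarity` and the sample strings are hypothetical, so real scores will differ. The point is only that any metric normalized by the length of *both* strings drops when the expected text is longer than the transcript, even if every word of the transcript is present.

```python
from difflib import SequenceMatcher


def symmetric_similarity(a: str, b: str) -> float:
    # Hypothetical stand-in for the current check.
    # ratio() = 2 * matching_chars / (len(a) + len(b)), so extra text on
    # either side lowers the score, even when `a` is fully contained in `b`.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# A trimmed sample: the speaker only said the first part of the segment text.
transcript = "thanks for joining the call"
expected_text = (
    "thanks for joining the call today, let's start with the roadmap "
    "and then review the open bugs"
)

score = symmetric_similarity(transcript, expected_text)
print(f"similarity={score:.2f}")  # well below 0.5 despite a perfect prefix match
```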

### Expected Behavior
Use a language-agnostic containment check that validates whether the transcript is contained in the expected text, instead of requiring the two texts to be similar in both directions.
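A minimal sketch of what `compute_text_containment` could look like. Only the function name comes from this issue; the signature, the character-bigram approach, the normalization steps and the short-text fallback are all assumptions for illustration. Character n-grams avoid relying on whitespace tokenization, which does not work for languages such as Chinese or Japanese.

```python
import unicodedata
from collections import Counter


def _normalize(text: str) -> str:
    # NFKC folds compatibility forms (e.g. full-width Latin); casefold()
    # gives case-insensitive matching beyond ASCII.
    text = unicodedata.normalize("NFKC", text).casefold()
    # Keep letters, combining marks and numbers in any script; drop
    # punctuation, symbols and whitespace so word boundaries do not matter.
    return "".join(ch for ch in text if unicodedata.category(ch)[0] in ("L", "M", "N"))


def _char_ngrams(text: str, n: int) -> Counter:
    # Multiset of overlapping character n-grams (empty if text is shorter than n).
    return Counter(text[i : i + n] for i in range(len(text) - n + 1))


def compute_text_containment(transcript: str, expected_text: str, n: int = 2) -> float:
    """Hypothetical helper: fraction of the transcript's character n-grams
    that also occur in expected_text (0.0 to 1.0)."""
    t_norm = _normalize(transcript)
    e_norm = _normalize(expected_text)
    if not t_norm:
        return 0.0
    if len(t_norm) < n:
        # Too short for n-grams: fall back to a plain substring test.
        return 1.0 if t_norm in e_norm else 0.0
    t_grams = _char_ngrams(t_norm, n)
    e_grams = _char_ngrams(e_norm, n)
    # Multiset intersection: each n-gram counts at most min(count_t, count_e) times.
    overlap = sum((t_grams & e_grams).values())
    # Divide by the transcript's n-grams only, which makes the score asymmetric.
    return overlap / sum(t_grams.values())


if __name__ == "__main__":
    print(compute_text_containment(
        "thanks for joining the call",
        "Thanks for joining the call today, let's start with the roadmap.",
    ))  # → 1.0
```

Normalizing by the transcript alone is the design choice that matters here: extra words in the expected text (for example from merged same-speaker segments) no longer lower the score, while words the speaker did not say in that segment still do.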

### Affected Areas
| File | Line | Description |
|------|------|-------------|
| backend/utils/speaker_sample.py | 66 | Text mismatch check uses symmetric similarity |
| backend/utils/text_utils.py | 1 | Only a similarity helper exists; there is no containment helper |

### Solution
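```python
# Accept the sample when most of the transcript appears in the expected text,
# regardless of how much longer the expected text is.
containment = compute_text_containment(transcript, expected_text)
if containment < MIN_CONTAINMENT:
    return transcript, False, f"text_mismatch: containment={containment:.2f}"
```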

### Files to Modify
- backend/utils/text_utils.py
- backend/utils/speaker_sample.py
- backend/tests/unit/test_speaker_sample.py
- backend/tests/unit/test_text_containment.py
- backend/test.sh

### Impact
Low: adds a containment check and tests; the existing quality checks remain unchanged.

---
_by AI for @beastoin_