Fix voice input garbled/repeated text after silence timeout #321
brendanlong wants to merge 3 commits into main
When the browser's SpeechRecognition auto-stops due to silence (in some browsers), the onend handler restarts it. After restart, Chrome creates a fresh results list starting from index 0, but lastFinalizedLengthRef.current still held the character offset from the previous session.

This caused the onresult handler to either:
- Skip new words entirely (if new finals.length < old offset)
- Report garbled mid-word substrings (if new finals.length exceeded the old offset at some point)

Fix: reset lastFinalizedLengthRef.current = 0 before calling start() in the auto-restart path, so the new session's results are processed correctly from the beginning.

Also adds a test file (currently skipped due to pre-existing React.act infrastructure issue #320) with tests covering the key behaviors including the auto-restart offset reset.

Fixes #317

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
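The failure mode described above can be reproduced without a browser. The following is a minimal sketch, not the hook's actual code: makeOffsetTracker, onResult, and restart are hypothetical names, and the tracker only models the idea of slicing concatenated final text at a remembered character offset.

```typescript
// Hypothetical model of the pre-fix logic: all final transcripts of the current
// session are concatenated, and only the part past a remembered offset is emitted.
function makeOffsetTracker() {
  let lastFinalizedLength = 0; // stands in for lastFinalizedLengthRef.current

  return {
    // `finals` is the concatenated final text of the *current* session.
    onResult(finals: string): string {
      if (finals.length <= lastFinalizedLength) return ""; // treated as "nothing new"
      const delta = finals.slice(lastFinalizedLength);
      lastFinalizedLength = finals.length;
      return delta;
    },
    // Models the auto-restart path; `reset = true` corresponds to the fix.
    restart(reset: boolean): void {
      if (reset) lastFinalizedLength = 0;
    },
  };
}

// Without the reset, the stale offset (11) is applied to the new session's text.
const buggyTracker = makeOffsetTracker();
const firstSession = buggyTracker.onResult("hello world");
buggyTracker.restart(false);
const skippedText = buggyTracker.onResult("good"); // shorter than the stale offset
const garbledText = buggyTracker.onResult("good morning all"); // sliced mid-word

// With the reset, the new session is processed from the beginning.
const fixedTracker = makeOffsetTracker();
fixedTracker.onResult("hello world");
fixedTracker.restart(true);
const afterReset = fixedTracker.onResult("good");
```

In this model the un-reset tracker first drops "good" and then emits the tail "g all", which matches the two symptoms listed in the commit message.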
Code Review
This pull request introduces a new test suite for the useVoiceRecording hook and fixes a bug where the finalized text offset was not reset during automatic session restarts. The Vitest configuration has been updated to include these tests in the jsdom environment, though they are currently skipped. A review comment suggests also resetting the interim transcript state during auto-restarts to ensure UI consistency and prevent stale text from persisting.
src/hooks/useVoiceRecording.ts
Outdated
try {
  // Reset the finalized length counter: the new session starts a fresh
  // results list from index 0, so the old offset would cause garbled output.
  lastFinalizedLengthRef.current = 0;
When auto-restarting the recognition session after a silence timeout, it is important to also reset the interim transcript state. Since the new session starts with a fresh results list, any existing interim text from the previous session is no longer valid and would persist in the UI until the new session produces its first result. Resetting interimRef.current and calling setInterimTranscript('') ensures the UI stays consistent with the recognition state and aligns with the practice of using useRef for synchronous state tracking to avoid race conditions from stale closures.
lastFinalizedLengthRef.current = 0;
interimRef.current = '';
setInterimTranscript('');

References
- To avoid race conditions from stale closures in React callbacks that are called multiple times within a single render cycle, use a useRef for synchronous state tracking instead of relying on useState, which is subject to batching.
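As a rough illustration of the suggestion, the restart path can be modeled outside React. This is a sketch under stated assumptions: handleEnd, RecognitionLike, RestartState, and the shouldBeRecording flag are hypothetical names, and giving up when start() throws is an assumed policy rather than the hook's confirmed behavior.

```typescript
// Only the one method this sketch needs from a SpeechRecognition instance.
interface RecognitionLike {
  start(): void;
}

interface RestartState {
  shouldBeRecording: boolean; // hypothetical flag: false after an intentional stop
  interim: string; // stands in for interimRef.current
}

// Models an onend handler: clear stale interim text, then try to restart.
// Returns true if a restart was attempted and start() did not throw.
function handleEnd(
  recognition: RecognitionLike,
  state: RestartState,
  setInterimTranscript: (text: string) => void,
): boolean {
  if (!state.shouldBeRecording) return false; // user stopped on purpose

  // Interim text from the ended session is no longer valid.
  state.interim = "";
  setInterimTranscript("");

  try {
    recognition.start();
    return true;
  } catch {
    // Assumed policy: stop trying if the browser refuses to restart.
    state.shouldBeRecording = false;
    return false;
  }
}

// Usage with a fake recognition object that records what happened.
const restartLog: string[] = [];
const restartState: RestartState = { shouldBeRecording: true, interim: "hel" };
const restarted = handleEnd(
  { start: () => { restartLog.push("start"); } },
  restartState,
  (text) => { restartLog.push(`interim:"${text}"`); },
);
```

Clearing the ref and the state before start() means the UI never shows the old partial word alongside results from the new session.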
On silence timeout auto-restart, also clear interimRef and interimTranscript state so stale partial words don't persist into the new session.

Suggested by Opus code review of #321.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace character-offset delta tracking (lastFinalizedLengthRef) with the Web Speech API's built-in resultIndex, which points directly to the first new result in each event. This:
- Removes lastFinalizedLengthRef entirely, so there is no drift risk if a browser ever revises a finalized result's text
- Emits each final result's transcript individually and directly, without string concatenation or substring extraction
- Means the onend auto-restart no longer needs to reset any offset; it only needs to clear the stale interim text (already done)

Also expands the skipped test suite with the missing cases identified in the Opus code review: interim cleared on restart, no auto-restart after intentional stop, error handling (not-allowed / no-speech / aborted), start() throws in onend, and unmount cleanup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
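A resultIndex-based handler along the lines described in this commit might look like the sketch below. It is not the code from the PR: handleResult, ResultEventLike, and the other type names are hypothetical, and the interfaces mirror only the fields of SpeechRecognitionEvent that the loop reads (resultIndex, results, isFinal, transcript).

```typescript
// Structural stand-ins for the Web Speech API types, so this runs outside a browser.
interface AlternativeLike {
  transcript: string;
}
interface ResultLike {
  isFinal: boolean;
  length: number;
  [index: number]: AlternativeLike;
}
interface ResultEventLike {
  resultIndex: number; // index of the first result that changed in this event
  results: { length: number; [index: number]: ResultLike };
}

// Emits each new final transcript via onFinal and returns the current interim text.
function handleResult(event: ResultEventLike, onFinal: (text: string) => void): string {
  let interim = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // top alternative
    if (result.isFinal) {
      onFinal(text); // emitted as-is, no offsets or substring math
    } else {
      interim += text;
    }
  }
  return interim;
}

// Usage: result 0 was already handled by an earlier event, so resultIndex is 1.
const makeResult = (transcript: string, isFinal: boolean): ResultLike => ({
  isFinal,
  length: 1,
  0: { transcript },
});
const emittedFinals: string[] = [];
const currentInterim = handleResult(
  {
    resultIndex: 1,
    results: [makeResult("hello", true), makeResult("world", true), makeResult("how ar", false)],
  },
  (text) => { emittedFinals.push(text); },
);
```

Because the loop starts at resultIndex, a restarted session whose results list begins again at index 0 needs no special handling here.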
Summary
- Fix garbled/repeated text when SpeechRecognition is auto-restarted on a silence timeout
- Add tests for useVoiceRecording behavior (currently skipped pending infrastructure fix in Fix React.act infrastructure issue in jsdom component tests (React 19 + @testing-library/react) #320)

Root Cause
When the browser's SpeechRecognition stops due to silence (some browsers do this even with continuous: true), the onend handler restarts it by calling recognition.start(). Chrome creates a fresh results list starting at index 0 for the new session, but lastFinalizedLengthRef.current still held the character offset from the previous session.

This caused the onresult handler to either:
- Skip new words entirely (if the new finals.length < old offset)
- Report garbled mid-word substrings (if the new finals.length eventually exceeded the old offset)

Fix
Reset lastFinalizedLengthRef.current = 0 before calling recognition.start() in the auto-restart path, so the new session's results are processed from index 0 correctly.

Test plan
- pnpm test:run

Fixes #317
🤖 Generated with Claude Code