Fix voice input garbled/repeated text after silence timeout #321

Closed
brendanlong wants to merge 3 commits into main from claude/9c79dd71-46d7-45d1-9df8-bb56df29c2be

Conversation

@brendanlong
Owner

Summary

Root Cause

When the browser's SpeechRecognition stops due to silence (some browsers do this even with continuous: true), the onend handler restarts it by calling recognition.start(). Chrome creates a fresh results list starting at index 0 for the new session, but lastFinalizedLengthRef.current still held the character offset from the previous session.

This caused the onresult handler to either:

  • Skip new words (if new finals.length < old offset)
  • Report garbled mid-word substrings (if new finals.length eventually exceeded the old offset)
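To make the two failure modes concrete, here is a minimal TypeScript sketch of character-offset delta tracking. It is a simplified model, not the hook's actual code: `makeOffsetTracker`, `onResult`, and `resetForNewSession` are hypothetical names, and `finals` is assumed to be the concatenation of all finalized transcripts in the current session's results list.

```typescript
// Hypothetical model of the offset-based handler (not the real hook code).
function makeOffsetTracker() {
  // Stands in for lastFinalizedLengthRef.current.
  let lastFinalizedLength = 0;

  return {
    // Returns the text the handler would emit for one result event.
    onResult(finals: string): string {
      // Anything at or below the stored offset is treated as already emitted.
      if (finals.length <= lastFinalizedLength) return "";
      const delta = finals.slice(lastFinalizedLength);
      lastFinalizedLength = finals.length;
      return delta;
    },
    // Models resetting the offset in the auto-restart path before start().
    resetForNewSession(): void {
      lastFinalizedLength = 0;
    },
  };
}

// Without the reset: the offset from session 1 (12 characters) leaks into session 2.
const buggy = makeOffsetTracker();
const firstSession = buggy.onResult("hello world ");
const skipped = buggy.onResult("this is "); // 8 < 12, so nothing is emitted
const garbled = buggy.onResult("this is a new sentence "); // sliced mid-word

// With the reset: session 2 is processed from the beginning.
const fixed = makeOffsetTracker();
fixed.onResult("hello world ");
fixed.resetForNewSession();
const correct = fixed.onResult("this is ");

console.log({ firstSession, skipped, garbled, correct });
```

In this model the stale offset first swallows "this is " entirely and then emits "w sentence ", which matches the skipped and garbled cases listed above.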

Fix

Reset lastFinalizedLengthRef.current = 0 before calling recognition.start() in the auto-restart path, so the new session's results are processed from index 0 correctly.

recognition.onend = () => {
  if (recognitionRef.current === recognition) {
    try {
      // Reset offset — new session starts a fresh results list from index 0
      lastFinalizedLengthRef.current = 0;
      recognition.start();
    } catch { ... }
  }
};

Test plan

  • Manual: use voice input, pause speaking for a few seconds (to trigger silence timeout + auto-restart), then continue speaking — words should appear correctly without garbling
  • Unit tests pass: pnpm test:run

Fixes #317

🤖 Generated with Claude Code

When the browser's SpeechRecognition auto-stops due to silence (in some
browsers), the onend handler restarts it. After restart, Chrome creates a
fresh results list starting from index 0, but lastFinalizedLengthRef.current
still held the character offset from the previous session.

This caused the onresult handler to either:
- Skip new words entirely (if new finals.length < old offset)
- Report garbled mid-word substrings (if new finals.length exceeded the
  old offset at some point)

Fix: reset lastFinalizedLengthRef.current = 0 before calling start() in
the auto-restart path, so the new session's results are processed correctly
from the beginning.

Also adds a test file (currently skipped due to pre-existing React.act
infrastructure issue #320) with tests covering the key behaviors including
the auto-restart offset reset.

Fixes #317

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new test suite for the useVoiceRecording hook and fixes a bug where the finalized text offset was not reset during automatic session restarts. The Vitest configuration has been updated to include these tests in the jsdom environment, though they are currently skipped. A review comment suggests also resetting the interim transcript state during auto-restarts to ensure UI consistency and prevent stale text from persisting.

try {
  // Reset the finalized length counter — the new session starts a fresh
  // results list from index 0, so the old offset would cause garbled output.
  lastFinalizedLengthRef.current = 0;

medium

When auto-restarting the recognition session after a silence timeout, it is important to also reset the interim transcript state. Since the new session starts with a fresh results list, any existing interim text from the previous session is no longer valid and would persist in the UI until the new session produces its first result. Resetting interimRef.current and calling setInterimTranscript('') ensures the UI stays consistent with the recognition state and aligns with the practice of using useRef for synchronous state tracking to avoid race conditions from stale closures.

          lastFinalizedLengthRef.current = 0;
          interimRef.current = '';
          setInterimTranscript('');
References
  1. To avoid race conditions from stale closures in React callbacks that are called multiple times within a single render cycle, use a useRef for synchronous state tracking instead of relying on useState, which is subject to batching.

claude added 2 commits April 6, 2026 22:49
On silence timeout auto-restart, also clear interimRef and interimTranscript
state so stale partial words don't persist into the new session.

Suggested by Opus code review of #321.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
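
The restart path described in this commit might look roughly like the framework-free sketch below. It is an illustration under assumptions, not the hook's code: `RecognizerLike`, `attachAutoRestart`, `isActive`, `clearInterim`, and `onRestartFailed` are hypothetical names, and what happens when `start()` throws is a guess, since the PR elides the catch body.

```typescript
// Minimal structural stand-in for the parts of SpeechRecognition used here.
interface RecognizerLike {
  start(): void;
  onend: (() => void) | null;
}

function attachAutoRestart(
  recognizer: RecognizerLike,
  // Hypothetical: true while this recognizer is still the active one
  // (the PR checks recognitionRef.current === recognition).
  isActive: () => boolean,
  // Hypothetical: clears interimRef.current and calls setInterimTranscript('').
  clearInterim: () => void,
  // Hypothetical: cleanup when the restart attempt fails.
  onRestartFailed: () => void,
): void {
  recognizer.onend = () => {
    // An intentional stop deactivates the recognizer, so do not restart.
    if (!isActive()) return;
    // Interim text from the ended session is stale once a new session begins.
    clearInterim();
    try {
      recognizer.start();
    } catch {
      onRestartFailed();
    }
  };
}

// Usage with a fake recognizer that only counts start() calls.
let starts = 0;
let cleared = 0;
let failures = 0;
let active = true;
const fake: RecognizerLike = { start() { starts++; }, onend: null };
attachAutoRestart(fake, () => active, () => { cleared++; }, () => { failures++; });

fake.onend?.(); // silence timeout: interim cleared, recognizer restarted
active = false;
fake.onend?.(); // intentional stop: no restart
```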
Replace character-offset delta tracking (lastFinalizedLengthRef) with the
Web Speech API's built-in resultIndex, which points directly to the first
new result in each event. This:

- Removes lastFinalizedLengthRef entirely — no drift risk if a browser
  ever revises a finalized result's text
- Each final result's transcript is emitted individually and directly,
  without string concatenation or substring extraction
- The onend auto-restart no longer needs to reset any offset; it only
  needs to clear the stale interim text (already done)

Also expands the skipped test suite with the missing cases identified in
the Opus code review: interim cleared on restart, no auto-restart after
intentional stop, error handling (not-allowed / no-speech / aborted),
start() throws in onend, and unmount cleanup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
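
A rough sketch of the `resultIndex` approach from the commit above, assuming the handler only needs the finalized transcripts and the current interim text. `splitResults` and the `*Like` interfaces are hypothetical names; the interfaces mirror only the `resultIndex`, `results`, `isFinal`, and `transcript` members of the Web Speech API so the example runs without a browser.

```typescript
// Minimal structural types mirroring the Web Speech API members used below.
interface AlternativeLike {
  readonly transcript: string;
}
interface ResultLike {
  readonly isFinal: boolean;
  readonly [index: number]: AlternativeLike;
}
interface ResultEventLike {
  readonly resultIndex: number;
  readonly results: ArrayLike<ResultLike>;
}

// Walks only the results that changed in this event, starting at resultIndex,
// so no character offset has to survive across events or sessions.
function splitResults(event: ResultEventLike): { finals: string[]; interim: string } {
  const finals: string[] = [];
  let interim = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result?.[0];
    if (!result || !best) continue;
    if (result.isFinal) {
      finals.push(best.transcript); // each final result is emitted on its own
    } else {
      interim += best.transcript;
    }
  }
  return { finals, interim };
}

// Mid-session event: result 0 was already emitted earlier, so resultIndex is 1.
const midSession = splitResults({
  resultIndex: 1,
  results: [
    { isFinal: true, 0: { transcript: "hello " } },
    { isFinal: true, 0: { transcript: "world " } },
    { isFinal: false, 0: { transcript: "how" } },
  ],
});

// First event after an auto-restart: fresh list, resultIndex is 0 again.
const afterRestart = splitResults({
  resultIndex: 0,
  results: [{ isFinal: true, 0: { transcript: "again " } }],
});

console.log(midSession, afterRestart);
```

Because every event carries its own starting index, the restarted session needs no special handling in this sketch, which is consistent with the commit's note that `onend` only has to clear the stale interim text.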
@brendanlong brendanlong closed this Apr 7, 2026
@brendanlong brendanlong deleted the claude/9c79dd71-46d7-45d1-9df8-bb56df29c2be branch April 7, 2026 06:06

Development

Successfully merging this pull request may close these issues.

Voice input garbled

2 participants