Conversation
When setup runs a second time (e.g. app relaunch or resetAndRerun), `uv venv` fails because the venv directory already exists. This prevents .setup-complete from being written, so the Python server never starts. The --clear flag tells uv to replace an existing venv instead of erroring out.
Add design document for three interaction modes: press-to-talk (existing), hold-to-talk, and hands-free with silence detection and barge-in support.
Four-phase plan covering hold-to-talk, TTS latency optimization, hands-free mode with silence detection and barge-in, and polish. Phases 1 and 2 can run in parallel.
- Increase afplay volume to 4x (Kokoro output is quiet) - Guard mktemp failure to prevent poison temp files - Auto-submit fires on every transcription when enabled (no trigger word needed) - Send plain Enter instead of Cmd+Enter (Cmd+Enter maximizes Ghostty)
- Auto-submit fires on every transcription when enabled (no trigger word needed) - Send plain Enter with explicit zero modifier flags (fixes Ghostty maximizing on Ctrl+Enter) - Increase delay to 1s to ensure Control key is released before Enter - Boost afplay volume to 4x (Kokoro output is naturally quiet) - Guard mktemp failure to prevent poison temp files - Remove outdated "say submit or send" hint from menubar
- Use longer mktemp template without .wav extension to avoid name collisions - Update orphan cleanup glob to match new pattern
There was a problem hiding this comment.
Pull request overview
This PR primarily targets reliability of the TTS hook by preventing temp-file name collisions and ensuring orphan cleanup matches the new temp-file pattern. It also includes changes related to auto-submit behavior/input submission and adds hands-free mode design/implementation planning docs.
Changes:
- Update
tts-hook.shtemp file creation to use a safermktemptemplate and broaden orphan cleanup to match. - Adjust unified server auto-submit behavior and the way the “submit” keypress is synthesized.
- Add hands-free mode design/implementation planning documentation and tweak setup/UI behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| servers/unified_server.py | Changes auto-submit flow and synthesizes a plain Enter keypress with a longer delay. |
| hooks/tts-hook.sh | Fixes mktemp template/collision risk and updates orphan cleanup; adjusts playback invocation. |
| docs/plans/2026-03-10-hands-free-mode-implementation.md | Adds detailed implementation plan for hands-free + hold-to-talk. |
| docs/plans/2026-03-10-hands-free-mode-design.md | Adds design doc outlining modes, architecture, and state machine. |
| app/Sources/ClaudeWhisperer/SetupManager.swift | Modifies venv creation to clear existing environment during setup. |
| app/Sources/ClaudeWhisperer/MenuBarView.swift | Removes auto-submit “say submit/send…” helper text from the UI. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| should_submit = False | ||
| if os.path.exists(AUTO_SUBMIT_FLAG): | ||
| text, should_submit = check_submit_trigger(text) | ||
| text, _ = check_submit_trigger(text) | ||
| should_submit = True | ||
|
|
There was a problem hiding this comment.
should_submit is forced to True whenever the auto-submit flag exists, even when the transcription result is empty/whitespace. This can cause an Enter keypress to be sent with no text (e.g., silence or a failed transcription), potentially submitting an empty message. Consider gating auto-submit on text.strip() (while still calling check_submit_trigger only to strip trigger words when present).
| @@ -187,25 +187,26 @@ def press_cmd_enter(): | |||
| _cg.CFRelease.argtypes = [ctypes.c_void_p] | |||
|
|
|||
| kCGSessionEventTap = 1 | |||
| kCGEventFlagMaskCommand = 0x00100000 | |||
| kVK_Return = 0x24 # 36 | |||
|
|
|||
| key_down = _cg.CGEventCreateKeyboardEvent(None, kVK_Return, True) | |||
| key_up = _cg.CGEventCreateKeyboardEvent(None, kVK_Return, False) | |||
| _cg.CGEventSetFlags(key_down, kCGEventFlagMaskCommand) | |||
| _cg.CGEventSetFlags(key_up, kCGEventFlagMaskCommand) | |||
| # Explicitly clear all modifier flags so held keys (Ctrl, Cmd, etc.) | |||
| # don't bleed into the Enter event | |||
| _cg.CGEventSetFlags(key_down, 0) | |||
| _cg.CGEventSetFlags(key_up, 0) | |||
| _cg.CGEventPost(kCGSessionEventTap, key_down) | |||
| _cg.CGEventPost(kCGSessionEventTap, key_up) | |||
| _cg.CFRelease(key_down) | |||
| _cg.CFRelease(key_up) | |||
| except Exception: | |||
| logger.exception("press_cmd_enter failed") | |||
| logger.exception("press_enter failed") | |||
|
|
|||
|
|
|||
| async def _delayed_enter(): | |||
| await asyncio.sleep(0.3) | |||
| await asyncio.sleep(1.0) | |||
| loop = asyncio.get_running_loop() | |||
| await loop.run_in_executor(None, press_cmd_enter) | |||
| await loop.run_in_executor(None, press_enter) | |||
There was a problem hiding this comment.
This PR is titled/described as a TTS temp-file collision fix, but this hunk also changes auto-submit semantics (always submit when the flag exists), changes the keypress behavior from Cmd+Enter to plain Enter, and increases the submit delay from 0.3s to 1.0s. Please either update the PR title/description to reflect these behavior changes or split them into a separate PR to keep the fix scoped.
| if [ -s "$TMPFILE" ]; then | ||
| afplay "$TMPFILE" 2>/dev/null | ||
| afplay -v 4 "$TMPFILE" 2>/dev/null | ||
| fi |
There was a problem hiding this comment.
afplay -v 4 hard-codes a much louder playback volume than the default and may cause clipping/distortion (and if afplay enforces a 0.0–1.0 range, it can fail silently due to stderr being redirected). Consider leaving volume at default or making it configurable via an env var/config file with a validated range.
| // Step 1: Create venv with bundled uv | ||
| guard runCommand( | ||
| Paths.uvBinary.path, | ||
| args: ["venv", Paths.venv.path, "--python", "3.13"], | ||
| args: ["venv", Paths.venv.path, "--python", "3.13", "--clear"], | ||
| step: "Creating Python environment..." |
There was a problem hiding this comment.
Adding --clear changes setup behavior by forcibly wiping any existing venv at Paths.venv on every first-launch setup run. Since this is unrelated to the stated PR goal (TTS temp file collision), consider moving it to a separate PR or updating the PR description to include the setup/venv behavior change.
Summary
.wavextension to prevent name collisions that silently break TTSThe previous
tts_XXXXXX.wavtemplate could leave a literal file behind on mktemp failure, blocking all future TTS until manually cleaned up.