fix: auto-submit always fires, plain Enter, TTS volume #4
hakanensari wants to merge 6 commits into PerIPan:main from
When setup runs a second time (e.g. app relaunch or resetAndRerun), `uv venv` fails because the venv directory already exists. This prevents .setup-complete from being written, so the Python server never starts. The --clear flag tells uv to replace an existing venv instead of erroring out.
Add design document for three interaction modes: press-to-talk (existing), hold-to-talk, and hands-free with silence detection and barge-in support.
Four-phase plan covering hold-to-talk, TTS latency optimization, hands-free mode with silence detection and barge-in, and polish. Phases 1 and 2 can run in parallel.
- Increase afplay volume to 4x (Kokoro output is quiet)
- Guard mktemp failure to prevent poison temp files
- Auto-submit fires on every transcription when enabled (no trigger word needed)
- Send plain Enter instead of Cmd+Enter (Cmd+Enter maximizes Ghostty)

- Auto-submit fires on every transcription when enabled (no trigger word needed)
- Send plain Enter with explicit zero modifier flags (fixes Ghostty maximizing on Ctrl+Enter)
- Increase delay to 1s to ensure Control key is released before Enter
- Boost afplay volume to 4x (Kokoro output is naturally quiet)
- Guard mktemp failure to prevent poison temp files
- Remove outdated "say submit or send" hint from menubar
Pull request overview
This PR updates Claude Whisperer’s automation and speech pipeline to make auto-submit unconditional when enabled, adjust the synthetic keypress to plain Enter (avoiding modifier bleed), and improve TTS playback loudness and robustness.
Changes:
- Auto-submit now always triggers when the auto-submit flag is enabled (no trigger phrase required).
- Synthetic submit keypress changed from Cmd+Enter to plain Enter via CGEvent, explicitly clearing modifier flags and increasing the delay before pressing Enter.
- TTS hook adds a `mktemp` failure guard and attempts a volume boost during `afplay` playback; menubar hint text is removed; docs for hands-free mode planning/design are added.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| servers/unified_server.py | Switches to plain Enter key event + modifier clearing; makes auto-submit unconditional when enabled. |
| hooks/tts-hook.sh | Adds mktemp failure guard and changes playback invocation to include a volume flag. |
| app/Sources/ClaudeWhisperer/SetupManager.swift | Adjusts uv venv invocation to include --clear. |
| app/Sources/ClaudeWhisperer/MenuBarView.swift | Removes outdated “say submit/send” UI hint. |
| docs/plans/2026-03-10-hands-free-mode-implementation.md | Adds detailed hands-free/hold-to-talk implementation plan. |
| docs/plans/2026-03-10-hands-free-mode-design.md | Adds hands-free/hold-to-talk design doc. |
@@ -266,7 +267,8 @@ async def transcribe(

     should_submit = False
     if os.path.exists(AUTO_SUBMIT_FLAG):
With auto-submit enabled, should_submit is set to True unconditionally, even when the transcription result is empty/whitespace. This will still schedule a synthetic Enter keypress and can end up submitting an empty message. Consider gating auto-submit on text.strip() (or similar) so Enter is only sent when there’s something to submit.
-    if os.path.exists(AUTO_SUBMIT_FLAG):
+    # Only auto-submit when there is non-whitespace transcription text.
+    if os.path.exists(AUTO_SUBMIT_FLAG) and text.strip():
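The gating idea in this comment can be isolated into a small helper so it is easy to test on its own. This is only a sketch: `should_auto_submit` and its `flag_path` parameter are hypothetical names, not functions from the PR, and the real server may structure this differently.

```python
import os


def should_auto_submit(text: str, flag_path: str) -> bool:
    """Return True only when the flag file exists AND the text is non-blank.

    `flag_path` stands in for AUTO_SUBMIT_FLAG; the helper name is hypothetical.
    """
    return os.path.exists(flag_path) and bool(text.strip())
```

With a helper like this, an empty or whitespace-only transcription never schedules the synthetic Enter keypress, even when auto-submit is enabled.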
     if os.path.exists(AUTO_SUBMIT_FLAG):
-        text, should_submit = check_submit_trigger(text)
+        text, _ = check_submit_trigger(text)
+        should_submit = True
Now that auto-submit always fires, calling check_submit_trigger() solely to strip trailing trigger phrases will remove legitimate dictated content whenever it ends with any entry in SUBMIT_TRIGGERS (e.g., "enter", "go ahead", "send"). If the goal is just to drop the legacy "submit/send" habit, consider narrowing the strip list (or making stripping optional) so normal language endings aren’t silently deleted.
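One way to narrow the strip list, sketched under assumptions: `LEGACY_TRIGGERS` and `strip_legacy_trigger` are hypothetical names introduced here for illustration, and the actual contents of `SUBMIT_TRIGGERS` and the behavior of `check_submit_trigger()` are not shown in this PR excerpt.

```python
import re

# Hypothetical narrowed list: only the legacy habit words,
# rather than every entry in SUBMIT_TRIGGERS.
LEGACY_TRIGGERS = ("submit", "send")


def strip_legacy_trigger(text: str, triggers=LEGACY_TRIGGERS) -> str:
    """Remove a single trailing legacy trigger word, if present.

    Only whole words at the very end of the text are removed, together
    with surrounding whitespace, commas, and closing punctuation.
    """
    alternatives = "|".join(re.escape(t) for t in triggers)
    pattern = r"[\s,]*\b(?:" + alternatives + r")[\s.!?]*$"
    return re.sub(pattern, "", text, flags=re.IGNORECASE).rstrip()
```

Phrases such as "press enter" or "please go ahead" pass through unchanged with this list. A sentence that legitimately ends in "send" would still be trimmed, so making the stripping optional may be the safer choice.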
 guard runCommand(
     Paths.uvBinary.path,
-    args: ["venv", Paths.venv.path, "--python", "3.13"],
+    args: ["venv", Paths.venv.path, "--python", "3.13", "--clear"],
     step: "Creating Python environment..."
uv venv is invoked with --clear. The repo’s setup.sh uses uv venv ... --python 3.13 without this flag, and not all uv versions support --clear. If the bundled uv doesn’t recognize it, first-launch setup will fail. Consider explicitly deleting Paths.venv when re-creating the environment (or probing supported flags / uv version) instead of relying on --clear.
 if [ -s "$TMPFILE" ]; then
-    afplay "$TMPFILE" 2>/dev/null
+    afplay -v 4 "$TMPFILE" 2>/dev/null
afplay -v expects a normalized volume scalar (typically 0.0–1.0). Passing -v 4 may be clamped or cause playback to fail, which would undermine the intended volume boost. Consider using a supported range for -v and, if you need a true 4× gain, amplify/normalize the WAV data before playback or make the gain configurable with a safe ceiling.
-    afplay -v 4 "$TMPFILE" 2>/dev/null
+    # Use configurable volume, clamped to afplay's expected 0.0–1.0 range
+    RAW_VOLUME="${TTS_VOLUME:-1.0}"
+    VOLUME=$(awk -v v="$RAW_VOLUME" 'BEGIN { if (v == "" || v < 0) print 1.0; else if (v > 1.0) print 1.0; else print v }')
+    afplay -v "$VOLUME" "$TMPFILE" 2>/dev/null
Summary
Test plan