Phase 4 — Voice (wake word + commands) and Guess System #4
Purpose
Add voice interaction: wake-word listener, command parsing, Whisper fallback for freeform guesses, and a persistent guess logging system.
Tasks
- 4.1 Vosk wake-word listener component
  - File: `src/voice/listener.py`
  - Work: Implement `VoskListener` with `start()`, `stop()`, emits events: `on_wake()`, `on_audio_chunk()`. Allow push-to-talk mode.
  - Tests: `tests/test_listener.py` using mocked audio frames and verifying event callbacks.
  - DoD: listener can be started/stopped and emits wake events.
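A minimal sketch of how the 4.1 listener could be shaped, assuming the recognizer is injected so tests can pass a fake. The recognizer only needs `AcceptWaveform(bytes)` and `Result()` (a JSON string with a `"text"` key), which is the interface Vosk's `KaldiRecognizer` exposes. The `feed()` and `press_to_talk()` methods and the constructor parameters are hypothetical names chosen for illustration, not part of the issue.

```python
import json
from typing import Callable, Optional


class VoskListener:
    """Sketch of a wake-word listener driven by externally supplied audio frames."""

    def __init__(self, recognizer, wake_word: str, push_to_talk: bool = False,
                 on_wake: Optional[Callable[[], None]] = None,
                 on_audio_chunk: Optional[Callable[[bytes], None]] = None):
        # `recognizer` is duck-typed: AcceptWaveform(bytes) -> bool, Result() -> JSON str.
        self.recognizer = recognizer
        self.wake_word = wake_word.lower()
        self.push_to_talk = push_to_talk
        self.on_wake = on_wake or (lambda: None)
        self.on_audio_chunk = on_audio_chunk or (lambda chunk: None)
        self.running = False
        self.awake = False

    def start(self) -> None:
        self.running = True
        self.awake = False

    def stop(self) -> None:
        self.running = False
        self.awake = False

    def press_to_talk(self) -> None:
        """Hypothetical push-to-talk trigger: wakes without hearing the wake word."""
        if self.running and self.push_to_talk and not self.awake:
            self.awake = True
            self.on_wake()

    def feed(self, chunk: bytes) -> None:
        """Process one audio frame (hypothetical entry point for the audio loop)."""
        if not self.running:
            return
        if self.awake:
            # After waking, forward raw audio to whoever handles the utterance.
            self.on_audio_chunk(chunk)
            return
        if self.push_to_talk:
            return  # In push-to-talk mode, audio is ignored until the trigger fires.
        if self.recognizer.AcceptWaveform(chunk):
            text = json.loads(self.recognizer.Result()).get("text", "")
            if self.wake_word in text.lower():
                self.awake = True
                self.on_wake()
```

Injecting the recognizer keeps `tests/test_listener.py` free of real audio devices and model files: a fake object that returns canned transcripts is enough to verify the callbacks.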
- 4.2 Command parser & registry
  - Files: `src/voice/command_parser.py`, `src/voice/commands.py`
  - Work: Parse phrases into structured commands: `reveal <N>`, `translate`, `show reading <N>`, `mark known <N>`, `guess "<text>"`. Provide a registry mapping to callbacks.
  - Tests: `tests/test_command_parser.py` with phrase->action examples.
  - DoD: commands parsed correctly and routed to callbacks.
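One possible shape for the 4.2 parser and registry, sketched with regular expressions. The `Command` dataclass, the `parse()` function, and the `CommandRegistry` class are illustrative names, not ones fixed by the issue. The sketch assumes numbers arrive as digits; mapping spoken numerals such as "three" to integers is left out.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Union


@dataclass(frozen=True)
class Command:
    name: str
    arg: Union[int, str, None] = None


# (command name, pattern, argument converter); matched against the whole phrase.
_PATTERNS = [
    ("reveal", re.compile(r"reveal (\d+)", re.IGNORECASE), int),
    ("translate", re.compile(r"translate", re.IGNORECASE), None),
    ("show_reading", re.compile(r"show reading (\d+)", re.IGNORECASE), int),
    ("mark_known", re.compile(r"mark known (\d+)", re.IGNORECASE), int),
    ("guess", re.compile(r'guess "?(.+?)"?', re.IGNORECASE), str),
]


def parse(phrase: str) -> Optional[Command]:
    """Return a structured Command, or None if the phrase is not a known command."""
    text = " ".join(phrase.split())  # collapse repeated whitespace
    for name, pattern, convert in _PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            arg = convert(match.group(1)) if convert else None
            return Command(name, arg)
    return None


class CommandRegistry:
    """Maps command names to callbacks and routes parsed phrases to them."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Command], None]] = {}

    def register(self, name: str, handler: Callable[[Command], None]) -> None:
        self._handlers[name] = handler

    def dispatch(self, phrase: str) -> bool:
        """Parse and route a phrase; return False when nothing handled it."""
        command = parse(phrase)
        handler = self._handlers.get(command.name) if command else None
        if handler is None:
            return False
        handler(command)
        return True
```

Returning `False` from `dispatch()` gives the caller a clear signal that the phrase was not a simple command, which is the point where task 4.3 hands over to Whisper.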
- 4.3 Whisper integration stub & guess text flow
  - File: `src/voice/whisper_handler.py`
  - Work: Add a Whisper call wrapper for freeform transcriptions; for tests, provide a mock. Flow: after wake, if input not a known simple command, transcribe with Whisper and pass to guess logger.
  - Tests: `tests/test_whisper_handler.py` with mocked transcription.
  - DoD: transcription result passed to guess manager.
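The 4.3 flow could be sketched as below. The wrapper takes the transcription function as a constructor argument, so production code can pass something that calls Whisper (for example, a function wrapping the `transcribe` method of a loaded `openai-whisper` model, which returns a dict with a `"text"` key) while tests pass a mock. `WhisperHandler`, `route_utterance`, and every parameter name here are assumptions made for illustration.

```python
from typing import Callable


class WhisperHandler:
    """Thin wrapper around a transcription callable (real Whisper or a mock)."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe

    def transcribe(self, audio: bytes) -> str:
        # Normalise surrounding whitespace so empty results are easy to detect.
        return self._transcribe(audio).strip()


def route_utterance(quick_text: str, audio: bytes, word_id: int,
                    is_command: Callable[[str], bool],
                    whisper: WhisperHandler,
                    log_guess: Callable[[int, str], None]) -> str:
    """Decide what to do with one utterance captured after the wake word.

    quick_text is the fast (e.g. Vosk) transcript. If it is a known simple
    command, Whisper is never called; otherwise the audio is transcribed
    and the result is passed to the guess logger.
    """
    if is_command(quick_text):
        return "command"
    guess = whisper.transcribe(audio)
    if guess:
        log_guess(word_id, guess)
        return "guess"
    return "ignored"
```

Checking the cheap transcript first means the slower Whisper call only runs for freeform guesses, and an empty transcription is dropped rather than logged.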
- 4.4 Guess manager and logging
  - Files: `src/words/guess_manager.py`, DB schema update for `guesses(word_id, guessed_text, timestamp, confidence)`
  - Work: `log_guess(word_id, guessed_text, source='voice')`, `list_guesses(word_id)`. Guesses are silent by default; optional overlay preview setting.
  - Tests: `tests/test_guesses.py` verifying persistence and retrieval.
  - DoD: guesses persist and are queryable.
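A rough sketch of the 4.4 guess manager on top of `sqlite3`, assuming SQLite is the project's database (the issue does not say). The listed schema has no `source` column even though `log_guess` takes a `source` argument, so the sketch adds one as an assumption; the `id` column, the `confidence` keyword argument, and the `GuessManager` class wrapper are likewise illustrative.

```python
import sqlite3
import time
from typing import List, Optional, Tuple


class GuessManager:
    """Persists guesses and retrieves them per word."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # `id` and `source` are assumptions beyond the columns named in the issue.
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS guesses (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   word_id INTEGER NOT NULL,
                   guessed_text TEXT NOT NULL,
                   timestamp REAL NOT NULL,
                   confidence REAL,
                   source TEXT NOT NULL DEFAULT 'voice'
               )"""
        )
        self.conn.commit()

    def log_guess(self, word_id: int, guessed_text: str, source: str = "voice",
                  confidence: Optional[float] = None) -> int:
        """Store one guess and return its row id."""
        cursor = self.conn.execute(
            "INSERT INTO guesses (word_id, guessed_text, timestamp, confidence, source) "
            "VALUES (?, ?, ?, ?, ?)",
            (word_id, guessed_text, time.time(), confidence, source),
        )
        self.conn.commit()
        return cursor.lastrowid

    def list_guesses(self, word_id: int) -> List[Tuple[str, float, Optional[float], str]]:
        """Return (guessed_text, timestamp, confidence, source) rows, oldest first."""
        cursor = self.conn.execute(
            "SELECT guessed_text, timestamp, confidence, source "
            "FROM guesses WHERE word_id = ? ORDER BY id",
            (word_id,),
        )
        return cursor.fetchall()
```

Passing the connection in lets `tests/test_guesses.py` use `sqlite3.connect(":memory:")` for retrieval checks and a temporary file for persistence checks.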
- 4.5 Voice configuration UI hooks
  - File: `src/ui/command_editor.py` (stub integration)
  - Work: Add settings to configure wake-word, sensitivity, and push-to-talk toggle that the listener reads.
  - DoD: settings stored and read by listener at startup.
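The 4.5 settings could be stored and read along these lines. This is a sketch under assumptions: the issue does not specify a storage format, so a JSON file is used here, and `VoiceSettings`, `save_settings`, `load_settings`, the field names, and the default values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, fields
from pathlib import Path
from typing import Union


@dataclass
class VoiceSettings:
    wake_word: str = "computer"   # placeholder default, not from the issue
    sensitivity: float = 0.5      # placeholder default, not from the issue
    push_to_talk: bool = False


def save_settings(settings: VoiceSettings, path: Union[str, Path]) -> None:
    """Write the settings as JSON (called by the settings UI)."""
    Path(path).write_text(json.dumps(asdict(settings), indent=2), encoding="utf-8")


def load_settings(path: Union[str, Path]) -> VoiceSettings:
    """Read settings at listener startup; fall back to defaults if the file is missing."""
    file = Path(path)
    if not file.exists():
        return VoiceSettings()
    data = json.loads(file.read_text(encoding="utf-8"))
    # Ignore unknown keys so older or newer files do not break startup.
    known = {f.name for f in fields(VoiceSettings)}
    return VoiceSettings(**{k: v for k, v in data.items() if k in known})
```

Falling back to defaults when the file is absent means the listener can start on a fresh install before the settings UI has ever been opened.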
Notes
- In early rollout, push-to-talk should be an easy fallback to ensure reliability.