Skip to content

Phase 4 — Voice (wake word + commands) and Guess System #4

@Swiftburn

Description

@Swiftburn

Phase 4 — Voice (wake word + commands) and Guess System

Purpose

Add voice interaction: wake-word listener, command parsing, Whisper fallback for freeform guesses, and a persistent guess logging system.

Tasks

  • 4.1 Vosk wake-word listener component

    • File: src/voice/listener.py
    • Work: Implement VoskListener with start(), stop(), emits events: on_wake(), on_audio_chunk(). Allow push-to-talk mode.
    • Tests: tests/test_listener.py using mocked audio frames and verifying event callbacks.
    • DoD: listener can be started/stopped and emits wake events.
  • 4.2 Command parser & registry

    • Files: src/voice/command_parser.py, src/voice/commands.py
    • Work: Parse phrases into structured commands: reveal <N>, translate, show reading <N>, mark known <N>, guess "<text>". Provide a registry mapping to callbacks.
    • Tests: tests/test_command_parser.py with phrase->action examples.
    • DoD: commands parsed correctly and routed to callbacks.
  • 4.3 Whisper integration stub & guess text flow

    • File: src/voice/whisper_handler.py
    • Work: Add a Whisper call wrapper for freeform transcriptions; for tests, provide a mock. Flow: after wake, if input not a known simple command, transcribe with Whisper and pass to guess logger.
    • Tests: tests/test_whisper_handler.py with mocked transcription.
    • DoD: transcription result passed to guess manager.
  • 4.4 Guess manager and logging

    • Files: src/words/guess_manager.py, DB schema update for guesses(word_id, guessed_text, timestamp, confidence)
    • Work: log_guess(word_id, guessed_text, source='voice'), list_guesses(word_id). Guesses are silent by default; optional overlay preview setting.
    • Tests: tests/test_guesses.py verifying persistence and retrieval.
    • DoD: guesses persist and queryable.
  • 4.5 Voice configuration UI hooks

    • File: src/ui/command_editor.py (stub integration)
    • Work: Add settings to configure wake-word, sensitivity, and push-to-talk toggle that the listener reads.
    • DoD: settings stored and read by listener at startup.

Notes

  • In early rollout, push-to-talk should be an easy fallback to ensure reliability.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions