Add dictation command processing for voice input #1874

Open
liketheduck wants to merge 2 commits into futo-org:master from liketheduck:feature/dictation-commands

liketheduck commented Feb 11, 2026

Summary

Adds a post-processing step to the voice input pipeline that converts spoken command phrases into their corresponding characters and formatting. When enabled, phrases like "new line", "dollar sign", "caps on", and "open parenthesis" are replaced with the appropriate output before text is committed to the input field.

Covers 71 commands across 10 categories: formatting, capitalization, punctuation, symbols, math, currency, emoticons, and intellectual property marks. Each category can be independently toggled. The feature is enabled by default and lives behind its own settings sub-page under Voice Input.
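
The toggle behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Category`, `Command`, `Settings`, and `activeCommands` are hypothetical names, and the category list is copied from the summary. The assumption is that disabling the master toggle leaves no active commands, so the transcription passes through unchanged.

```kotlin
// Hypothetical category names, taken from the list in the summary.
enum class Category {
    FORMATTING, CAPITALIZATION, PUNCTUATION, SYMBOLS,
    MATH, CURRENCY, EMOTICONS, IP_MARKS
}

// Hypothetical shape of one command-table entry.
data class Command(val phrase: String, val output: String, val category: Category)

// Hypothetical snapshot of the user's settings (the PR stores these in DataStore).
data class Settings(val masterEnabled: Boolean, val enabledCategories: Set<Category>)

// Returns the phrase-to-output table the processor should use for this call.
// An empty table means no substitution happens at all.
fun activeCommands(all: List<Command>, settings: Settings): Map<String, String> =
    if (!settings.masterEnabled) {
        emptyMap()
    } else {
        all.filter { it.category in settings.enabledCategories }
            .associate { it.phrase to it.output }
    }
```

Filtering the table once per call keeps the matching loop itself unaware of settings, which is one plausible way to make each category independently switchable.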

Co-authored-by: Claude Code (claude-opus-4-6)

Changes

  • DictationCommandProcessor.kt -- Core processor. Stateless across calls. Handles multi-word command matching (up to 4 words, longest match first), stateful modes (caps on/off, no space on/off), numeral/roman numeral conversion, and per-command spacing rules. Tolerates Whisper auto-punctuation on command words.
  • VoiceInputAction.kt -- Wired into the voice input pipeline. Sanitizer runs first to clean Whisper output, then the processor handles command substitution.
  • VoiceInputSettingKeys.kt -- 9 DataStore preference keys (master toggle + 8 categories).
  • VoiceInput.kt / SettingsNavigator.kt -- Settings UI. Single navigation item on the Voice Input page opens a dedicated Dictation Commands sub-page with all toggles.
  • strings-uix.xml -- 18 string resources for setting titles and subtitles.
  • build.gradle -- Added java/test as a JVM unit test source set.
  • DictationCommandProcessorTest.kt -- 112 unit tests covering all command categories, stateful interactions, edge cases, Whisper punctuation tolerance, and cursor-context spacing.
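
The multi-word matching described for DictationCommandProcessor.kt (up to 4 words, longest match first, tolerant of Whisper punctuation on command words) might look something like the sketch below. It is an illustration under assumptions, not the PR's implementation: `COMMANDS`, `normalize`, and `substitute` are hypothetical names, the table is a tiny stand-in for the real one, and spacing rules and stateful modes are left out.

```kotlin
// Hypothetical command table; the PR's real table is much larger and is not shown here.
val COMMANDS: Map<String, String> = mapOf(
    "new line" to "\n",
    "dollar sign" to "\$",
    "open parenthesis" to "(",
    "exclamation point" to "!",
    "exclamation" to "!" // assumed alias for Whisper output such as "Exclamation."
)

const val MAX_COMMAND_WORDS = 4

// Lowercase a word and drop trailing punctuation that Whisper may have appended.
fun normalize(word: String): String =
    word.lowercase().trimEnd('.', ',', '!', '?', ';', ':')

// Splits the text into tokens and replaces command phrases with their output.
fun substitute(text: String): List<String> {
    val words = text.split(Regex("\\s+")).filter { it.isNotEmpty() }
    val out = mutableListOf<String>()
    var i = 0
    while (i < words.size) {
        var matched = false
        // Try the longest window first so "exclamation point" wins over "exclamation".
        for (len in minOf(MAX_COMMAND_WORDS, words.size - i) downTo 1) {
            val phrase = words.subList(i, i + len).joinToString(" ") { normalize(it) }
            val replacement = COMMANDS[phrase]
            if (replacement != null) {
                out += replacement
                i += len
                matched = true
                break
            }
        }
        if (!matched) {
            // Not a command: keep the word exactly as transcribed.
            out += words[i]
            i++
        }
    }
    return out
}
```

With this table, a shortest-first scan would turn "exclamation point" into "!" followed by the stray word "point"; trying longer windows first avoids that.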

Design decisions

  • Runs after ModelOutputSanitizer so that formatting characters (\n, \t) are not destroyed by trim().
  • Strips trailing Whisper auto-punctuation from words during matching, but only strips it from the output buffer when the preceding token was a regular word (not a command). This preserves explicit command sequences like "comma new line" while cleaning up Whisper artifacts like "hello, new line, world".
  • Matches the existing codebase style: Kotlin object singleton, @JvmStatic entry point, DataStore settings with SettingsKey pattern.
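
One reading of the punctuation rule above is sketched here: punctuation is ignored while matching, and when a command matches, trailing punctuation is removed from the previous output token only if that token was a regular word. The names `COMMAND_OUTPUT`, `WHISPER_PUNCT`, and `process` are hypothetical, the two-entry table exists only to exercise the rule, and the window is limited to two words for brevity.

```kotlin
// Hypothetical two-entry table, just enough to exercise the rule.
val COMMAND_OUTPUT: Map<String, String> = mapOf("new line" to "\n", "comma" to ",")

// Punctuation Whisper is assumed to attach to words on its own.
val WHISPER_PUNCT = charArrayOf('.', ',', '!', '?')

fun process(text: String): List<String> {
    val words = text.split(" ").filter { it.isNotEmpty() }
    val out = mutableListOf<String>()
    var lastWasCommand = false
    var i = 0
    while (i < words.size) {
        // Match on lowercase, punctuation-free forms of the next one or two words.
        val one = words[i].lowercase().trimEnd(*WHISPER_PUNCT)
        val two = if (i + 1 < words.size) {
            one + " " + words[i + 1].lowercase().trimEnd(*WHISPER_PUNCT)
        } else {
            null
        }
        // Prefer the two-word match; fall back to the single word.
        val hit: Pair<Int, String>? =
            two?.let { COMMAND_OUTPUT[it] }?.let { 2 to it }
                ?: COMMAND_OUTPUT[one]?.let { 1 to it }

        if (hit != null) {
            // Clean up Whisper punctuation on the preceding regular word,
            // but leave output that came from an explicit command untouched.
            if (!lastWasCommand && out.isNotEmpty()) {
                out[out.lastIndex] = out.last().trimEnd(*WHISPER_PUNCT)
            }
            out += hit.second
            i += hit.first
            lastWasCommand = true
        } else {
            out += words[i]
            i++
            lastWasCommand = false
        }
    }
    return out
}
```

In this sketch, "hello, new line, world" loses both Whisper commas, while "comma new line" keeps the comma because it was produced by a command rather than attached to a regular word.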

Test plan

  • 112 JVM unit tests pass via ./gradlew testUnstableDebugUnitTest
  • Install APK and verify voice commands produce expected output (new line, period, dollar sign, caps on/off, etc.)
  • Verify settings sub-page renders correctly and toggles persist
  • Verify disabling the master toggle passes transcription through unchanged
  • Verify feature does not affect voice input when "Use system voice input" is enabled

@futo-cla-pr-labler

Please sign our contributor license agreement at https://cla.futo.org

liketheduck force-pushed the feature/dictation-commands branch from 1e707df to 37c69d0 on February 11, 2026 01:55

liketheduck commented Feb 11, 2026

> Please sign our contributor license agreement at https://cla.futo.org

It has been signed.

Adds a post-processing step to the voice input pipeline that converts
spoken command phrases (e.g. "new line", "dollar sign", "caps on") into
their corresponding characters and formatting. Covers 69 commands across
10 categories, each independently toggleable in a new Dictation Commands
settings sub-page under Voice Input. Includes 108 JVM unit tests.
liketheduck force-pushed the feature/dictation-commands branch from 95b8221 to 2a09837 on February 11, 2026 02:07
Zvonimir-FUTO requested a review from abb128 on February 11, 2026 07:53
Zvonimir-FUTO added the Enhancement and Voice Input labels on Feb 11, 2026
Whisper often transcribes spoken "exclamation point" as "Exclamation."
with capitalization and trailing period. The existing punctuation
tolerance handles both; this adds the missing command aliases.