macOS push-to-talk transcription helper built on OpenAI Whisper APIs. Hold a key to speak, release to see text anywhere the cursor is. Your own Willow/Whisper

iamguoyisahn/TaskTranscriber

Fn Key Whisper Transcriber


A macOS push-to-talk assistant powered by OpenAI Whisper. Hold a key to record, then release it to have the transcript inserted where your cursor is.


✨ Highlights

  • Low-latency dictation – capture when the hotkey is held, see text 1–2 seconds after release.
  • Flexible hotkeys – Fn/F19, Command, Option, raw key codes (e.g. code63) and toggle mode for keys with release-only events.
  • Privacy aware – microphone stream is opened only while recording.
  • Auto paste – transcripts go to the clipboard by default, optionally auto ⌘V into the focused app.
  • Debug friendly – --log-keys and --verbose expose keyboard and API activity instantly.
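
The hotkey forms listed above (named keys such as f19 or option, and raw codes such as code63) could be normalized by a small parser. The sketch below is only an assumption about how that might look, not the script's actual code; parse_hotkey and NAMED_KEYS are hypothetical names, and the set of names is taken from the examples in this README.

```python
import re

# Hypothetical set of named hotkeys, taken from the examples in this README.
NAMED_KEYS = {"fn", "f19", "cmd", "command", "option"}


def parse_hotkey(spec: str) -> tuple[str, object]:
    """Return ("name", key_name) or ("code", virtual_key_code)."""
    text = spec.strip().lower()
    match = re.fullmatch(r"code(\d+)", text)
    if match:
        # Raw virtual key code, e.g. "code63" for Fn on macOS.
        return ("code", int(match.group(1)))
    if text in NAMED_KEYS:
        return ("name", text)
    raise ValueError(f"unrecognized hotkey: {spec!r}")
```

Separating named keys from raw codes early keeps the listener code free of string handling: it only has to compare against one normalized value.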

✅ Requirements

  • OS: macOS (keyboard hooks and AppleScript integration are macOS-specific)
  • Python: 3.12 (enforced by the script)
  • OpenAI API Key: Whisper-capable key (recommended model gpt-4o-mini-transcribe)
  • Dependencies: see requirements.txt
  • System permissions (grant on first run):
    • Privacy & Security ▸ Input Monitoring → Terminal / python3.12
    • Privacy & Security ▸ Accessibility → Terminal / python3.12 (required for auto paste)
    • Privacy & Security ▸ Microphone → Terminal / python3.12
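
The README says the Python version is enforced by the script but does not show how. A minimal sketch of such a check follows, assuming an exact major.minor match is intended; is_supported and REQUIRED are hypothetical names.

```python
import sys

REQUIRED = (3, 12)  # version stated in the requirements above


def is_supported(version_info=sys.version_info) -> bool:
    """True when the interpreter's major.minor matches REQUIRED (assumed exact match)."""
    return tuple(version_info[:2]) == REQUIRED
```

A script could call is_supported() at startup and exit with a clear message when it returns False.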

🚀 Quick Start

  1. Create a virtualenv and install deps

    python3.12 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
  2. Configure .env (auto-loaded if present)

    OPENAI_API_KEY=sk-xxxxx
    WHISPER_MODEL=gpt-4o-mini-transcribe
    HOTKEY_LISTENER=auto

    Use plain ASCII characters and remove any smart quotes. Alternatively, set the key in your shell with export OPENAI_API_KEY=... instead of using .env.

  3. Optional: remap Fn to F19

    • Install Karabiner-Elements
    • Add a Simple Modification fn → f19
    • Or skip remap and use --hotkey code63 (Fn virtual key on macOS)
  4. Run the script

    source .venv/bin/activate
    python3.12 fn_whisper_transcribe.py \
      --hotkey option \
      --listener pynput \
      --model gpt-4o-mini-transcribe \
      --auto-paste \
      --verbose
    • Place the cursor in any text field
    • Hold the hotkey, speak, release → [HH:MM:SS] ... appears and text pastes automatically
    • Stop with Ctrl+C
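
Step 2 warns about smart quotes in .env. The sketch below shows one way a loader could parse KEY=VALUE lines and reject non-ASCII characters early. It is an illustration under assumptions, not the script's loader; parse_env is a hypothetical helper.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, rejecting non-ASCII characters such as smart quotes."""
    values: dict[str, str] = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.isascii():
            raise ValueError(f"line {number}: non-ASCII character (smart quote?)")
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {number}: expected KEY=VALUE")
        # Drop optional surrounding straight quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values
```

Failing on the offending line number makes a pasted curly quote easy to find, instead of surfacing later as an authentication error.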

🔧 Common Flags

python3.12 fn_whisper_transcribe.py [flags]   # reference list; pass only the flags you need
  --hotkey f19                 # Default hotkey; swap to cmd / option / code63 etc.
  --listener pynput            # Hotkey backend (auto/pynput/keyboard)
  --toggle                     # Tap to start/stop for release-only keys
  --min-duration 0.25          # Ignore clips shorter than X seconds
  --block-duration 0.05        # Audio block size (smaller = lower latency, higher CPU)
  --device "MacBook Pro Microphone"   # Explicit input device
  --channels 1                 # Channel count (set 2 for stereo mics)
  --prompt "Meeting notes"     # Prompt to bias transcription
  --env-file configs/demo.env  # Extra env file
  --auto-paste                 # Copy and immediately send ⌘V (needs Accessibility)
  --no-clipboard               # Skip copying to clipboard
  --log-keys                   # Trace all key events
  --verbose                    # Verbose logging
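
The --block-duration and --min-duration values translate into frame counts once a sample rate is fixed. The arithmetic is sketched below, assuming a 16 kHz sample rate (the README does not state one); both function names are hypothetical.

```python
SAMPLE_RATE = 16_000  # assumed sample rate in Hz; the README does not state one


def frames_per_block(block_duration: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of audio frames delivered per block of block_duration seconds."""
    return max(1, round(block_duration * sample_rate))


def clip_is_long_enough(num_frames: int, min_duration: float,
                        sample_rate: int = SAMPLE_RATE) -> bool:
    """True when the recorded clip lasts at least min_duration seconds."""
    return num_frames / sample_rate >= min_duration
```

With these assumptions, a block duration of 0.05 s corresponds to 800 frames per block, and a clip needs at least 4000 frames to pass a 0.25 s minimum.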

📝 Permissions & Hotkey Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| “This process is not trusted” | Input Monitoring / Accessibility not granted | Grant Terminal (or .venv/bin/python3.12) under Privacy & Security; run tccutil reset InputMonitoring com.apple.Terminal if stuck |
| Fn key ignored | Not remapped / release-only event | Map to F19 via Karabiner or use --hotkey code63 --toggle |
| Mic icon appears immediately | Other service (e.g. Willow) occupies mic | Stop related agents or reboot |
| 404 Invalid URL | Model doesn't support audio | Use --model gpt-4o-mini-transcribe or update .env |
| Auto paste fails | Accessibility or AppleScript blocked | Allow Terminal and /usr/bin/osascript under Accessibility (“Control your computer”) |
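
The difference between the default hold-to-record behavior and --toggle (for keys that only report a release event) can be pictured as a small state machine. This is a sketch of the idea only, not the script's implementation; RecorderState is a hypothetical class.

```python
class RecorderState:
    """Tracks whether recording is active in hold mode or toggle mode."""

    def __init__(self, toggle: bool = False) -> None:
        self.toggle = toggle
        self.recording = False

    def on_press(self) -> None:
        # Hold mode starts on key-down. Toggle mode ignores key-down,
        # because release-only keys never deliver it.
        if not self.toggle:
            self.recording = True

    def on_release(self) -> None:
        # Hold mode stops on key-up; toggle mode flips state on each key-up.
        self.recording = (not self.recording) if self.toggle else False
```

In toggle mode the first release starts recording and the second stops it, so a key such as Fn remains usable even when no press event arrives.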

🧪 Development Tips

  • Compile check: python3.12 -m compileall fn_whisper_transcribe.py
  • Keyboard event probe:
    python3.12 - <<'PY'
    from pynput import keyboard
    print("Listening (Esc to exit)...")
    def on_press(key): print("down:", key)
    # pynput stops the listener when a callback returns False, so return False only on Esc.
    def on_release(key): print("up:", key); return key != keyboard.Key.esc
    with keyboard.Listener(on_press=on_press, on_release=on_release) as listener:
        listener.join()
    PY
  • Fallback to keyboard backend (requires sudo): python3.12 fn_whisper_transcribe.py --listener keyboard


📄 License

No license is included yet. Add a LICENSE file (MIT, Apache-2.0, etc.) before publishing publicly.


🙋 Support

Open an issue for hotkey compatibility, permission hurdles, model latency, or quota concerns. Feel free to fork and extend—e.g. live captions or local Whisper backends.
