aluncstokes/whisper-wait

Whisper-Wait

Whisper-Wait is a real-time audio transcription CLI powered by OpenAI's Whisper, featuring a widescreen-friendly TUI, support for local Whisper models, and optional transcription via the OpenAI API.

Features

  • Live recording loop: Press ENTER to start/stop, with a clear recording indicator
  • Dual modes: Run local Whisper models or the OpenAI Whisper API
  • TUI dashboard: Status, commands, and recent transcripts laid out for wide terminals
  • Input device picker: Switch microphones without restarting
  • Clipboard auto-copy: Copy transcripts automatically, with an in-app toggle
  • Transcript history: Recent transcripts shown in a readable table (newest at bottom)
  • Cost tracking (API): Per-clip estimate plus session total
  • Safe defaults: .env support, chunking for Whisper limits, audio archiving

Requirements

  • Python 3.8+
  • A working microphone
  • Optional: CUDA-capable GPU for local models (CPU works too)

Installation

  1. Clone or download this repository

  2. Create a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set up your API key (for API mode):

    • Create a .env file in the project root:
      OPENAI_API_KEY=your_api_key_here
      OPENAI_ORG_ID=your_org_id  # Optional
      
    • Or export it in your shell:
      export OPENAI_API_KEY="your_api_key_here"
  5. Make the shell script executable:

    chmod +x whisper-wait.sh
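For a sense of how the `.env` step works, here is a minimal sketch of loading `OPENAI_API_KEY` using only the standard library. This is illustrative; the project itself may rely on a package such as python-dotenv instead, and `load_dotenv_minimal` is a hypothetical helper name.

```python
import os

def load_dotenv_minimal(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Hypothetical sketch only. Blank lines and '#' comments are skipped,
    and existing environment variables are never overwritten.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop an inline comment and surrounding quotes, if any
            value = value.split("#", 1)[0].strip().strip('"').strip("'")
            os.environ.setdefault(key.strip(), value)

load_dotenv_minimal()
api_key = os.environ.get("OPENAI_API_KEY")
```

Exporting the key in your shell works identically, since both paths end up in `os.environ`.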

Usage

Interactive Mode (Recommended)

Run without arguments to enter the menu and dashboard:

./whisper-wait.sh

You will:

  1. Choose OpenAI API or Local Model
  2. Select a Whisper model (local mode only)
  3. Enter the recording loop

Ready Prompt Commands

From the dashboard:

  • ENTER start recording
  • h show history (press ENTER to return)
  • c toggle auto-copy to clipboard
  • d choose input device
  • m change model (local mode only)
  • q quit

Command-Line Mode

Using the OpenAI API

./whisper-wait.sh api

Using a Local Model

./whisper-wait.sh local

Specify a model directly:

./whisper-wait.sh local -m medium.en

Available Models

Local Models:

  • tiny, tiny.en
  • base, base.en
  • small, small.en
  • medium, medium.en
  • large, large-v2, large-v3

API Mode: Uses whisper-1.
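A model check along these lines keeps invalid names out of local mode (see the "Invalid model" troubleshooting entry below). This is a sketch; `LOCAL_MODELS` and `validate_model` are hypothetical names, and the real code may structure the check differently.

```python
# Valid local model names, per the list above (hypothetical constant name)
LOCAL_MODELS = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large", "large-v2", "large-v3",
}

def validate_model(name: str) -> str:
    """Return the name unchanged if valid, else raise with the options."""
    if name not in LOCAL_MODELS:
        raise ValueError(
            f"Invalid model: {name!r}. Choose one of: {sorted(LOCAL_MODELS)}"
        )
    return name
```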

How It Works

  1. Recording: Press ENTER to start recording. Press ENTER again to stop.
  2. Processing: Audio is saved to ~/audio_archive with a UUID filename. Large files are split into smaller chunks.
  3. Transcription:
    • Local Mode: Runs the selected Whisper model on CPU/GPU
    • API Mode: Uploads audio to the OpenAI API; cost is estimated at $0.006/min
  4. Output: The transcript is displayed in a styled panel and optionally copied to your clipboard.
  5. Loop: The dashboard returns for another recording (or q to quit).
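The chunking and cost-estimation steps above can be sketched as follows. The $0.006/min rate comes from the text; the function names and the 600-second chunk limit are illustrative assumptions, not the project's actual values.

```python
API_RATE_PER_MIN = 0.006  # USD per minute, as noted above

def estimate_cost(duration_sec: float) -> float:
    """Estimated API cost in USD for a clip of the given length."""
    return round(duration_sec / 60.0 * API_RATE_PER_MIN, 6)

def chunk_spans(duration_sec: float, max_chunk_sec: float = 600.0):
    """Split a long recording into (start, end) spans under the limit.

    The 600 s default is an assumed limit for illustration only.
    """
    spans, start = [], 0.0
    while start < duration_sec:
        end = min(start + max_chunk_sec, duration_sec)
        spans.append((start, end))
        start = end
    return spans
```

The session total is then just the sum of per-clip estimates.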

Configuration

Edit config.py to customize:

  • SAMPLE_RATE: Recording sample rate (default: 16000 Hz)
  • AUDIO_ARCHIVE_DIR: Archive location (default: ~/audio_archive)
  • DEVICE: GPU/CPU selection (auto-detected)
  • DEFAULT_MODEL: Default Whisper model (default: medium.en)
  • MAX_DURATION_SEC / MAX_SIZE_BYTES: Chunk limits for Whisper
  • CHUNK_SIZE: Audio frames per read
  • DEFAULT_AUTO_COPY: Clipboard auto-copy default
  • HISTORY_PREVIEW_COUNT: Rows shown in the dashboard history

History + cost logs are stored in ~/.whisper_wait.
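An illustrative `config.py` matching the documented defaults might look like the fragment below. Values marked "assumed" are placeholders for this sketch, not the project's actual settings.

```python
# config.py (illustrative sketch; only the first four defaults are documented)
from pathlib import Path

SAMPLE_RATE = 16_000                       # Hz (documented default)
AUDIO_ARCHIVE_DIR = Path.home() / "audio_archive"  # documented default
DEFAULT_MODEL = "medium.en"                # documented default
DEVICE = "cuda"                            # assumed; auto-detected in practice
MAX_DURATION_SEC = 600                     # assumed chunk limit
MAX_SIZE_BYTES = 25 * 1024 * 1024          # assumed size limit
CHUNK_SIZE = 1024                          # assumed frames per read
DEFAULT_AUTO_COPY = True                   # assumed default
HISTORY_PREVIEW_COUNT = 5                  # assumed default
```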

ZSH Completion

For ZSH users, a completion script is included:

# Add to your .zshrc:
fpath=(/path/to/my-rt-whisper $fpath)
autoload -Uz compinit && compinit

Or source it directly:

source /path/to/my-rt-whisper/_whisper-wait

Troubleshooting

"OPENAI_API_KEY environment variable not set"

  • Create a .env file with your API key or export it in your shell.

"Invalid model" error

  • Ensure you're using one of the available models listed above.
  • The interactive menu prevents this by restricting selection to valid choices.

GPU out of memory

  • Try a smaller model like small.en or base.en.
  • Ensure CUDA is installed and your GPU has enough memory.

No audio recorded

  • Check microphone permissions and default input device.
  • Verify sounddevice can access your microphone: python -m sounddevice
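To run the same check programmatically, a small sketch using the sounddevice API (guarded so it degrades gracefully when the library is not installed):

```python
def list_input_devices():
    """Return names of devices that can record audio, or [] if the
    sounddevice library is unavailable in this environment."""
    try:
        import sounddevice as sd
    except ImportError:
        return []
    return [d["name"] for d in sd.query_devices()
            if d["max_input_channels"] > 0]

for name in list_input_devices():
    print(name)
```

An empty list with sounddevice installed usually points to a permissions or driver problem rather than an application bug.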

Clipboard unavailable

  • Some environments do not provide clipboard access. Toggle auto-copy off with c.

Project Structure

my-rt-whisper/
├── whisper_wait.py      # Main application
├── whisper-wait.sh      # Shell wrapper script
├── config.py            # Configuration constants
├── audio.py             # Audio recording/processing utilities
├── transcribe.py        # Transcription interfaces
├── requirements.txt     # Python dependencies
├── _whisper-wait        # ZSH completion script
└── .env                 # Environment variables (create this)

Runtime state (not in repo):

  • ~/audio_archive/ (audio recordings)
  • ~/.whisper_wait/last_transcriptions.txt (history)
  • ~/.whisper_wait/session_costs.json (API costs)

License

This project uses OpenAI's Whisper model. Please refer to OpenAI's usage policies and the Whisper repository for licensing information.

About

you whisper and it waits and transcribes
