Press shortcut → Speak → Get Text
Local speech-to-text for GNOME Shell. No cloud. No APIs.
Status indicator in system tray (top-right panel) always shows recording/processing state.
- Minimal - Errors only, stay-out-of-the-way mode
- Normal - Brief notifications, multitask while recording
- Focused - Modal during recording only, transcription in background
- Blocking - Full-screen modal, focused workflow (blocks during recording + transcription)
- Tray icon presents status (Idle/Recording/Transcribing)
- Keyboard shortcut (Super+Alt+Space)
- Multi-language support
- Auto text insertion (X11 only) or clipboard
- Customizable models and Voice Activity Detection
- Fast local transcription (no cloud/APIs)
Three components required:
- Extension - GNOME Shell UI, shortcuts, dialogs
- D-Bus Service - Python backend (audio recording, processing)
- whisper.cpp - ggml-org/whisper.cpp server for transcription
All three must be installed separately (see Installation below).
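To illustrate how the pieces fit together, here is a minimal sketch of a client sending a recorded WAV file to a running whisper.cpp server over HTTP. It is not the project's actual service code. The `/inference` endpoint and the `file`, `language`, and `response_format` form fields follow the whisper.cpp server example; the function names, the `recording.wav` filename, and the 60-second timeout are assumptions made for illustration.

```python
import json
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + str(value).encode() + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_bytes, server_url="http://localhost:8080", language="auto"):
    """POST a WAV recording to the server and return the transcribed text."""
    body, content_type = build_multipart(
        {"response_format": "json", "language": language},
        "file",
        "recording.wav",  # hypothetical filename, only used in the form header
        wav_bytes,
    )
    request = urllib.request.Request(
        f"{server_url}/inference",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["text"].strip()
```

With a server listening on the default URL, `transcribe(open("sample.wav", "rb").read())` would return the recognized text; `sample.wav` is a placeholder for any 16 kHz WAV file.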
Install extension from extensions.gnome.org, then:
1. Install Dependencies
# Ubuntu/Debian
sudo apt install build-essential cmake python3 pipx ffmpeg python3-dbus python3-gi wl-clipboard xdotool xclip
# Fedora
sudo dnf install gcc gcc-c++ cmake python3 pipx ffmpeg python3-dbus python3-gobject wl-clipboard xdotool xclip
2. Install whisper.cpp
# Clone
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
# Build with CUDA support (NVIDIA GPU)
cmake -B build -DGGML_CUDA=1 -DCMAKE_INSTALL_PREFIX=~/.local
cmake --build build -j --config Release
cmake --install build
# Add to shell environment (~/.bashrc or ~/.zshrc) for CLI usage
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$HOME/.local/lib:$LD_LIBRARY_PATH"' >> ~/.bashrc
source ~/.bashrc
# Add to GNOME environment for the service
mkdir -p ~/.config/environment.d
cat >> ~/.config/environment.d/custom-env.conf <<EOF
PATH=$HOME/.local/bin:\$PATH
LD_LIBRARY_PATH=$HOME/.local/lib:\$LD_LIBRARY_PATH
EOF
# Download models
mkdir -p ~/.cache/whisper.cpp
./models/download-ggml-model.sh base ~/.cache/whisper.cpp
./models/download-vad-model.sh silero-v5.1.2 ~/.cache/whisper.cpp
cd ..
CPU-only build: Replace the first cmake line with:
cmake -B build -DCMAKE_INSTALL_PREFIX=~/.local
3. Install Service
pipx install --system-site-packages \
'git+https://github.com/bcelary/gnome-speech2text.git#subdirectory=service'
speech2text-whispercpp-setup
Restart GNOME Shell (X11: Alt+F2 → r, Wayland: log out/in)
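After installing, one quick sanity check is to confirm that the command-line tools from steps 1-2 are reachable on `PATH`. The sketch below is a standalone helper, not part of the project. The executable names are assumptions: `wl-copy` is the binary shipped by wl-clipboard, and `whisper-server` is the name recent whisper.cpp builds install.

```python
import shutil

# Assumed executable names; adjust if your distribution or build differs.
REQUIRED_TOOLS = ["ffmpeg", "whisper-server", "xdotool", "xclip", "wl-copy"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```

Run from a terminal, this checks the shell's `PATH`. The service reads its environment from `~/.config/environment.d/custom-env.conf` instead, so a tool can be visible here and still be missing for the service if that file was not updated.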
For developing or contributing:
git clone https://github.com/bcelary/gnome-speech2text.git
cd gnome-speech2text
make install # Installs both service and extension
Follow steps 1-2 above for dependencies and whisper.cpp.
Restart GNOME Shell (X11: Alt+F2 → r, Wayland: log out/in)
Service (optional - create/edit ~/.config/environment.d/custom-env.conf):
# These environment variables must be in ~/.config/environment.d/custom-env.conf
# so GNOME Shell can see them (not ~/.bashrc)
WHISPER_MODEL=small # tiny, base, small, medium, etc.
WHISPER_LANGUAGE=auto # auto, en, es, fr, de, etc.
WHISPER_VAD_MODEL=auto # auto, none, silero-v5.1.2
WHISPER_SERVER_URL=http://localhost:8080
S2T_SERVICE_LOG_LEVEL=info # error, warn, info, debug
After editing, restart GNOME Shell or log out/in.
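As a rough sketch of how a Python process could consume these variables, the helper below reads them from the environment and validates the log level. It is illustrative only: `load_settings` and `FALLBACKS` are hypothetical names, and the fallback values simply mirror the sample file above, so the service's real defaults may differ.

```python
import os

# Fallbacks copied from the sample configuration (assumption, not confirmed defaults).
FALLBACKS = {
    "WHISPER_MODEL": "small",
    "WHISPER_LANGUAGE": "auto",
    "WHISPER_VAD_MODEL": "auto",
    "WHISPER_SERVER_URL": "http://localhost:8080",
    "S2T_SERVICE_LOG_LEVEL": "info",
}
LOG_LEVELS = ("error", "warn", "info", "debug")


def load_settings(environ=os.environ):
    """Read the service variables, falling back when one is unset."""
    settings = {name: environ.get(name, fallback) for name, fallback in FALLBACKS.items()}
    level = settings["S2T_SERVICE_LOG_LEVEL"]
    if level not in LOG_LEVELS:
        raise ValueError(f"Unknown log level {level!r}; expected one of {LOG_LEVELS}")
    # Drop a trailing slash so endpoint paths can be appended cleanly.
    settings["WHISPER_SERVER_URL"] = settings["WHISPER_SERVER_URL"].rstrip("/")
    return settings
```

Passing a plain dict instead of `os.environ` makes it easy to try different combinations without touching the session environment.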
Extension (right-click microphone icon → Settings):
- Progress Display - Always (blocks screen) / Focused (blocks recording only) / Normal (brief messages) / Errors only
- Post-Recording Action - Show preview dialog / Auto-type text (X11 only) / Copy to clipboard only / Auto-type and copy (X11 only)
- Keyboard Shortcut - Default: Super+Alt+Space
- Recording Duration - 10 seconds to 15 minutes
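The X11-only restriction on auto-typing can be pictured with the sketch below, which types text with `xdotool` on X11 and otherwise falls back to the clipboard (`wl-copy` on Wayland, `xclip` on X11). This is an assumption about how such a fallback could look, not the extension's actual logic; `deliver` and `clipboard_command` are hypothetical names, and the session type is taken from the standard `XDG_SESSION_TYPE` variable.

```python
import os
import subprocess


def clipboard_command(session_type):
    """Pick a clipboard tool that reads the text from standard input."""
    if session_type == "wayland":
        return ["wl-copy"]
    return ["xclip", "-selection", "clipboard"]


def deliver(text, auto_type, session_type=None, run=subprocess.run):
    """Type the text on X11 when requested, otherwise copy it to the clipboard."""
    session_type = session_type or os.environ.get("XDG_SESSION_TYPE", "x11")
    if auto_type and session_type == "x11":
        # "--" stops option parsing so text starting with "-" is typed literally.
        run(["xdotool", "type", "--clearmodifiers", "--", text], check=True)
        return "typed"
    run(clipboard_command(session_type), input=text.encode(), check=True)
    return "copied"
```

For example, `deliver("hello world", auto_type=True)` would type into the focused window on an X11 session and copy to the clipboard on Wayland.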
Extension Logging (optional - add to ~/.config/environment.d/custom-env.conf):
S2T_LOG_LEVEL=info # error, warn, info, debug
Extension preferences for customizing behavior and keyboard shortcuts
- Press Super+Alt+Space (or click the extension's round icon)
- Speak
- Press Super+Alt+Space (or click the icon) again to stop recording
- Get the result, or review the transcription and act on it (if using the preview action)
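The flow above maps onto the three tray states (Idle, Recording, Transcribing). A minimal sketch of that cycle as a state machine follows; the class and method names are hypothetical, and ignoring the shortcut while a transcription is running is an assumption rather than documented behavior.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    RECORDING = "Recording"
    TRANSCRIBING = "Transcribing"


class Session:
    """Tracks one press-speak-press cycle."""

    def __init__(self):
        self.state = State.IDLE

    def on_shortcut(self):
        """First press starts recording, second press stops it."""
        if self.state is State.IDLE:
            self.state = State.RECORDING
        elif self.state is State.RECORDING:
            self.state = State.TRANSCRIBING
        # Assumption: presses during transcription are ignored.
        return self.state

    def on_transcription_done(self, text):
        """Return to idle and hand the text to the post-recording action."""
        if self.state is not State.TRANSCRIBING:
            raise RuntimeError("No transcription in progress")
        self.state = State.IDLE
        return text
```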
Check installation:
make status
gnome-extensions enable speech2text-whispercpp@bcelary.github
View logs:
./scripts/tail-logs.sh # Extension logs
./scripts/tail-service-logs.sh # Service logs
Note: Text insertion requires X11. On Wayland, use clipboard mode.
make help # See all available targets
For service development, see service/README.md.
make uninstall
MIT - see LICENSE
Forked from kavehtehrani/gnome-speech2text
