Light-Whisper

Local & Online Speech-to-Text for Windows

Press a hotkey, speak, release — text appears at your cursor.

Installation

Option A: Installer (recommended)

Download 轻语.Whisper_x.x.x_x64-setup.exe from the Releases page. Run the installer — everything is bundled, no Python or build tools needed. ASR models will be downloaded on first use (~1–1.5 GB).

Note

GPU (optional): NVIDIA GPU with up-to-date driver for CUDA acceleration. No GPU → automatic CPU fallback.

Option B: Build from source

See Quick Start below.

Highlights

One-key dictation
Hold F2 (configurable) to record, release to transcribe & type into the active window.

Three ASR engines
SenseVoice (zh/en/ja/ko/yue) and Faster Whisper (99+ languages) run fully local with CUDA acceleration; GLM-ASR is online — just add an API key, no Python or GPU needed.

AI polish
Optional LLM post-processing: fix homophones, punctuation, filler words; adapts tone to the foreground app. Built-in presets (OpenAI / DeepSeek / Cerebras / SiliconFlow) plus OpenAI-compatible and Anthropic API formats.

Adaptive learning
Learns structured ASR corrections and key terms; manually deleted hot words are blocked from coming back.

Subtitle overlay
Transparent floating window shows live dictation status and assistant output.

Voice assistant
Separate hotkey triggers a floating answer card. Optionally reads selected text, foreground app, and full-screen screenshots as context. Auto-detects model image support. Built-in web search (Exa / Tavily) for real-time information retrieval.

Edit selected text
Select text anywhere, press the hotkey, speak an instruction ("translate to English", "make it formal") to rewrite in-place.

Real-time translation
Set a target language (8 presets + custom) — transcription results are translated before output.

& more
Hold-to-talk or toggle mode · input queue · multilingual UI (en/zh) · auto-update check.

Light-Whisper vs Typeless

Feature	Light-Whisper	Typeless
Pricing	Free & open-source	Free tier (4k words/week); $12–30/mo
Privacy	Fully offline (local engines)	Cloud-based, zero-data-retention
Open source	✅	❌
Platform	Windows	Windows, Mac, iOS, Android, Web
ASR engines	3 switchable (local + online)	Cloud proprietary
Languages	5–99+ (engine dependent)	100+
AI polish	Multi-backend LLM, bring your own key	Built-in
Screen-aware assistant + web search	✅	❌
Subtitle overlay	✅	❌

Engine Comparison

	SenseVoice (default)	Faster Whisper	GLM-ASR (online)
Chinese CER	2.96 % (AISHELL-1)	5.14 %	7.17 %
English WER	3.15 % (LibriSpeech)	1.82 %	—
Languages	5 (zh/en/ja/ko/yue)	99+	Chinese + dialects
Punctuation	Built-in ITN	initial_prompt guided	Built-in
Hot words	✅	✅	✅ (max 100)
Model size	~938 MB	~1.5 GB	Cloud (no download)
Requires	GPU/CPU (bundled)	GPU/CPU (bundled)	API key only
Cost	Free (local)	Free (local)	¥0.06/min

Note

SenseVoice/Whisper CER source: FunAudioLLM paper, Table 6. GLM-ASR CER source: Zhipu AI.

Build from Source — Requirements

Important

Windows 10/11 (x64) only. Disk: ~10 GB free. These requirements are only needed for building from source — the installer bundles everything.

Tool	Version	Purpose
Visual Studio Build Tools	2019+	MSVC C++ toolchain
Rust	>= 1.75	Backend
Node.js	>= 18	Frontend build
pnpm	>= 8	Frontend packages
uv	>= 0.4	Python env (auto-installs Python 3.11)

GPU (optional): NVIDIA GPU with up-to-date driver. No need to install CUDA Toolkit — PyTorch bundles CUDA 12.8.

Tip

GLM-ASR users: If you only use the online GLM-ASR engine, you only need Rust, Node.js, and pnpm — no Python, uv, or GPU required. Just build and add your API key in Settings.

Step-by-step tool installation

# 1. Visual Studio Build Tools — download installer, check "Desktop development with C++"
# 2. Rust
winget install Rustlang.Rustup
# 3. Node.js + pnpm
winget install OpenJS.NodeJS.LTS
npm install -g pnpm
# 4. uv
winget install astral-sh.uv

Verify:

rustc --version     # >= 1.75
node --version      # >= 18
pnpm --version      # >= 8
uv --version        # >= 0.4

[!TIP] Python is managed by uv — no manual install required.

Quick Start

git clone https://github.com/sypsyp97/light-whisper.git
cd light-whisper

pnpm install          # Frontend deps
uv sync               # Python deps (downloads Python 3.11, PyTorch CUDA, etc. ~5-15 min)

Download ASR models (recommended before first run)

# SenseVoice (default, ~938 MB)
uv run python -c "from huggingface_hub import snapshot_download; snapshot_download('FunAudioLLM/SenseVoiceSmall'); snapshot_download('funasr/fsmn-vad')"

# Faster Whisper (~1.5 GB)
uv run python -c "from huggingface_hub import snapshot_download; snapshot_download('deepdml/faster-whisper-large-v3-turbo-ct2')"

Models are cached in ~/.cache/huggingface/hub/.

Tip

China mainland: set $env:HF_ENDPOINT = "https://hf-mirror.com" before downloading.

Build & Run

pnpm tauri build      # First build ~5-15 min (compiles Rust deps)

The installer is in src-tauri/target/release/bundle/nsis/, or run src-tauri/target/release/light-whisper.exe directly.

Architecture

┌──────────────┐                ┌──────────────┐                ┌─────────────────┐
│   React UI   │  Tauri IPC     │  Rust Core   │  stdin/stdout  │   Python ASR    │
│  TypeScript  │◄──invoke/emit─►│  (Tauri 2)   │◄────JSON──────►│  SenseVoice /   │
└──────────────┘                └──────┬───────┘                │  Faster Whisper │
                                       │                        └─────────────────┘
                                       ├─── HTTP ──► GLM-ASR API (online ASR)
                                       ├─── HTTP ──► LLM API (AI polish / assistant / translation)
                                       ├─── HTTP ──► Web Search (Exa / Tavily) → assistant context
                                       ├─── Screen Capture ──► full-screen screenshots → assistant context
                                       └─── User Profile ──► hot words + blacklist → ASR + LLM prompt

Key paths

Layer	Paths
Frontend	`src/pages/`, `src/components/`, `src/hooks/`, `src/contexts/`, `src/lib/`, `src/i18n/`, `src/styles/`
Rust commands	`src-tauri/src/commands/` — audio, assistant, clipboard, funasr, hotkey, ai_polish, profile, updater, window
Rust services	`src-tauri/src/services/` — funasr_service, glm_asr_service, audio_service, assistant_service, ai_polish_service, llm_client, llm_provider, profile_service, screen_capture_service, web_search_service, download_service
State	`src-tauri/src/state/` — app_state, user_profile
Python ASR	`src-tauri/resources/` — funasr_server.py, whisper_server.py, server_common.py

Dev Commands

pnpm tauri dev          # Dev mode with hot-reload
pnpm tauri build        # Production build + installer
pnpm build              # Frontend only
uv sync                 # Sync Python deps
cd src-tauri && cargo check   # Rust type check

FAQ

PyTorch or model downloads are slow

PyTorch CUDA (~2.5 GB) downloads from download.pytorch.org — use a stable connection or VPN. uv sync supports resume.
Other Python packages can use a Tsinghua mirror: $env:UV_INDEX_URL = "https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"
HuggingFace models: $env:HF_ENDPOINT = "https://hf-mirror.com"

GPU not detected

Verify: .venv\Scripts\python.exe -c "import torch; print(torch.cuda.is_available())" should print True. Requires an up-to-date NVIDIA driver. For CUDA 12.x minor-version compatibility, Windows driver should be >= 528.33. The app falls back to CPU automatically if no GPU is found.

Characters turn into periods when typing into apps

This happens when a Chinese IME intercepts SendInput Unicode events. Fix: switch to Clipboard paste mode in Settings, or toggle your IME to English mode.

Chinese text appears garbled

Enable Windows UTF-8 system locale support to resolve encoding issues:

Control Panel → Region (or Clock and Region) → Administrative → Change system locale... → check Beta: Use Unicode UTF-8 for worldwide language support → OK → restart Windows.

Hotkey not working

Default dictation hotkey is F2. If occupied by another program, change it in Settings (e.g. Ctrl+Win+R).

Log locations

%APPDATA%\com.light-whisper.app\logs\ — funasr_server.log / whisper_server.log

Acknowledgements

FunASR & SenseVoiceSmall — Alibaba DAMO Academy
faster-whisper & large-v3-turbo-ct2
GLM-ASR — Zhipu AI
Tauri / React

License

Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Name		Name	Last commit message	Last commit date
Latest commit History 218 Commits
.codex		.codex
assets		assets
graphify-out		graphify-out
scripts		scripts
src-tauri		src-tauri
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
.graphifyignore		.graphifyignore
.python-version		.python-version
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
README.zh-CN.md		README.zh-CN.md
index.html		index.html
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
pyproject.toml		pyproject.toml
tsconfig.json		tsconfig.json
tsconfig.node.json		tsconfig.node.json
uv.lock		uv.lock
vite.config.ts		vite.config.ts
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Light-Whisper

Installation

Option A: Installer (recommended)

Option B: Build from source

Highlights

Light-Whisper vs Typeless

Engine Comparison

Build from Source — Requirements

Quick Start

Download ASR models (recommended before first run)

Build & Run

Architecture

Dev Commands

FAQ

Acknowledgements

License

About

Uh oh!

Releases 27

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Light-Whisper

Installation

Option A: Installer (recommended)

Option B: Build from source

Highlights

Light-Whisper vs Typeless

Engine Comparison

Build from Source — Requirements

Quick Start

Download ASR models (recommended before first run)

Build & Run

Architecture

Dev Commands

FAQ

Acknowledgements

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 27

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages