This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Prerequisites:
Core Development:
# Install dependencies
bun install
# Run in development mode
bun run tauri dev
# If cmake error on macOS:
CMAKE_POLICY_VERSION_MINIMUM=3.5 bun run tauri dev
# Build for production
bun run tauri build
# Frontend only development
bun run dev # Start Vite dev server
bun run build # Build frontend (TypeScript + Vite)
bun run preview # Preview built frontend
Model Setup (Required for Development):
# Create models directory
mkdir -p src-tauri/resources/models
# Download required VAD model
curl -o src-tauri/resources/models/silero_vad_v4.onnx https://blob.handy.computer/silero_vad_v4.onnx
Handy is a cross-platform desktop speech-to-text application built with Tauri (Rust backend + React/TypeScript frontend).
Backend (Rust - src-tauri/src/):
- lib.rs - Main application entry point with Tauri setup, tray menu, and managers
- managers/ - Core business logic managers:
  - audio.rs - Audio recording and device management
  - model.rs - Whisper model downloading and management
  - transcription.rs - Speech-to-text processing pipeline
- audio_toolkit/ - Low-level audio processing:
  - audio/ - Device enumeration, recording, resampling
  - vad/ - Voice Activity Detection using Silero VAD
- commands/ - Tauri command handlers for frontend communication
- shortcut.rs - Global keyboard shortcut handling
- settings.rs - Application settings management
Frontend (React/TypeScript - src/):
- App.tsx - Main application component with onboarding flow
- components/settings/ - Settings UI components
- components/model-selector/ - Model management interface
- hooks/ - React hooks for settings and model management
- lib/types.ts - Shared TypeScript type definitions
Manager Pattern: Core functionality is organized into managers (Audio, Model, Transcription) that are initialized at startup and managed by Tauri's state system.
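As a rough illustration of the manager pattern, here is a minimal std-only sketch. The `AudioManager`, `ModelManager`, and `AppState` types and their fields are hypothetical stand-ins, not the repository's actual definitions; in the real app, Tauri's state system plays the role that `AppState` plays here.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the real managers; fields are illustrative only.
#[derive(Debug, Default)]
pub struct AudioManager {
    pub recording: bool,
}

#[derive(Debug, Default)]
pub struct ModelManager {
    pub loaded_model: Option<String>,
}

// Each manager sits behind Arc<Mutex<..>> so several handlers can share it.
#[derive(Clone, Default)]
pub struct AppState {
    pub audio: Arc<Mutex<AudioManager>>,
    pub model: Arc<Mutex<ModelManager>>,
}

impl AppState {
    // Built once at startup; clones are cheap and point at the same managers.
    pub fn init() -> Self {
        Self::default()
    }
}

// A handler-like function that borrows the shared state.
// Returns true if recording was started by this call, false if already running.
pub fn start_recording(state: &AppState) -> bool {
    let mut audio = state.audio.lock().unwrap();
    let was_idle = !audio.recording;
    audio.recording = true;
    was_idle
}
```

Because every clone of `AppState` refers to the same underlying managers, a change made through one handle is visible through all the others.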
Command-Event Architecture: The frontend communicates with the backend via Tauri commands; the backend sends updates back via events.
Pipeline Processing: Audio → VAD → Whisper → Text output with configurable components at each stage.
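The staged pipeline can be sketched with one trait per stage, so that each component is swappable. This is only a sketch under assumptions: the `Vad` and `Transcriber` traits, `EnergyVad`, and `CountingTranscriber` are hypothetical names, and the energy threshold is a toy substitute for Silero VAD, just as the counting transcriber is a placeholder for Whisper inference.

```rust
// Hypothetical stage interfaces; the real app uses Silero VAD and Whisper here.
pub trait Vad {
    /// Returns only the frames judged to contain speech.
    fn filter(&self, frames: &[Vec<f32>]) -> Vec<Vec<f32>>;
}

pub trait Transcriber {
    fn transcribe(&self, samples: &[f32]) -> String;
}

/// Toy VAD: keeps frames whose mean absolute amplitude exceeds a threshold.
pub struct EnergyVad {
    pub threshold: f32,
}

impl Vad for EnergyVad {
    fn filter(&self, frames: &[Vec<f32>]) -> Vec<Vec<f32>> {
        frames
            .iter()
            .filter(|f| {
                !f.is_empty()
                    && f.iter().map(|s| s.abs()).sum::<f32>() / f.len() as f32 > self.threshold
            })
            .cloned()
            .collect()
    }
}

/// Placeholder transcriber that only reports how many samples it received.
pub struct CountingTranscriber;

impl Transcriber for CountingTranscriber {
    fn transcribe(&self, samples: &[f32]) -> String {
        format!("{} samples", samples.len())
    }
}

/// Audio -> VAD -> transcriber -> text, with each stage passed in by the caller.
pub fn run_pipeline(vad: &dyn Vad, stt: &dyn Transcriber, frames: &[Vec<f32>]) -> String {
    let speech: Vec<f32> = vad.filter(frames).into_iter().flatten().collect();
    stt.transcribe(&speech)
}
```

Passing the stages as trait objects keeps `run_pipeline` unaware of which VAD or model is configured, which is one way to get a "configurable component at each stage".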
Core Libraries:
- whisper-rs - Local Whisper inference with GPU acceleration
- cpal - Cross-platform audio I/O
- vad-rs - Voice Activity Detection
- rdev - Global keyboard shortcuts
- rubato - Audio resampling
- rodio - Audio playback for feedback sounds
Platform-Specific Features:
- macOS: Metal acceleration for Whisper, accessibility permissions
- Windows: Vulkan acceleration, code signing
- Linux: OpenBLAS + Vulkan acceleration
Application Flow:
- Initialization: App starts minimized to tray, loads settings, initializes managers
- Model Setup: First-run downloads preferred Whisper model (Small/Medium/Turbo/Large)
- Recording: Global shortcut triggers audio recording with VAD filtering
- Processing: Audio sent to Whisper model for transcription
- Output: Text pasted to active application via system clipboard
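The recording-to-output steps above can be read as a small state machine. The sketch below assumes push-to-talk (press to record, release to process); the `AppPhase` and `Event` enums and the `step` function are hypothetical names chosen for illustration, not types from the codebase.

```rust
/// Hypothetical phases mirroring the recording, processing, and output steps.
#[derive(Debug, Clone, PartialEq)]
pub enum AppPhase {
    Idle,
    Recording,
    Processing,
    Output(String),
}

/// Hypothetical events that move the app from one phase to the next.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    ShortcutPressed,
    ShortcutReleased,
    TranscriptReady(String),
    Pasted,
}

/// Pure transition function: given the current phase and an event,
/// return the next phase.
pub fn step(phase: AppPhase, event: Event) -> AppPhase {
    match (phase, event) {
        (AppPhase::Idle, Event::ShortcutPressed) => AppPhase::Recording,
        (AppPhase::Recording, Event::ShortcutReleased) => AppPhase::Processing,
        (AppPhase::Processing, Event::TranscriptReady(text)) => AppPhase::Output(text),
        (AppPhase::Output(_), Event::Pasted) => AppPhase::Idle,
        // Any other combination leaves the phase unchanged.
        (phase, _) => phase,
    }
}
```

Keeping the transition logic in a pure function makes out-of-order events (for example, a paste notification while still recording) easy to reason about: they are simply ignored.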
Settings are stored using Tauri's store plugin with reactive updates:
- Keyboard shortcuts (configurable, supports push-to-talk)
- Audio devices (microphone/output selection)
- Model preferences (Small/Medium/Turbo/Large Whisper variants)
- Audio feedback and translation options
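To make the settings categories above concrete, here is a hedged sketch of what a settings value might look like. Every field name and default below is an assumption for illustration (including the placeholder shortcut string); the actual schema lives in the repository's settings code and is persisted through Tauri's store plugin rather than this plain struct.

```rust
/// Whisper variants named in this document.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WhisperModel {
    Small,
    Medium,
    Turbo,
    Large,
}

/// Hypothetical settings shape; field names and defaults are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub shortcut: String,
    pub push_to_talk: bool,
    pub microphone: Option<String>,    // None = system default input
    pub output_device: Option<String>, // None = system default output
    pub model: WhisperModel,
    pub audio_feedback: bool,
    pub translate: bool,
}

impl Default for Settings {
    fn default() -> Self {
        Settings {
            shortcut: "Ctrl+Space".to_string(), // placeholder, not the app's real default
            push_to_talk: true,
            microphone: None,
            output_device: None,
            model: WhisperModel::Small,
            audio_feedback: true,
            translate: false,
        }
    }
}

impl Settings {
    /// Returns an updated copy, so callers can compare old and new values
    /// and notify listeners only when something actually changed.
    pub fn with_model(&self, model: WhisperModel) -> Settings {
        Settings { model, ..self.clone() }
    }
}
```

Comparing the old and new values with `==` before emitting an update is one simple way to approximate reactive behavior without redundant notifications.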
The app enforces single-instance behavior: launching it while it is already running brings the settings window to the front rather than creating a new process.