bnema/ghisper
ghisper

Real-time speech-to-text combining a Go client with a SimulStreaming Python backend.

Overview

ghisper splits speech recognition into two parts:

  • Go client: Audio capture, typing automation, system integration
  • Python backend: SimulStreaming ASR engine with AlignAtt policy

Features

  • Real-time transcription streaming
  • Multi-language support (100+ languages via Whisper)
  • Progressive typing via uinputd-go
  • Low latency (~200-500ms)
  • Unix socket IPC for minimal overhead
  • Interactive installation with GPU detection
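Progressive typing means the client must revise text it has already typed whenever the streaming ASR updates its hypothesis. A minimal sketch of that diffing step, assuming a hypothetical `typingUpdate` helper (the real logic lives in internal/typer and drives uinputd-go):

```go
package main

import "fmt"

// typingUpdate computes what a progressive typer must do when the
// transcript changes from prev to next: send backspaces to erase the
// diverging suffix, then type the new remainder.
func typingUpdate(prev, next string) (backspaces int, typed string) {
	// Find the longest common prefix of the two transcripts.
	i := 0
	for i < len(prev) && i < len(next) && prev[i] == next[i] {
		i++
	}
	return len(prev) - i, next[i:]
}

func main() {
	// The ASR first emits a partial hypothesis, then extends it.
	bs, typed := typingUpdate("hello wor", "hello world")
	fmt.Println(bs, typed) // → 0 ld

	// A revision: the last character was wrong and must be erased.
	bs, typed = typingUpdate("hello work", "hello world")
	fmt.Println(bs, typed) // → 1 ld
}
```

Diffing against the common prefix keeps typing latency low: only the changed suffix is retyped, not the whole transcript.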

Architecture

Go Client (ghisper)
  - Audio capture (malgo)
  - Unix socket client
  - Progressive typing (uinputd-go)
        |
        v  Unix Socket
Python Backend (user systemd service)
  - SimulStreaming ASR
  - Whisper (tiny → large-v3-turbo) + AlignAtt
  - HuggingFace model browser

Quick Start

Prerequisites

# System packages
sudo pacman -S python git

# Go 1.23+
# Python 3.10+

Build and Install

# Build Go client
make build

# Install to ~/.local/bin (user-local, no sudo)
make install

# Install Python backend (interactive)
ghisper install backend
# - Detects GPU (NVIDIA/AMD/none)
# - Choose PyTorch variant (CPU/CUDA/ROCm)
# - Select Whisper model (tiny → large-v3-turbo)
# - Creates venv in ~/.local/share/ghisper/venv
# - Generates config at ~/.config/ghisper/config.toml

# Install and start systemd service
ghisper install systemd-service

Usage

# Check system status
ghisper status

# Start recording (press 'r' or Space to toggle)
ghisper record

# Stop all sessions
ghisper stop

# Run health checks
ghisper doctor

Configuration

Config: ~/.config/ghisper/config.toml

[server]
type = "unix"
socket_path = "/tmp/ghisper.sock"

[model]
name = "base"
device = "auto"  # auto, cpu, cuda, rocm

[processing]
language = ""  # auto-detect
task = "transcribe"

[client.audio]
device = "default"
chunk_size_ms = 100

[client.typing]
enabled = true
layout = "us"
progressive = true

[logging]
level = "info"
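The `chunk_size_ms` setting controls how much audio each message to the backend carries, trading latency against per-message overhead. Assuming 16 kHz mono 16-bit PCM (the rate Whisper models consume; the client's actual capture format may differ), the payload size works out as:

```go
package main

import "fmt"

// chunkBytes returns the payload size of one audio chunk, assuming
// 16-bit (2-byte) mono PCM samples.
func chunkBytes(sampleRate, chunkMs int) int {
	const bytesPerSample = 2 // s16le, one channel
	return sampleRate * bytesPerSample * chunkMs / 1000
}

func main() {
	// Default config: 100 ms chunks at 16 kHz mono s16.
	fmt.Println(chunkBytes(16000, 100)) // → 3200
}
```

Smaller chunks reduce end-to-end latency toward the ~200–500 ms figure quoted above, at the cost of more frequent socket writes.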

Development

make build         # Build to bin/ghisper
make install       # Install to ~/.local/bin
make uninstall     # Remove binary and backend
make purge         # Full cleanup (config + models)
make check         # Format, vet, test

Project Structure

ghisper/
├── cmd/ghisper/          # CLI commands
├── internal/
│   ├── audio/            # Audio capture (malgo)
│   ├── client/           # Backend client
│   ├── typer/            # Typing (uinputd-go)
│   ├── config/           # Config management
│   ├── models/           # Model registry
│   ├── protocol/         # IPC protocol
│   ├── installer/        # Installation logic
│   └── doctor/           # Health checks
└── backend/              # Python backend
    ├── server.py         # SimulStreaming server
    ├── config.py         # Config parser
    └── convert_model.py  # HF → Whisper converter

Dependencies

Go

  • github.com/gen2brain/malgo - Audio capture
  • github.com/bnema/uinputd-go - Keyboard typing
  • github.com/spf13/cobra - CLI framework
  • github.com/charmbracelet/* - Terminal UI

Python

  • torch - Deep learning backend
  • openai-whisper - ASR model
  • SimulStreaming - Streaming ASR
  • go-huggingface - Model downloads

License

MIT
