asr-gateway

Multi-instance ASR gateway for eaRS clients. It runs a separate FastAPI WebSocket server per use case, so you can start and stop servers individually to free VRAM on demand.

Profiles and ports

Gateway endpoints (proxy mode):

  • Journal: :8770 (routes to EARS_SERVER_URL_JOURNAL)
  • Technical: :8771 (routes to EARS_SERVER_URL_TECHNICAL)
  • Accuracy: :8772 (optional whisper.cpp or OpenAI backend)

Connect from eaRS:

  • ears-dictation --server ws://<tailscale-ip>:8770/ws
  • ears-dictation --server ws://<tailscale-ip>:8771/ws
  • ears-dictation --server ws://<tailscale-ip>:8772/ws

Project Structure

This repository is a monorepo combining:

  • ASR Gateway (root) — Python FastAPI WebSocket proxy for dictation profiles
  • eaRS (ears/) — Rust speech-to-text engine (git subtree from eaRS)

Gateway documentation is in this README and docs/. For eaRS documentation, see ears/CLAUDE.md.

ears subtree operations

See git-subtree-ears.md for pull/push commands.

System overview

          +-------------------+             +----------------------+
Mic PCM ->| eaRS client       |--WebSocket->| asr-gateway          |
          | (ears-dictation)  |             | FastAPI /ws          |
          +-------------------+             |  - journal (8770)    |
                                            |  - technical (8771)  |
                                            |  - accuracy (8772)   |
                                            +----------+-----------+
                                                       |
                     +---------------------------------+----------------------------+
                     |                                 |                            |
                     v                                 v                            v
          eaRS server (Parakeet)          eaRS server (Parakeet)         whisper.cpp (Vulkan)
                     |                                 |                            |
                     +---------- word/final events back to client ------------------+
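The profile-to-upstream routing shown above can be sketched as follows. This is a hypothetical helper, not the gateway's actual code; the environment variable names come from this README:

```python
import os

# Hypothetical routing table: profile -> (gateway port, upstream env var).
# Illustrates the proxy-mode mapping in the diagram above.
PROFILES = {
    "journal":   ("8770", "EARS_SERVER_URL_JOURNAL"),
    "technical": ("8771", "EARS_SERVER_URL_TECHNICAL"),
    "accuracy":  ("8772", "EARS_ACCURACY_URL"),
}

def upstream_for(profile: str) -> str:
    """Resolve the upstream WebSocket URL for a dictation profile."""
    _, env_var = PROFILES[profile]
    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"{env_var} is not set for profile {profile!r}")
    return url
```

Each profile's server can then be started and stopped independently, which is what makes per-profile VRAM reclamation possible.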

Quick start (uv)

uv init
uv add "fastapi[standard]" "uvicorn[standard]" onnxruntime

Environment setup

Copy .env.example to .env and fill in your paths/URLs. The Justfile will load it automatically.

The gateway proxies to upstream eaRS servers defined by:

  • EARS_SERVER_URL_JOURNAL
  • EARS_SERVER_URL_TECHNICAL

Accuracy backend selection:

  • Local whisper.cpp: set ASR_BACKEND=whisper_vulkan
  • OpenAI cloud: set ASR_BACKEND=openai_transcribe and OPENAI_API_KEY
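A minimal `.env` sketch wiring these variables together (hosts, ports, and paths below are placeholders, not project defaults):

```
EARS_SERVER_URL_JOURNAL=ws://127.0.0.1:8765
EARS_SERVER_URL_TECHNICAL=ws://127.0.0.1:8766

# Local whisper.cpp accuracy backend:
ASR_BACKEND=whisper_vulkan
WHISPER_BIN=/opt/whisper.cpp/build/bin/whisper-cli
WHISPER_MODEL_PATH=/opt/whisper.cpp/models/ggml-large-v3.bin

# Or OpenAI cloud instead:
# ASR_BACKEND=openai_transcribe
# OPENAI_API_KEY=<your key>
```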

Environment variables

  • EARS_SERVER_URL_JOURNAL: upstream eaRS WebSocket URL for journal mode
  • EARS_SERVER_URL_TECHNICAL: upstream eaRS WebSocket URL for technical mode
  • EARS_ACCURACY_URL: whisper accuracy WebSocket URL (batch on stop)
  • EARS_ACCURACY_PROFILE: profile that triggers accuracy pass (journal|technical)
  • EARS_ACCURACY_ENABLED: enable/disable accuracy at runtime (true|false)
  • EARS_PREVIEW_AUTOCOMMIT_SECS: preview auto-commit timeout (0 disables)
  • ACCURACY_BACKEND: whisper_vulkan or openai_transcribe for the accuracy server
  • ACCURACY_MODEL_JOURNAL: OpenAI model for journal accuracy pass (default gpt-4o-transcribe)
  • ACCURACY_MODEL_TECHNICAL: OpenAI model for technical accuracy pass (default gpt-4o-transcribe)
  • WHISPER_BIN: path to whisper-cli (default whisper-cli)
  • WHISPER_MODEL_PATH: path to ggml model for whisper.cpp
  • WHISPER_SAMPLE_RATE: output sample rate for whisper.cpp (default 16000)
  • OPENAI_API_KEY: API key for OpenAI accuracy backend
  • OPENAI_BASE_URL: base URL for OpenAI API (default https://api.openai.com/v1)
  • OPENAI_TRANSCRIBE_MODEL: accuracy model (default gpt-4o-transcribe)
  • OPENAI_TRANSCRIBE_LANGUAGE: optional ISO language code
  • OPENAI_TRANSCRIBE_PROMPT: optional prompt prefix for transcription
  • OPENAI_TRANSCRIBE_TEMPERATURE: optional temperature (default 0)
  • OPENAI_TRANSCRIBE_TIMEOUT: request timeout seconds (default 30)
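To illustrate how the OpenAI accuracy backend might consume these variables, here is a hedged sketch. The `transcribe_kwargs` and `accuracy_pass` helpers are hypothetical, not the project's code; the API call uses the official `openai` Python package:

```python
import os

def transcribe_kwargs() -> dict:
    """Assemble transcription parameters from the environment variables above."""
    kwargs = {
        "model": os.environ.get("OPENAI_TRANSCRIBE_MODEL", "gpt-4o-transcribe"),
        "temperature": float(os.environ.get("OPENAI_TRANSCRIBE_TEMPERATURE", "0")),
    }
    if lang := os.environ.get("OPENAI_TRANSCRIBE_LANGUAGE"):
        kwargs["language"] = lang
    if prompt := os.environ.get("OPENAI_TRANSCRIBE_PROMPT"):
        kwargs["prompt"] = prompt
    return kwargs

def accuracy_pass(wav_path: str) -> str:
    """Batch-transcribe a finished recording via the OpenAI API."""
    from openai import OpenAI  # requires the openai package

    client = OpenAI(
        base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        timeout=float(os.environ.get("OPENAI_TRANSCRIBE_TIMEOUT", "30")),
    )
    with open(wav_path, "rb") as f:
        result = client.audio.transcriptions.create(file=f, **transcribe_kwargs())
    return result.text
```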

Justfile commands

Dictation control (single CLI):

just run-journal
just run-technical
just run-toggle
just run-shutdown

Dictation controls

  • Use scripts/toggle-dictation-profile.sh with --preview to show the overlay buffer and paste on toggle-off.
  • Accuracy can be forced per run with --accuracy-on/--accuracy-off or toggled persistently with --accuracy-toggle.
  • Binding SIGRTMIN+1 to a key (e.g. KP_Multiply) lets you toggle accuracy while the preview overlay is visible.

Start/stop to free VRAM:

just journal-start
just journal-stop
just technical-start
just technical-stop
just accuracy-start
just accuracy-stop
just stop-all

Swarm orchestration (spec-kitty)

Multi-agent work package orchestration via just swarm. Each spec-kitty feature gets a dedicated branch, isolated worktrees per WP, and automated wave scheduling.

Feature lifecycle

# 1. Spec, plan, and generate tasks (on master)
spec-kitty specify    # define user scenarios
spec-kitty plan       # design architecture
spec-kitty tasks      # break into work packages

# 2. Start the swarm (auto-creates feature branch from master)
just swarm 005        # creates branch 005-my-feature, sets up WP worktrees

# 3. Monitor and advance
just swarm 005 --status              # show board and wave plan
just swarm 005 --review WP01         # submit WP for review
just swarm 005 --done WP01           # mark WP done after review passes
just swarm 005 --reject WP02 --reason "Fix validation"  # send back with feedback

# 4. Acceptance
just swarm 005 --accept              # validate all WPs done, run readiness check
just swarm 005 --accept --accept-mode pr  # force PR mode

# 5. Merge
git push origin 005-my-feature       # push feature branch
# open PR against master, then merge

What happens automatically

  • Feature branch created from master on first just swarm run
  • meta.json target_branch updated to the feature branch
  • WP worktrees created under .worktrees/<feature>-<WP>/
  • Lane-tracking commits accumulate on the feature branch, not master
  • If you run --accept on master, it warns and falls back to local mode

Swarm commands

  • just swarm <prefix> — start next wave (creates feature branch if needed)
  • just swarm <prefix> --status — show kanban board and computed waves
  • just swarm <prefix> --review <WP> — submit WP for code review
  • just swarm <prefix> --done <WP> — mark WP as done
  • just swarm <prefix> --reject <WP> --reason "..." — reject with feedback
  • just swarm <prefix> --accept — run acceptance validation
  • just swarm <prefix> --dry-run — preview without mutations
  • just swarm <prefix> --cleanup — clean orphan contexts before starting

Feature specs live in kitty-specs/<NNN>-<slug>/. See .kittify/memory/feature-branch-workflow.md for full workflow details.

Smoke test

just sample-audio

scripts/smoke_test_ws.py \
  --url ws://127.0.0.1:8770/ws \
  --wav /tmp/jfk_24k.wav \
  --chunk-ms 200 \
  --sleep

Proxy smoke test (gateway -> eaRS):

EARS_SERVER_URL_JOURNAL=ws://127.0.0.1:8765 just journal-start
scripts/smoke_test_ws.py --url ws://127.0.0.1:8770/ws --wav /path/to/audio.wav --chunk-ms 200 --sleep

Whisper Vulkan setup (AMD GPU)

Install whisper-cli from whisper.cpp and set WHISPER_BIN to the binary path. Build with Vulkan support:

git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
cmake -B build -DGGML_VULKAN=1
cmake --build build -j --config Release

Download a model and run whisper-cli:

sh ./models/download-ggml-model.sh large-v3
./build/bin/whisper-cli -m models/ggml-large-v3.bin -f /path/to/audio.wav

Note: whisper-cli expects 16 kHz, 16-bit WAV input, so audio captured at other rates must be resampled first. The whisper.cpp repo is not part of this monorepo — build it separately and reference the binary via WHISPER_BIN.
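Since the gateway streams 24 kHz PCM while whisper.cpp wants 16 kHz, a simple linear resampler can bridge the gap. The NumPy sketch below is illustrative only; the gateway's actual resampler may differ:

```python
import numpy as np

def resample_pcm16(pcm: bytes, src_rate: int = 24000, dst_rate: int = 16000) -> bytes:
    """Linearly resample 16-bit mono PCM from src_rate to dst_rate."""
    samples = np.frombuffer(pcm, dtype=np.int16).astype(np.float32)
    n_out = int(len(samples) * dst_rate / src_rate)
    # Interpolate output sample positions onto the input sample grid.
    x_out = np.linspace(0, len(samples) - 1, num=n_out)
    x_in = np.arange(len(samples))
    out = np.interp(x_out, x_in, samples)
    return out.astype(np.int16).tobytes()
```

For production-quality output you would typically prefer a polyphase or sinc-based resampler, but linear interpolation is adequate for a quick sketch.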

WebSocket protocol

The server expects raw 24 kHz, 16-bit mono PCM bytes over the WebSocket. Send a JSON message to end the stream:

{"type":"stop"}

Notes

  • Parakeet inference lives in eaRS; this gateway is used as a profile router.
  • Whisper accuracy server uses whisper.cpp via a subprocess and processes audio on stop (batch).
  • OpenAI accuracy backend uses gpt-4o-transcribe (or whisper-1) via API and processes audio on stop (batch).
  • The gateway resamples to WHISPER_SAMPLE_RATE for whisper.cpp.

Dictation tuning notes

  • Default usage: journal for general dictation; technical for speaking to AI agents in CLI.
  • Journal default correction model: qwen2.5:7b (more literal output).
  • Technical default correction model: qwen2.5-coder:14b (code-aware cleanup).
  • Preview + accuracy shows a review menu (RAW/FINAL/ACCURACY/CANCEL) with ↑/↓ and →/Enter to paste.
