Turn your voice into text โ privately, on your own computer.
No cloud services. No accounts. No data ever leaves your machine.
Hit record, talk, get text. That's it.
curl -fsSL https://raw.githubusercontent.com/ryan-winkler/captainslog-whisper/main/install.sh | bashThis downloads, builds, and installs everything. It'll guide you through each step.
Prerequisite: You need Go installed first. The installer will tell you if it's missing.
Click to expand manual instructions
Step 1: Start a transcription engine
Captain's Log uses Whisper (an open-source speech recognition model) to turn your voice into text. You need to run it separately:
# If you have Docker installed:
docker run -d -p 5000:5000 ghcr.io/heimoshuiyu/whisper-fastapi:latest
# Have a GPU? Add --gpus all for much faster transcription:
docker run -d -p 5000:5000 --gpus all ghcr.io/heimoshuiyu/whisper-fastapi:latestNo Docker? You can also install it with pip:
pip install faster-whisper fastapi uvicorn uvicorn whisper_fastapi:app --host 0.0.0.0 --port 5000
Step 2: Build Captain's Log
git clone https://github.com/ryan-winkler/captainslog-whisper.git
cd captainslog-whisper
go build -o captainslog ./cmd/captainslogNeed Go? Download from go.dev/dl โ it's one file, no complicated setup.
Step 3: Run it
./captainslogOpen http://localhost:8090 in your browser. Hit the mic button and talk. That's it.
cp captainslog ~/.local/bin/
cp examples/captainslog.service ~/.config/systemd/user/
systemctl --user enable --now captainslogCaptain's Log connects to two optional services. You need the first one; the second is a bonus.
| ๐๏ธ Transcription Engine | ๐ค AI Assistant | |
|---|---|---|
| What | Whisper (speech-to-text) | LLM (text cleanup / summarization) |
| Role | The ears โ turns your voice into text | The brain โ polishes what you said |
| Required | Yes โ core function | Yes โ post-processing built in |
| Examples | whisper-fastapi, faster-whisper, Insanely Fast Whisper | Ollama, LM Studio, any OpenAI-compatible API |
| Configure | Settings โ Connections โ Whisper Server URL | Settings โ Connections โ Enable AI Assistant |
Captain's Log works with any backend that speaks the OpenAI /v1/audio/transcriptions API. Here are your options:
# Standard โ works everywhere
docker run -d -p 5000:5000 ghcr.io/heimoshuiyu/whisper-fastapi:latest
# With NVIDIA GPU โ much faster
docker run -d -p 5000:5000 --gpus all ghcr.io/heimoshuiyu/whisper-fastapi:latest
# Distil-Whisper โ 6x faster, 49% smaller, ~1% WER (English)
docker run -d -p 5000:5000 --gpus all \
-e WHISPER_MODEL=distil-whisper/distil-large-v3 \
ghcr.io/heimoshuiyu/whisper-fastapi:latest| Backend | GPU | Speed | Docker | Notes |
|---|---|---|---|---|
| whisper-fastapi | NVIDIA/CPU | โญโญโญ | โ | Default, battle-tested |
| faster-whisper-server | NVIDIA/CPU | โญโญโญโญ | โ | CTranslate2 optimized |
| Distil-Whisper models | NVIDIA/CPU | โญโญโญโญโญ | โ | 6x faster, use with any backend above |
| Wyoming Faster Whisper ROCm | AMD | โญโญโญโญ | โ | AMD GPU support via ROCm + Wyoming protocol |
| Insanely Fast Whisper | NVIDIA/MPS | โญโญโญโญโญ | โ (CLI) | Flash Attention 2, 150min in <98s |
Distil-Whisper is a distilled version of Whisper โ 6x faster, 49% smaller, within 1% WER. English only (multilingual coming). Select these in Settings โ Model:
| Model | Size | Speed | Quality |
|---|---|---|---|
distil-whisper/distil-large-v3 |
756M | โญโญโญโญโญ | Best distilled |
distil-whisper/distil-large-v2 |
756M | โญโญโญโญโญ | Stable alternative |
distil-whisper/distil-medium.en |
394M | โญโญโญโญ | Good balance |
distil-whisper/distil-small.en |
166M | โญโญโญ | Lowest memory |
If you have an NVIDIA GPU or Apple Silicon, Insanely Fast Whisper offers the fastest transcription available โ 150 minutes of audio in under 98 seconds:
# Install
pipx install insanely-fast-whisper
# Use with Distil-Whisper for maximum speed
insanely-fast-whisper --model-name distil-whisper/distil-large-v3 --file-name audio.wavCaptain's Log doesn't need Whisper running on the same computer. Point it at any machine:
| Setup | Whisper URL |
|---|---|
| Same computer | http://127.0.0.1:5000 (default) |
| Another PC on your network | http://192.168.1.100:5000 |
| A VPS with a GPU | http://gpu-server.example.com:5000 |
| Docker on a NAS | http://nas.local:5000 |
Set the URL in Settings โ Connections โ Whisper Server URL, or:
export CAPTAINSLOG_WHISPER_URL=http://your-server:5000Tip: For remote servers, consider putting a reverse proxy (like Caddy or nginx) in front of Whisper with HTTPS.
| Feature | What it means |
|---|---|
| Record & transcribe | Click the mic, talk, get text |
| File upload | Drag-and-drop audio or video files |
| URL transcription | Paste a YouTube or podcast URL โ yt-dlp downloads and transcribes |
| Batch processing | Drop multiple audio files โ processed sequentially with progress |
| Speaker diarization | Automatic speaker identification with 8 distinct colors |
| Folder watcher | Watch a directory for new audio files โ auto-transcribes and saves |
| Feature | What it means |
|---|---|
| Subtitle editor | Split, insert, delete segments, edit timecodes, CPS warnings, undo (Ctrl+Z) |
| Follow-along highlight | Active segment highlights and auto-scrolls during playback |
| Playback speed | 0.5รโ2ร speed control in the editor |
| Notes | Per-entry notes โ persisted, shown via ๐ indicator |
| Feature | What it means |
|---|---|
| Copy to clipboard | Transcribed text is automatically copied |
| Save to PKM | Auto-save to Obsidian, Logseq, or any folder |
| Export | .txt, .md, .srt, .vtt, .json, .lrc โ from main UI or editor |
| Search history | Instantly filter past transcriptions |
| Pin entries | Star important transcriptions to keep them at the top |
| Feature | What it means |
|---|---|
| Send to AI | Post-process with Ollama, LM Studio, or any local LLM |
| Keyboard shortcuts | Press ? for the full overlay |
| Mini mode | Compact widget โ press M or add ?mini to the URL |
| Live streaming | Real-time transcription via WebSocket (experimental) |
| Private | Everything stays on your machine |
Click โ๏ธ in the top-right corner to open Preferences. Everything is saved automatically.
| Setting | What it does |
|---|---|
| Save directory | Where transcriptions are saved as markdown โ works with Obsidian, Logseq, or any folder. Click ๐ to open. |
| Download directory | Where exported files are downloaded. Click ๐ to open. |
| Recordings directory | Where audio recordings are stored (read-only). Click ๐ to open. |
| Watch directory | Monitor a folder for new audio files โ auto-transcribes and saves to vault. Leave empty to disable. |
| Date format | How dates appear in file names (ISO, EU, US, with day, named) |
| File title | Prefix for saved markdown files (default: "Dictation") |
| Setting | What it does |
|---|---|
| Default language | What language you're speaking โ English, auto-detect, Gaeilge, Portuguรชs, Espaรฑol, Franรงais, Deutsch |
| Initial prompt | Guide the model with names, jargon, or context (e.g. "Meeting about warp core calibration") |
| Whisper model | Model size โ large-v3 (best), medium (balanced), small (fast), base, tiny |
| Skip silence (VAD) | Automatically skip quiet parts to speed up processing |
| Speaker labels | Tag who said what (requires WhisperX or diarization-capable backend) |
URL Transcription: Requires yt-dlp installed on the system. Paste a YouTube, podcast, or any supported URL in the input field.
| Setting | What it does |
|---|---|
| Auto-copy | Automatically copy transcribed text to your clipboard |
| Auto-save | Automatically save every transcription to your save directory |
| Show stardates | Fun Star Trek stardate display (toggle on/off) |
| Time format | 12-hour (AM/PM), 24-hour, or system default |
| History limit | How many recent transcriptions to show (default: 5, 0 = unlimited) |
| Default export format | Format used by the Export button โ Text, Markdown, SRT, or WebVTT |
| Setting | What it does |
|---|---|
| Port | HTTP port Captain's Log listens on (read-only, change requires restart) |
| Whisper backend URL | Where the speech-to-text server is running (default: http://127.0.0.1:5000) |
| Enable TLS | Auto-generate self-signed certs for HTTPS/LAN access |
| Enable Local LLM | Connect to a local AI for post-processing โ summarize, rewrite, extract action items |
| Local LLM URL | Where your AI is running (see AI Post-Processing below) |
| Local LLM Model | Which AI model to use โ pick from those installed in Ollama/LM Studio |
| Setting | What it does |
|---|---|
| Access logging | Log every HTTP request to stdout in JSON (off by default) |
| Key | Action |
|---|---|
Space |
Start/stop recording |
Ctrl+C |
Copy transcription |
Ctrl+S |
Save transcription |
M |
Toggle mini/full mode |
, |
Open preferences |
Escape |
Close preferences |
You can pass options when starting Captain's Log to override the defaults. These override environment variables and settings.
# Start on a different port
captainslog --port 9090
# Point to a remote Whisper server
captainslog --whisper-url http://gpu-server:5000
# Set your Obsidian vault path
captainslog --vault ~/Documents/Vault/Dictation
# Check your version
captainslog --version| Flag | What it does | Default |
|---|---|---|
--port |
Server port | 8090 |
--host |
Bind address | 0.0.0.0 |
--whisper-url |
Whisper backend URL | http://127.0.0.1:5000 |
--llm-url |
LLM server URL | http://127.0.0.1:11434 |
--vault |
Save directory for autosave | (empty) |
--history-limit |
Max history entries shown | 5 |
--enable-llm |
Enable local LLM integration | false |
--enable-tls |
Enable auto-TLS for HTTPS | false |
--stream-url |
WebSocket URL for live streaming | (empty) |
--version |
Print version and exit | โ |
Terminal tip: Captain's Log works great in Ghostty, Kitty, Alacritty, or any terminal. Just run
captainslogfrom your shell (zsh, bash, fish).
A compact, minimal view showing only the record button and latest transcription. Great for a small window or sidebar.
- Keyboard: Press
Mto toggle between mini and full mode - URL: Add
?minito the address:http://localhost:8090/?mini - Automatic: The interface automatically switches to mini layout on narrow screens (under 520px)
Tip: Drag the browser window to your preferred size and bookmark it. Mini mode is perfect as a floating widget while you work.
Captain's Log can send your transcriptions to a local AI running on your own computer for cleanup, formatting, or summarization. Think of it like having a personal editor that fixes typos and adds punctuation โ but it runs entirely on your machine, so nothing leaves your network.
- You record something and get a transcription
- You click the ๐ค Send to AI button below your transcription
- Captain's Log proxies the text through its own backend to your local AI (Ollama/LM Studio), avoiding browser CORS restrictions
- The AI response appears directly below your transcription
Why a proxy? Browsers block direct requests to
localhost:11434due to CORS. Captain's Log handles this by routing LLM requests through/api/llm/chaton the Go backend โ same machine, no CORS, no data ever leaves your network.
You need an AI running locally. Two popular options:
| App | What it is | URL to use in Preferences |
|---|---|---|
| Ollama | Command-line AI runner | http://127.0.0.1:11434 |
| LM Studio | Desktop app with a nice UI | http://127.0.0.1:1234/v1 |
Steps:
- Install Ollama or LM Studio and download a model (e.g.
llama3.2) - Open Captain's Log โ โ๏ธ Preferences โ Connections
- Turn on "Enable Local LLM"
- Paste the URL from the table above into "Local LLM URL"
- Click the โป refresh button next to the model dropdown โ your models will appear
- Pick a model and hit "Save preferences"
That's it. The ๐ค Send to AI button will now appear in the action bar.
Any OpenAI-compatible API works โ if your AI app has a
/v1/chat/completionsendpoint, Captain's Log can talk to it. The full request body (model, messages, temperature, etc.) is forwarded transparently.
Captain's Log works with OpenClaw, ZeroClaw, and any agent that can run shell commands or HTTP requests. The LLM proxy at /api/llm/chat accepts the standard OpenAI chat completions format, so agents can post-process transcriptions without CORS issues.
# Agent can transcribe + post-process in one pipeline:
captainslog-cli transcribe meeting.mp3 | \
curl -s -X POST http://localhost:8090/api/llm/chat \
-H 'Content-Type: application/json' \
-d '{"model":"llama3.2","messages":[{"role":"user","content":"Summarize: ..."}]}'The transcription engine isn't running. Start it:
docker run -d -p 5000:5000 ghcr.io/heimoshuiyu/whisper-fastapi:latestOr if it's on a remote machine, check that the URL in Preferences โ Whisper server URL is correct.
- Your browser needs permission to use the microphone
- On HTTP, this only works on
localhostโ if accessing from another device, enable TLS - Check the audio source dropdown (next to the mic button) โ make sure the right input is selected
Transcription speed depends on three things: model size, hardware, and settings.
| Setting | Recommendation | Why |
|---|---|---|
| Beam size | 5 (default) |
Higher โ better. 10 is 2ร slower with negligible quality gain. |
| Skip silence (VAD) | Enable | Skips quiet parts โ huge speedup for recordings with pauses |
| Speaker labels (diarize) | Disable when not needed | Expensive post-processing pass |
| Word timestamps | Disable when not needed | Extra alignment pass after transcription |
| Model | large-v3-turbo |
8ร faster than large-v3 with minimal quality loss |
| Model | Speed | Quality | Best for |
|---|---|---|---|
large-v3 |
Slowest | Best | Final/archival transcriptions |
large-v3-turbo |
8ร faster | Near-v3 | โญ Recommended default |
distil-large-v3 |
2ร faster | Good | English-only use |
medium |
4ร faster | Good | Low-end hardware |
small |
8ร faster | OK | Quick drafts |
base / tiny |
Instant | Basic | Real-time / low-power |
If you have an AMD GPU (like the RX 7900 XTX), faster-whisper / CTranslate2 cannot use it โ they only support NVIDIA CUDA. However, whisper.cpp supports AMD GPUs via ROCm:
# Build whisper.cpp with ROCm (HIPBLAS) support
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build -DGGML_HIPBLAS=1
cmake --build build --config Release -j
# Download a model
./build/bin/whisper-cli --download-model large-v3-turbo
# Run the server (OpenAI-compatible API)
./build/bin/whisper-server \
--model models/ggml-large-v3-turbo.bin \
--host 0.0.0.0 --port 5000Expected speedup: ~7ร faster than CPU. The RX 7900 XTX (24GB VRAM) handles
large-v3comfortably.
Then point Captain's Log at it โ same URL, same API:
Preferences โ Connections โ Whisper backend URL โ http://127.0.0.1:5000
# Docker (fastest setup)
docker run -d -p 5000:5000 --gpus all ghcr.io/heimoshuiyu/whisper-fastapi:latest
# Or pip
pip install faster-whisper
# faster-whisper automatically uses CUDA when available# Quick health check
curl http://localhost:8090/healthz?diag | python3 -m json.tool
# Or from the CLI
captainslog-cli diagnoseClick to expand technical details
Browser / CLI / API
โ
Captain's Log (Go, port 8090)
โ
faster-whisper (port 5000)
โ
Transcription
โ
Clipboard ยท Vault ยท AI forwarding
Captain's Log is a ~10MB static Go binary with zero external dependencies. It proxies audio to faster-whisper via whisper-fastapi and provides a browser UI, CLI, and OpenAI-compatible API.
The CLI (captainslog-cli) is a pure bash + curl wrapper around the HTTP API. No external dependencies.
captainslog-cli transcribe meeting.mp3 # Transcribe a file โ stdout
captainslog-cli transcribe call.wav srt # Transcribe as SRT (with timestamps)
captainslog-cli copy "text to clipboard" # Copy to system clipboard (wl-copy/xclip/pbcopy)
captainslog-cli save "Notes from meeting" # Save text to vault as markdown
captainslog-cli stardate # Print current stardate
captainslog-cli health # Check backend status
captainslog-cli models # List available Whisper/LLM models
captainslog-cli diagnose # Full self-diagnostic (server, whisper, vault, clipboard)
captainslog-cli settings # Show all settings as JSON
captainslog-cli settings language fr # Update a single setting
captainslog-cli update # Pull latest + rebuild + install
captainslog-cli version # Print server versionPipe-friendly:
captainslog-cli transcribe audio.mp3 | captainslog-cli copy # Transcribe โ clipboard
captainslog-cli transcribe audio.mp3 | captainslog-cli save # Transcribe โ vault
echo "dictated text" | captainslog-cli save # Pipe text โ vault| Endpoint | Method | Description |
|---|---|---|
/v1/audio/transcriptions |
POST |
OpenAI-compatible (multipart). JSON responses are enriched with SRT-parsed segments for real timestamps. |
/v1/audio/translations |
POST |
Translate audio to English |
/api/llm/chat |
POST |
LLM proxy โ forwards OpenAI chat completions to Ollama/LM Studio (avoids CORS) |
/api/settings |
GET/PUT |
Persistent settings (merged on PUT, full replace not required) |
/api/vault/save |
POST |
Save text to vault as markdown ({"text":"...","language":"en"}) |
/api/recordings |
POST |
Save audio recording (multipart) |
/api/open |
POST |
Open file/folder in system file manager (?path=...) |
/api/models |
GET |
Available Whisper + LLM models |
/api/config |
GET |
Read-only runtime config (vault, llm, auth, tls status) |
/api/stardate |
GET |
Current stardate |
/healthz |
GET |
Health check (add ?diag for detailed diagnostics) |
For server-level config (systemd, Docker). Most users won't need these.
| Variable | Default | Description |
|---|---|---|
CAPTAINSLOG_PORT |
8090 |
HTTP port |
CAPTAINSLOG_HOST |
0.0.0.0 |
Bind address |
CAPTAINSLOG_WHISPER_URL |
http://127.0.0.1:5000 |
Whisper backend URL |
CAPTAINSLOG_LLM_URL |
http://127.0.0.1:11434 |
Local LLM URL (Ollama, LM Studio, etc.) |
CAPTAINSLOG_ENABLE_LLM |
false |
Enable local LLM integration |
CAPTAINSLOG_AUTH_TOKEN |
(empty) | Bearer token for auth |
CAPTAINSLOG_VAULT_DIR |
(empty) | Obsidian vault path |
CAPTAINSLOG_CONFIG_DIR |
~/.config/captainslog |
Settings location |
CAPTAINSLOG_ENABLE_TLS |
false |
Auto-generate TLS cert |
CAPTAINSLOG_RATE_LIMIT |
0 |
Requests/minute (0 = disabled, set >0 for LAN/public) |
CAPTAINSLOG_HISTORY_LIMIT |
5 |
Max history entries shown |
CAPTAINSLOG_STREAM_URL |
(empty) | WebSocket URL for live streaming (e.g. ws://localhost:8765) |
CAPTAINSLOG_LOG_FORMAT |
text |
Log format (text or json) |
CAPTAINSLOG_LOG_DIR |
(empty) | Log file directory (auto-rotated, stdout always active) |
Migrating from older versions?
CAPTAINSLOG_OLLAMA_URLandCAPTAINSLOG_ENABLE_OLLAMAstill work โ they're automatically mapped to the new names.
Captain's Log can stream transcription live while you record, showing partial text in real time. This requires a separate streaming backend like WhisperLiveKit or whisper_streaming.
# Start a streaming backend (pick one)
pip install whisper-live-kit
whisper-live-kit --host 0.0.0.0 --port 8765
# Point Captain's Log to it
captainslog --stream-url ws://localhost:8765
# or: export CAPTAINSLOG_STREAM_URL=ws://localhost:8765When a streaming URL is configured:
- A LIVE badge appears during recording
- Partial text appears in the Latest card as you speak
- On stop, final transcription is still saved to history + vault as normal
- If the streaming backend is unavailable, Captain's Log falls back to post-recording transcription automatically
docker build -t captainslog .
docker run -p 8090:8090 \
-e CAPTAINSLOG_WHISPER_URL=http://host.docker.internal:5000 \
captainslogCaptain's Log works with OpenClaw, ZeroClaw, and any agent that can run shell commands or HTTP requests. See the OpenClaw skill or add to your agent's config:
## Captain's Log (Speech-to-Text)
- CLI: `captainslog-cli transcribe <file>`, `captainslog-cli save "text"`
- API: POST http://127.0.0.1:8090/v1/audio/transcriptions (multipart, OpenAI-compatible)
- Health: GET http://127.0.0.1:8090/healthz| Integration | How |
|---|---|
| faster-whisper | Transcription engine (4x faster than OpenAI Whisper) |
| Ollama | Post-processing: punctuation, summaries, filler removal |
| Obsidian | Daily markdown files with YAML frontmatter |
| Home Assistant | Local voice via Wyoming protocol |
- Fork โ Branch โ Commit โ PR
go run ./cmd/captainslogfor dev servergo test ./...to run tests
Code style: Go stdlib only ยท vanilla HTML/CSS/JS ยท no external JS dependencies
Captain's Log shows TNG-era stardates in the header (toggle in Preferences โ Show stardates).
How it works: -(year offset ร 1000) + day_fraction. Negative stardates = we're still in the "past" relative to TNG.
Quirks:
- Stardates are aesthetic, not canon-accurate
- The waveform shows frequency bars (spectrum analyser), not raw audio
- History is in your browser's localStorage โ clearing browser data removes it, but vault files on disk are safe
Captain's Log needs microphone access from both your browser and your operating system. If recording doesn't work, check both levels:
When you first click Record, your browser will ask for microphone permission. Click Allow.
If you accidentally blocked it:
- Chrome/Edge: Click the ๐ icon in the address bar โ Site settings โ Microphone โ Allow
- Firefox: Click the ๐ icon โ Connection secure โ More information โ Permissions โ Microphone
- Safari: Safari โ Settings โ Websites โ Microphone โ Allow for this site
HTTPS note: Chrome blocks microphone access on non-HTTPS sites (except
localhost). If accessing Captain's Log over your LAN, either:
- Set
CAPTAINSLOG_ENABLE_TLS=true(auto-generates a self-signed cert)- Add
http://your-server:8090tochrome://flags/#unsafely-treat-insecure-origin-as-secure
System Settings โ Privacy & Security โ Microphone โ enable your browser (Chrome, Firefox, Safari, etc.)
If your browser isn't listed, open it and try recording โ macOS will prompt you.
Settings โ Privacy & Security โ Microphone โ ensure:
- "Microphone access" is On
- "Let apps access your microphone" is On
- Your browser is listed and enabled under "Let desktop apps access your microphone"
Most Linux desktops use PipeWire (or PulseAudio). Check:
- GNOME: Settings โ Sound โ Input โ verify your mic is selected
- KDE Plasma: System Settings โ Audio โ Input Devices โ verify your mic is active
- PipeWire/PulseAudio:
pactl list sources shortto see available inputs - Flatpak browsers: May need portal permissions โ run
flatpak permissionsor reinstall with--device=all
Settings โ Apps โ [your browser] โ Permissions โ Microphone โ Allow
Chrome on Android will also prompt when you first try to record.
Settings โ [your browser] โ Microphone โ On
Safari requires HTTPS for microphone access on iOS (localhost doesn't bypass this on mobile).
Settings โ Privacy and security โ Site settings โ Microphone โ ensure Captain's Log is allowed.
| Problem | Solution |
|---|---|
| "Microphone access required" | Check browser + OS permissions (see Privacy & Permissions) |
| "No microphone detected" | Verify audio input device is connected and selected in OS sound settings |
| Recording works but transcription fails | Check Whisper is running: curl http://localhost:5000/v1/models |
| "Backend unreachable" | Start the Whisper server: docker run -d -p 5000:5000 ghcr.io/heimoshuiyu/whisper-fastapi:latest |
| Slow transcription | Enable VAD, reduce beam_size to 5, try large-v3-turbo. AMD GPU? Use whisper.cpp with ROCm. See full guide. |
| Rate limit error | Disabled by default. If enabled, set CAPTAINSLOG_RATE_LIMIT=0 to disable |
| PWA won't install | Must be on HTTPS or localhost. Try Chrome โ โฎ โ Install app |
| Ctrl+S saves HTML page | Update to latest version โ this was fixed in v0.2.0 |
| Can't hear playback | Audio recordings are saved server-side. Click ๐ on a history entry to play. |
| Settings not saving | Check browser console (F12) for errors. Settings require the backend to be running. |
Still stuck? Open an issue on GitHub with:
- Your OS and browser
- The output of
curl http://localhost:8090/healthz?diag- Any errors from the browser console (F12 โ Console)
- All audio processed locally โ never leaves your machine
- No telemetry, analytics, or tracking โ zero external requests
- No accounts or sign-up โ just run the binary
- Optional auth token (
CAPTAINSLOG_AUTH_TOKEN) for LAN/remote access - Optional auto-TLS (
CAPTAINSLOG_ENABLE_TLS) generates a self-signed cert - Rate limiting available for public-facing deployments (
CAPTAINSLOG_RATE_LIMIT) - XSS-safe โ all user content is HTML-escaped before rendering
- Content from external APIs (Whisper responses) is sanitized before display
Captain's Log uses standard APIs (OpenAI-compatible /v1/audio/transcriptions) and can work alongside many tools in the open-source speech ecosystem.
| Backend | Best For | Setup |
|---|---|---|
| whisper.cpp | AMD GPU (ROCm), low-memory, C++ | Build with -DGGML_HIPBLAS=1, run whisper-server |
| CrisperWhisper | Precise word-level timestamps, filler detection, #1 OpenASR | nyrahealth/faster_CrisperWhisper model with faster-whisper |
| faster-whisper-server | NVIDIA GPU, fast inference | pip install faster-whisper-server && uvicorn ... |
| whisper-fastapi | Home Assistant / webhook integration | Docker: ghcr.io/heimoshuiyu/whisper-fastapi |
| NVIDIA Parakeet | Fastest English-only, NVIDIA GPU | Parakeet TDT 0.6B via NeMo โ 3ร faster than large-v3 |
| SYSTRAN/faster-whisper | Library for custom backends | Python library โ build your own API around it |
| Project | What it does |
|---|---|
| whisper_streaming | Real-time WebSocket transcription. Set --stream-url ws://localhost:43007 |
| OBS Live Translation | Capture OBS audio โ live captions. Route OBS output to Captain's Log via virtual mic |
| Project | What it does |
|---|---|
| OpenLRC | Generate LRC lyrics files from audio. Use Captain's Log SRT export as input |
| open-dubbing | Dub audio into other languages. Feed Captain's Log SRT output as source subtitles |
Captain's Log transcriptions can be piped to any CLI tool:
# Send latest transcription to Gemini CLI
curl -s http://localhost:8090/api/settings | jq -r '.vault_dir' | \
xargs -I{} cat "{}/$(date +%Y-%m-%d).md" | \
gemini "Summarize the key points from these dictation notes"
# Or pipe directly to any LLM CLI
echo "Clean up this transcription:" | cat - /path/to/dictation.md | ollama run llama3.2- Home Assistant: Use whisper-fastapi as the Whisper backend, which has native HA integration
- ZeroClaw: Point ZeroClaw's speech input at
http://localhost:8090/v1/audio/transcriptionsโ it's the same OpenAI-compatible API
Free for individuals and small businesses (under 100 employees / $1M revenue). Larger organizations need a commercial license โ contact ryanw.eu.
๐ Star this repo to get notified about new releases, bug fixes, and features.
Live long and transcribe. ๐
