An OpenAI API-compatible speech-to-text server for audio transcription and translation, aka Whisper.
- Compatible with the OpenAI audio/transcriptions and audio/translations API
- Does not connect to the OpenAI API and does not require an OpenAI API Key
- Not affiliated with OpenAI in any way
- NEW: Automatic alert tone and silence skipping with Silero VAD
Documentation:
- Quick Start Guide - Get running in 5 minutes
- Installation Guide - Detailed setup for all platforms
- Dependencies Explained - What gets installed and why
- Change Log - What's new in v0.2.0
Quick Start:
pip install -r requirements.txt
python whisper_server.py --model small
curl http://localhost:8000/health
API Compatibility:
- /v1/audio/transcriptions
- /v1/audio/translations
Parameter Support:
- file
- model (only whisper-1 exists, so this is ignored)
- language
- prompt (FULLY SUPPORTED - guides transcription with custom terminology, formatting, etc.)
- temperature
- response_format (see the example below):
  - json
  - text
  - srt
  - vtt
  - verbose_json
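For example, requesting SRT subtitles with the OpenAI Python client (a minimal sketch; the file path is a placeholder and the dummy API key is not checked by this server):
from openai import OpenAI
client = OpenAI(api_key='sk-1111', base_url='http://localhost:8000/v1')
audio_file = open("/path/to/file/audio.mp3", "rb")
# response_format can be json, text, srt, vtt, or verbose_json
srt = client.audio.transcriptions.create(model="whisper-1", file=audio_file, response_format="srt")
print(srt)  # SRT-formatted subtitles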
Details:
- CUDA or CPU support (automatically detected)
- float32, float16 or bfloat16 support (automatically detected)
- Silero VAD tone skipping - Automatically detects and skips alert tones and silence at the beginning of audio files using Silero Voice Activity Detection
Tested whisper models:
- large-v3 (the default)
- large-v2
- large
- medium
- small
- base
- tiny
Version: 0.2.0, Last update: 2026-01-11
Requirements:
- Python 3.8 or higher
- FFmpeg (for audio processing)
- (Optional) CUDA-capable GPU for faster transcription
- Install FFmpeg
  # Ubuntu/Debian
  sudo apt install ffmpeg
  # macOS (using Homebrew)
  brew install ffmpeg
  # Windows (using Chocolatey)
  choco install ffmpeg
- Install Python Dependencies
  pip install -r requirements.txt
This will install:
- FastAPI and Uvicorn (API server)
- OpenAI Whisper (transcription engine)
- PyTorch and torchaudio (deep learning framework)
- Silero VAD (automatic tone/silence detection, downloaded on first run)
- Python-multipart (file upload support)
- (Optional) CUDA Support
  For GPU acceleration, install CUDA for your operating system. PyTorch will automatically detect and use CUDA if available.
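To see what will be auto-detected on your machine, you can ask PyTorch directly (a small sketch; the server's own detection logic may differ in details):
import torch
# Device: CUDA if a compatible GPU is visible, otherwise CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Precision: bfloat16 where supported, float16 on other GPUs, float32 on CPU
if device.startswith("cuda"):
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    dtype = torch.float32
print(f"device={device}, dtype={dtype}")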
On the first run, the Silero VAD model is automatically downloaded from torch.hub (~2MB). This is a one-time operation.
Note: This implementation uses the official OpenAI Whisper library which has full prompt support built-in!
Usage: whisper_server.py [-m <model_name>] [-d <device>] [-P <port>] [-H <host>] [--preload]
Description:
OpenedAI Whisper API Server (Silero VAD tone skipping)
Options:
-h, --help Show this help message and exit.
-m MODEL, --model MODEL
The model to use for transcription.
Options: tiny, base, small, medium, large, large-v2, large-v3 (default: large-v3)
-d DEVICE, --device DEVICE
Set the torch device for the model. Ex. cuda:0 or cpu (default: auto)
-P PORT, --port PORT Server tcp port (default: 8000)
-H HOST, --host HOST Host to listen on, Ex. 0.0.0.0 (default: 0.0.0.0)
--preload Preload model and exit. (default: False)
This server includes Silero VAD (Voice Activity Detection) which automatically:
- Detects the start of speech in audio files
- Skips alert tones, beeps, and silence at the beginning of recordings
- Preserves a 150ms buffer before speech starts to maintain context
- Improves transcription accuracy by removing non-speech audio
This feature is especially useful for:
- Radio dispatch recordings with alert tones
- Pager recordings with notification beeps
- Any audio with leading silence or tones
The VAD processing is automatic and requires no configuration. It processes the audio at 16kHz mono and uses a 250ms minimum speech duration threshold with 0.5 confidence.
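In outline, the VAD step works roughly like the sketch below (illustrative only, not the server's exact code; the file name is a placeholder and reading mp3 assumes a working audio backend):
import torch
# Load Silero VAD from torch.hub (cached after the first download, ~2MB)
model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, *_ = utils
SAMPLE_RATE = 16000
wav = read_audio("audio.mp3", sampling_rate=SAMPLE_RATE)  # 16kHz mono
# Speech regions at 0.5 confidence with a 250ms minimum speech duration
speech = get_speech_timestamps(wav, model, sampling_rate=SAMPLE_RATE, threshold=0.5, min_speech_duration_ms=250)
if speech:
    # Keep a 150ms buffer before the first detected speech; drop tones/silence before it
    start = max(0, speech[0]["start"] - int(0.150 * SAMPLE_RATE))
    wav = wav[start:]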
Check if the server is running and ready:
curl http://localhost:8000/health
Response: {"status":"ok"}
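You can also run the same check from Python before sending transcription requests (a minimal sketch using only the standard library):
import json, urllib.request
with urllib.request.urlopen("http://localhost:8000/health", timeout=5) as resp:
    status = json.load(resp)
print(status)  # {'status': 'ok'} when the server is ready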
You can use it like this:
curl -s http://localhost:8000/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F model="whisper-1" -F file="@audio.mp3" -F response_format=text
Or just like this:
curl -s http://localhost:8000/v1/audio/transcriptions -F model="whisper-1" -F file="@audio.mp3"
Or like this example from the OpenAI Speech to text guide Quickstart:
from openai import OpenAI
client = OpenAI(api_key='sk-1111', base_url='http://localhost:8000/v1')
audio_file = open("/path/to/file/audio.mp3", "rb")
transcription = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
print(transcription.text)
The prompt parameter helps guide Whisper's transcription by providing context, terminology, and formatting preferences. This is especially useful for domain-specific audio like radio communications, medical terminology, or technical jargon.
Example with curl:
curl -s http://localhost:8000/v1/audio/transcriptions \
-F model="whisper-1" \
-F file="@audio.mp3" \
-F prompt="Emergency radio dispatch communications. Common units: MEDIC, ENGINE, TRUCK, LADDER. Radio procedure: COPY, CLEAR, EN ROUTE, ON SCENE."
Example with Python:
from openai import OpenAI
client = OpenAI(api_key='sk-1111', base_url='http://localhost:8000/v1')
# Recommended prompt for radio dispatch transcription
prompt = """Emergency radio dispatch. CRITICAL: Never repeat. Common units: MEDIC, ENGINE, TRUCK, LADDER, SQUAD, BATTALION. Radio words: COPY, CLEAR, EN ROUTE, ON SCENE. Phonetic: ADAM, BAKER, CHARLES, DAVID, FRANK, GEORGE, KING, LINCOLN, MARY, OCEAN, QUEEN, SAM, VICTOR, X-RAY. Ages: NUMBER YEAR OLD MALE/FEMALE. Medical: GSW, SOB, CPR, AED, MVA. Use periods between statements."""
audio_file = open("/path/to/file/radio_audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
model="whisper-1",
file=audio_file,
prompt=prompt
)
print(transcription.text)
Recommended Radio Dispatch Prompt:
For emergency radio dispatch communications, this prompt has been tested and works well:
Emergency radio dispatch. CRITICAL: Never repeat. Common units: MEDIC, ENGINE, TRUCK, LADDER, SQUAD, BATTALION. Radio words: COPY, CLEAR, EN ROUTE, ON SCENE. Phonetic: ADAM, BAKER, CHARLES, DAVID, FRANK, GEORGE, KING, LINCOLN, MARY, OCEAN, QUEEN, SAM, VICTOR, X-RAY. Ages: NUMBER YEAR OLD MALE/FEMALE. Medical: GSW, SOB, CPR, AED, MVA. Use periods between statements.
This prompt:
- Prevents repetitive hallucinations with "CRITICAL: Never repeat"
- Provides common emergency service terminology
- Includes phonetic alphabet for call signs
- Guides proper formatting for ages and medical terms
- Achieves ~95% accuracy on radio dispatch audio
Important Notes:
- The prompt provides guidance, not restrictions - Whisper will still transcribe all audio
- Prompts improve accuracy on domain-specific terms and reduce hallucinations
- Keep prompts under 400 characters - longer prompts can trigger hallucinations (see the sketch after this list)
- Especially helpful with poor audio quality or background noise
- Customize the prompt based on your specific use case (medical, legal, technical, etc.)
- If you see repeated words (hallucinations), try shortening or removing the prompt
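As a simple illustration of the 400-character guideline above, here is a small, hypothetical helper (not part of the server) that warns about and trims over-long prompts:
MAX_PROMPT_CHARS = 400  # longer prompts can trigger hallucinations

def check_prompt(prompt: str) -> str:
    # Hypothetical helper: warn and truncate if the prompt is too long
    if len(prompt) > MAX_PROMPT_CHARS:
        print(f"Warning: prompt is {len(prompt)} chars, truncating to {MAX_PROMPT_CHARS}")
        return prompt[:MAX_PROMPT_CHARS]
    return prompt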
You can run the server via docker like so:
docker compose build
docker compose up
Options can be set via whisper.env.
- The Silero VAD model will be automatically downloaded on first run (~2MB)
- Models are cached in the hf_home directory, which is mounted as a volume
- GPU support requires the NVIDIA Docker runtime and a compatible GPU
- For CPU-only Docker, remove the runtime: nvidia and deploy sections from docker-compose.yml
If you get errors about downloading the Silero VAD model:
- Ensure you have internet connectivity
- Check that torch.hub has write access to its cache directory
- The model is downloaded from GitHub (snakers4/silero-vad)
- The first run may take 1-2 minutes to download the model (a manual pre-download sketch follows below)
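If the automatic download keeps failing, you can try pre-fetching the model into the torch.hub cache yourself once you have connectivity (a sketch):
import torch
# Downloads snakers4/silero-vad into the torch.hub cache (~2MB, one time)
model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
print("Cached under:", torch.hub.get_dir())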
If you get FileNotFoundError related to ffmpeg:
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Windows
choco install ffmpeg
If you get CUDA out of memory errors:
- Use a smaller model (small, base, or tiny; see the sketch below)
- Use CPU mode: --device cpu
- Reduce concurrent requests
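To gauge the memory trade-off outside the server, you can load a smaller model directly with the whisper library (a sketch; the file path is a placeholder):
import whisper
# "small", "base", and "tiny" need far less memory than "large-v3"
model = whisper.load_model("small", device="cpu")  # or a CUDA device if it fits
result = model.transcribe("/path/to/file/audio.mp3")
print(result["text"])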
If you get import errors:
pip install -r requirements.txt --upgrade
Make sure you have Python 3.8 or higher:
python --version