A lab for evaluating automatic speech recognition (ASR) and large language models (LLMs) on real-world transcription tasks.
- 🎧 Transcribes speech audio using OpenAI's Whisper ASR system
- 📊 Calculates detailed accuracy metrics: Word Error Rate (WER), Match Error Rate (MER), insertions, deletions, substitutions, and more (see the sketch after this list)
- 🤖 Uses local LLMs (Mistral, Phi, TinyLlama via Ollama) to generate qualitative feedback
- 🔁 Benchmarks multiple LLMs on the same hypothesis/reference pair for consistent evaluation
- 📂 Handles multiple stress levels (low/med/high) with real, royalty-free sample audio
- 📋 Outputs data-driven + human-readable reports summarizing transcription accuracy
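At its core, the flow is Whisper for the hypothesis and `jiwer` for the scoring. A minimal sketch (the model size, file paths, and the jiwer 3.x `process_words` API are assumptions here; the repo's scripts wrap and extend these steps):

```python
# Sketch of the transcribe -> score flow using one of the bundled sample clips.
import jiwer
import whisper

# Transcribe the sample clip with a small Whisper model.
model = whisper.load_model("base")
hypothesis = model.transcribe("sample_data/low_stress/audio.wav")["text"]

# Compare against the human reference transcript.
with open("sample_data/low_stress/reference.txt") as f:
    reference = f.read()

# jiwer 3.x: process_words returns WER, MER, and the edit-operation counts.
measures = jiwer.process_words(reference, hypothesis)
print(f"WER: {measures.wer:.2%}  MER: {measures.mer:.2%}")
print(f"insertions={measures.insertions}  deletions={measures.deletions}  "
      f"substitutions={measures.substitutions}")
```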
Watch the transcription + WER feedback flow in action:
| Script | Purpose |
|---|---|
| `run_demo.sh` | Main entry point: runs transcription, WER, LLM feedback, and benchmarks |
| `transcribe.py` | Transcribes audio to text using Whisper |
| `analyze_transcript.py` | Computes WER and calls the LLM for feedback |
| `benchmark_llms.py` | Compares multiple LLMs on a reference/hypothesis pair |
| `wer_metrics.py` | Calculates WER and related metrics |
| `llm_feedback.py` | Generates LLM-based qualitative feedback |
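For the qualitative side, feedback in the style of `llm_feedback.py` can be produced by prompting a local model through Ollama's REST API. A hedged sketch (the helper name, prompt wording, and model choice are illustrative rather than the repo's actual implementation; assumes Ollama is serving on its default port 11434):

```python
# Hypothetical helper: ask a local Ollama model to critique an ASR hypothesis.
import json
import urllib.request

def llm_feedback(reference: str, hypothesis: str, model: str = "mistral") -> str:
    prompt = (
        "You are reviewing an ASR transcription.\n"
        f"Reference: {reference}\n"
        f"Hypothesis: {hypothesis}\n"
        "Briefly describe the main error patterns and their likely impact."
    )
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(llm_feedback("the cat sat on the mat", "the cat sat on a mat"))
```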
| Tool/Library | Purpose | Notes |
|---|---|---|
| Python 3.8+ | Required runtime | Check with `python3 --version` |
| `whisper` | Transcription (speech → text) | Install via GitHub |
| `ollama` | Local LLM runner | Runs models like Mistral, Phi, TinyLlama |
| `jiwer` | WER calculation | Used for the metrics breakdown |
| `numpy<2.0` | Dependency for PyTorch/Whisper | Required for compatibility |
| `ffmpeg` | Audio processing backend | Install via Homebrew: install Homebrew if needed, then `brew install ffmpeg` (full commands in the setup steps below) |
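Before running the demo, it can help to confirm the environment in one shot. A sanity-check sketch (not part of the repo; the version check mirrors the `numpy<2.0` requirement above):

```python
# Environment check: core imports, the numpy<2.0 bound, and ffmpeg on PATH.
import shutil

import jiwer    # WER metrics
import numpy    # must stay below 2.0 for PyTorch/Whisper compatibility
import whisper  # OpenAI Whisper ASR

assert int(numpy.__version__.split(".")[0]) < 2, (
    f"numpy {numpy.__version__} is too new; install numpy<2.0"
)
assert shutil.which("ffmpeg"), "ffmpeg not found on PATH (brew install ffmpeg)"
print(f"Environment looks good (numpy {numpy.__version__})")
```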
# 🧱 Clone the repo
git clone https://github.com/YOUR_USERNAME/transcriptr-lab.git
cd transcriptr-lab
# 📦 (Optional but recommended) Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
# 📦 Install Python dependencies
pip install -r requirements.txt
# 🎵 Install ffmpeg (required for Whisper audio processing)
# If you don't have Homebrew, install it first (skip this step if it's already installed):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Then install ffmpeg:
brew install ffmpeg
# 🧠 Install Whisper
pip install git+https://github.com/openai/whisper.git
# 🤖 Install Ollama
# On macOS:
# Visit https://ollama.com/download and install the app manually
# Then open it from Applications to start the background server
# 🔽 Pull LLM models (once Ollama is running)
ollama pull mistral
ollama pull phi
ollama pull tinyllama
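Once the pulls finish, you can confirm the Ollama server is running and the models are available locally. A small check against Ollama's local API (assumes the default port 11434; not part of the repo):

```python
# List locally available Ollama models and confirm the three used by the lab.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    local = {m["name"] for m in json.loads(resp.read())["models"]}

for model in ["mistral", "phi", "tinyllama"]:
    # Pulled models carry a tag, e.g. "mistral:latest".
    found = any(name.split(":")[0] == model for name in local)
    print(f"{model}: {'pulled' if found else 'missing'}")
```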
| Dataset | Command |
|---|---|
| Low Stress | `bash run_demo.sh sample_data/low_stress` |
| Medium Stress | `bash run_demo.sh sample_data/med_stress` |
| High Stress | `bash run_demo.sh sample_data/high_stress` |
| All Datasets | `bash run_demo.sh sample_data` |
- If you run `bash run_demo.sh` with no arguments, it defaults to `low_stress`.
- Each run will also automatically call `benchmark_llms.py` for the selected dataset(s) (sketched below).
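Conceptually, the benchmark step asks each model the same question about the same reference/hypothesis pair and compares the answers (and response times). A hedged sketch of that loop (uses the `ollama run` CLI via `subprocess`; the actual `benchmark_llms.py` may structure this differently):

```python
# Hypothetical benchmark loop: one prompt, one answer per local model.
import subprocess
import time
from pathlib import Path

dataset = Path("sample_data/low_stress")
reference = (dataset / "reference.txt").read_text()
hypothesis = (dataset / "hypothesis.txt").read_text()
prompt = (
    "Compare this ASR hypothesis to the reference and summarise the errors.\n"
    f"Reference: {reference}\n"
    f"Hypothesis: {hypothesis}"
)

for model in ["mistral", "phi", "tinyllama"]:
    start = time.perf_counter()
    result = subprocess.run(["ollama", "run", model, prompt],
                            capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    print(f"--- {model} ({elapsed:.1f}s) ---")
    print(result.stdout.strip() or result.stderr.strip())
```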
sample_data/
  low_stress/
    reference.txt
    hypothesis.txt
    audio.wav
  med_stress/
    reference.txt
    hypothesis.txt
    audio.wav
  high_stress/
    reference.txt
    hypothesis.txt
    audio.wav
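Because every dataset folder follows the same layout, scripts can simply iterate over `sample_data/`. A sketch that scores each `hypothesis.txt` against its `reference.txt` (uses `jiwer.wer`; folder names match the layout above):

```python
# Score every dataset folder by comparing hypothesis.txt to reference.txt.
from pathlib import Path

import jiwer

for dataset in sorted(Path("sample_data").iterdir()):
    if not (dataset / "reference.txt").is_file():
        continue  # skip stray files or incomplete folders
    reference = (dataset / "reference.txt").read_text()
    hypothesis = (dataset / "hypothesis.txt").read_text()
    print(f"{dataset.name}: WER = {jiwer.wer(reference, hypothesis):.2%}")
```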