
🧪 Transcriptr Lab – WER + LLM Demo

A lab for evaluating automatic speech recognition (ASR) and large language models (LLMs) on real-world transcription tasks.

🧠 What This Lab Does

  • 🎧 Transcribes speech audio using OpenAI's Whisper ASR system
  • 📊 Calculates WER metrics: Word Error Rate, Match Error Rate, insertions, deletions, substitutions, and more
  • 🤖 Uses local LLMs (Mistral, Phi, TinyLlama via Ollama) to generate qualitative feedback
  • 🔁 Benchmarks multiple LLMs on the same hypothesis/reference pair for consistent evaluation
  • 📂 Handles multiple stress levels (low/med/high) with real, royalty-free sample audio
  • 📋 Outputs data-driven + human-readable reports summarizing transcription accuracy
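
The WER breakdown above is produced with jiwer (listed under Requirements). As a minimal sketch of how such a breakdown can be computed in Python (the sample sentences are made up for illustration; the repo's wer_metrics.py may wrap this differently):

import jiwer

reference  = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

out = jiwer.process_words(reference, hypothesis)  # requires jiwer >= 3.x
print(f"WER: {out.wer:.2%}  MER: {out.mer:.2%}")
print(f"insertions={out.insertions}  deletions={out.deletions}  substitutions={out.substitutions}")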

🎥 Demo Video

Watch the transcription + WER feedback flow in action:

▶️ View Demo Video


🗂️ Main Scripts

Script                   Purpose
run_demo.sh              Main entry point: runs transcription, WER, LLM feedback, and benchmarks
transcribe.py            Transcribes audio to text using Whisper
analyze_transcript.py    Computes WER and calls LLM for feedback
benchmark_llms.py        Compares multiple LLMs on a reference/hypothesis pair
wer_metrics.py           Calculates WER and related metrics
llm_feedback.py          Generates LLM-based qualitative feedback
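
For orientation, the core of a Whisper transcription step like transcribe.py can be as small as the sketch below; the actual script may pick a different model size or pass extra options:

import whisper

model = whisper.load_model("base")  # other sizes include tiny, small, medium, large
result = model.transcribe("sample_data/low_stress/audio.wav")
print(result["text"])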

🛠️ Requirements

Tool/Library    Purpose                           Notes
Python 3.8+     Required runtime                  Check with python3 --version
whisper         Transcription (speech → text)     Install via GitHub
ollama          Local LLM runner                  Run models like Mistral, Phi, TinyLlama
jiwer           WER calculation                   Used for metrics breakdown
numpy<2.0       Dependency for PyTorch/Whisper    Required for compatibility
ffmpeg          Audio processing backend          Install via Homebrew:

1. /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
2. brew install ffmpeg
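
A quick way to sanity-check the Python-side requirements once they are installed (this snippet is just a convenience check, not one of the lab's scripts):

import shutil
import jiwer, numpy, whisper  # all three should import without errors

print("numpy version:", numpy.__version__)          # should be < 2.0
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
print("ollama on PATH:", shutil.which("ollama") is not None)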

⚙️ Installation

# 🧱 Clone the repo
git clone https://github.com/YOUR_USERNAME/transcriptr-lab.git
cd transcriptr-lab

# 📦 (Optional but recommended) Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

# 📦 Install Python dependencies
pip install -r requirements.txt

# 🎵 Install ffmpeg (required for Whisper audio processing)
# If you don't already have Homebrew, install it first (skip this step if it's installed):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Then install ffmpeg:
brew install ffmpeg

# 🧠 Install Whisper
pip install git+https://github.com/openai/whisper.git

# 🤖 Install Ollama

# On macOS:
# Visit https://ollama.com/download and install the app manually
# Then open it from Applications to start the background server

# 🔽 Pull LLM models (once Ollama is running)
ollama pull mistral
ollama pull phi
ollama pull tinyllama
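
Once the models are pulled, a script like llm_feedback.py can query them through Ollama's local HTTP API (port 11434 by default). A minimal sketch; the prompt wording here is an assumption and the repo's actual prompt may differ:

import json
import urllib.request

def ollama_feedback(model, reference, hypothesis):
    prompt = (
        "Compare this ASR hypothesis against the reference transcript and "
        "give brief qualitative feedback on the errors.\n"
        f"Reference: {reference}\nHypothesis: {hypothesis}"
    )
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ollama_feedback("mistral", "the quick brown fox", "the quick brown fix"))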

🚀 How to Run

Dataset          Command
Low Stress       bash run_demo.sh sample_data/low_stress
Medium Stress    bash run_demo.sh sample_data/med_stress
High Stress      bash run_demo.sh sample_data/high_stress
All Datasets     bash run_demo.sh sample_data
  • If you run bash run_demo.sh with no arguments, it defaults to low_stress.
  • Each run will also automatically call benchmark_llms.py for the selected dataset(s).
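
Under the hood, benchmarking amounts to sending the same reference/hypothesis prompt to each model and comparing the answers. A rough sketch of what benchmark_llms.py might do, here using the ollama CLI rather than the HTTP API (the prompt and example sentences are made up):

import subprocess

models = ["mistral", "phi", "tinyllama"]
prompt = (
    "Reference: the quick brown fox jumps over the lazy dog\n"
    "Hypothesis: the quick brown fox jumped over a lazy dog\n"
    "In two sentences, describe the main transcription errors."
)

for model in models:
    # `ollama run MODEL PROMPT` prints the model's completion to stdout
    result = subprocess.run(["ollama", "run", model, prompt],
                            capture_output=True, text=True, check=True)
    print(f"=== {model} ===\n{result.stdout.strip()}\n")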

📁 Sample Data Structure

sample_data/
  low_stress/
    reference.txt
    hypothesis.txt
    audio.wav
  med_stress/
    reference.txt
    hypothesis.txt
    audio.wav
  high_stress/
    reference.txt
    hypothesis.txt
    audio.wav
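
Because every dataset folder follows the same layout, a batch evaluation can simply walk sample_data and score each stress level; a minimal sketch (not one of the repo's scripts):

from pathlib import Path
import jiwer

for dataset in sorted(Path("sample_data").iterdir()):
    if not dataset.is_dir():
        continue
    reference = (dataset / "reference.txt").read_text().strip()
    hypothesis = (dataset / "hypothesis.txt").read_text().strip()
    out = jiwer.process_words(reference, hypothesis)
    print(f"{dataset.name}: WER={out.wer:.2%}  MER={out.mer:.2%}")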
