This demo shows how to improve speech recognition accuracy by dynamically boosting domain-specific keyterms during real-time transcription. It uses AssemblyAI's Streaming API with LLM Gateway-generated keyterms extracted from customer conversation history.
The script demonstrates a two-phase keyterm boosting approach:
-
Generic Fallback Keyterms: The session starts immediately with generic healthcare/housing terms (e.g., "appointment", "prescription", "lease") for low-latency startup.
-
Contextual LLM-Generated Keyterms: In the background, an LLM analyzes the customer's conversation history and generates personalized keyterms (names, locations, medications, etc.) that are then sent to the streaming session.
The LLM extracts proper nouns from conversation history that ASR would typically struggle with:
- Person names with less common spellings (e.g., "Oluwatoyin Adéwálé", "Byrne-Donahue")
- Place names that are phonetically ambiguous (e.g., "Schuylkill", "Ouachita", "Wilkes-Barre")
- Medication names (e.g., "Atorvastatin", "Farxiga")
- Organization names specific to the customer's context
The previous_conversations.json file simulates a customer database. In production, this would be replaced with a call to your actual database where customer context is stored (CRM, call logs, appointment history, etc.). The key insight is that you likely already have valuable context about each caller that can dramatically improve transcription accuracy.
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install "assemblyai[extras]" python-dotenv requestsCreate a .env file in the project root:
cp .env.example .envThen edit .env and add your AssemblyAI API key (found on the API Keys page):
ASSEMBLYAI_API_KEY=your_api_key_here
Note: LLM Gateway is a paid feature, so you'll need to add a card to your account (your free credits will still apply).
This is the best way to evaluate the impact of keyterm boosting. Provide an audio file containing words from the previous_conversations.json database (names, places, medications). The script runs the same audio twice:
- First without keyterm boosting (baseline)
- Then with LLM-generated keyterm boosting
At the end, you see a side-by-side comparison showing how boosting improves accuracy for difficult terms.
cd demo
python main.py ../files/test_file.wavDemo output (with brackets for readability):
============================================================
GROUND TRUTH:
============================================================
Hi, this is [Kelly Byrne-Donoghue] and I'm calling just to confirm
my appointment with Dr. [Oluwatoyin Adéwálé] at the [Schuylkill]
Family Health Center. I also need to reschedule my physical
therapy with Dr. [Xiomara] at [Ouachita] Rehabilitation Center.
My sister [Leigh Rhys-Davies] is picking up my [Atorvastatin] and
[Farxiga] prescriptions from the [Wilkes-Barre] [CVS] pharmacy.
============================================================
SESSION 1 (NO BOOSTING):
============================================================
Turn 1: Hi, this is [Kelly Byrne Donahue], and I'm calling just to
confirm my appointment with Dr. [Oluatoyan Adewale] at the
[Schuylkill] Family Health Center.
Turn 2: I also need to reschedule my physical therapy with
Dr. [Ziomara] at the [Wichita] Rehabilitation Center.
Turn 3: My sister, [Lee Re Davies] is picking up my [autor bastatin]
and [farzika] prescriptions from the [Wilkes Bear] [CBS] pharmacy.
============================================================
SESSION 2 (WITH BOOSTING):
============================================================
Turn 1: Hi, this is [Kelly Byrne-Donahue], and I'm calling just to
confirm my appointment with Dr. [Oluwatoyin Adéwálé] at the
[Schuylkill] Family Health Center.
Turn 2: I also need to reschedule my physical therapy with
Dr. [Xiomara] at the [Ouachita] Rehabilitation Center.
Turn 3: My sister, [Leigh Rhys-Davies] is picking up my [Atorvastatin]
and [Farxiga] prescriptions from the [Wilkes-Barre] [CVS] pharmacy.
Audio file requirements:
- Format: WAV (16-bit PCM)
- Sample rate: 16kHz (or specify with
--sample-rateand changeSAMPLE_RATEinconfig.py) - Channels: Mono
Stream directly from your microphone with keyterm boosting enabled:
cd demo
python main.pySpeak naturally and watch how the transcription handles difficult names and terms. After 50 words, keyterms will dynamically refresh based on conversation content. You can customize endpointing behavior and other streaming parameters in config.py (see API reference and turn detection configurations).
This demo focuses solely on Speech-to-Text with keyterm boosting. It does not include:
- Voice agent / conversational AI responses
- Text-to-Speech output
⭐ To extend this to a full voice agent, see EXTENDING_TO_VOICE_AGENT.md for detailed instructions on using LLM Gateway to maintain agent response history alongside transcription context.
- Session Start: Generic keyterms are loaded immediately for low-latency startup
- Background LLM Call: The customer's conversation history is sent to Claude via LLM Gateway
- Keyterm Extraction: The LLM returns up to 100 domain-specific keyterms
- Dynamic Update: Keyterms are pushed to the active streaming session via
set_params() - Ongoing Refresh: Every 50 words, keyterms can be refreshed based on new conversation content
demo/
├── main.py # Main entry point - streaming logic and event handlers
├── config.py # Configuration constants (edit this to customize)
├── keyterms.py # LLM Gateway integration and keyterm generation
├── previous_conversations.json # Sample conversation history database
└── EXTENDING_TO_VOICE_AGENT.md # Guide for adding voice agent capabilities
- ⭐ Change configuration: Edit
config.pyto adjust sample rate, speech model, LLM model, and streaming parameters - Change the conversation history: Edit
previous_conversations.jsonor replace theload_previous_conversations()function inkeyterms.pywith your own database call - Modify keyterm extraction: Adjust the LLM prompt in
keyterms.pyfor your domain - Adjust refresh frequency: Change
KEYTERM_REFRESH_THRESHOLDinconfig.py(default: 50 words)