Description
System Configuration
Hardware
- Platform: Raspberry Pi 5 Model B Rev 1.1
- Architecture: aarch64 (ARM64)
- Hailo Device: Hailo-8L
- Device ID: 0001:01:00.0
Software Versions
- OS: Debian GNU/Linux 12 (bookworm)
- HailoRT: 4.20.0-1
- Firmware Version: 4.20.0 (release, app, extended context switch buffer)
- Python: 3.11
- Repository Commit: 1dad6e4 (2025-12-10) - "add h8l to paddleocr, obb readme fix (#368)"
Installed Packages
hailort: 4.20.0-1
hailo-all: 4.20.0
hailofw: 4.20.0-1
python3-hailort: 4.20.0-1
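A quick way to confirm that the installed Hailo packages all sit on the same release is sketched below. It is only an illustration: the helper names `base_version` and `mismatched_packages` are made up for this example, and the versions are copied from the list above rather than queried from the system.

```python
import re


def base_version(version: str) -> str:
    """Strip a trailing Debian revision such as '-1' from a version string."""
    return re.sub(r"-\d+$", "", version.strip())


def mismatched_packages(packages: dict[str, str]) -> dict[str, str]:
    """Return packages whose base version differs from the most common one."""
    bases = {name: base_version(v) for name, v in packages.items()}
    versions = list(bases.values())
    # Treat the most frequent base version as the reference release.
    reference = max(set(versions), key=versions.count)
    return {name: v for name, v in bases.items() if v != reference}


# Versions copied from the "Installed Packages" list in this report.
installed = {
    "hailort": "4.20.0-1",
    "hailo-all": "4.20.0",
    "hailofw": "4.20.0-1",
    "python3-hailort": "4.20.0-1",
}

print(mismatched_packages(installed))  # {}
```

An empty result means the runtime, firmware package, and Python bindings agree on 4.20.0, so a version mismatch between them is unlikely to explain the problem.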
Python Dependencies (in virtual environment)
transformers: 4.50.1
sounddevice: 0.5.1
torch: 2.6.0
scipy: 1.9.3
numpy: 1.24.2
Problem Description
The Whisper base model produces empty or inconsistent transcriptions on Hailo-8L hardware, even though the system appears to be correctly configured and operational.
Testing Performed
1. Installation & Setup
- ✅ Ran python3 setup.py successfully
- ✅ Downloaded all required HEF files for Hailo-8L using download_resources.py
- ✅ Re-downloaded fresh HEF files to rule out corruption
- ✅ All dependencies installed correctly in virtual environment
2. Hardware Verification
- ✅ Device detected: hailortcli scan shows device 0001:01:00.0
- ✅ Firmware identified correctly
- ✅ HailoRT service running properly
- ✅ Restarted HailoRT service - no improvement
3. Audio Recording Tests
- ✅ Audio recording functional (various quality microphones tested)
- ✅ Audio levels verified (ranging from 0.28 to 0.99 max level)
- ✅ Audio preprocessing working (VAD detecting speech correctly)
- ✅ Mel spectrogram generation successful
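For anyone who wants to repeat the audio checks independently of the app, here is a minimal sketch that reads a recording and reports its sample rate, channel count, duration, and peak level. It assumes the recording is a 16-bit PCM WAV file; the function name `wav_stats` is hypothetical and not part of the repository.

```python
import array
import sys
import wave


def wav_stats(path: str) -> dict:
    """Report basic properties of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
        samples = array.array("h", wf.readframes(frames))
    # WAV data is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()
    # Normalize the peak to the 0..1 range used in the levels quoted above.
    peak = max((abs(s) for s in samples), default=0) / 32768.0
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
        "peak": peak,
    }
```

Calling `wav_stats("recording.wav")` (a placeholder filename) on the generated file shows whether the sample rate and channel count are what the pipeline expects, in addition to the level.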
4. Model Testing
Base Model (5-second encoder)
Command: python3 -m app.app_hailo_whisper --hw-arch hailo8l --variant base --duration 5
Results:
- Inconsistent performance - occasionally produces partial transcriptions, mostly empty
- Example successful transcription (1 out of 10+ attempts): "testing 123"
- Example partial transcription: "is a 5 2" (from "This is a 5 second recording")
- Majority of attempts: Empty string '' returned from decoder
Sample Output:
Audio loaded: 78674 samples, max level: 0.9793
After preprocessing: start_time=1.2, audio length: 78674 samples
Chunk offset: 1.00s
Raw transcription: ' is a 5 2'
Cleaned transcription: 'is a 5 2.'
Then subsequent recordings:
Audio loaded: 78535 samples, max level: 0.3499
After preprocessing: start_time=1.0, audio length: 78535 samples
Chunk offset: 0.50s
Raw transcription: ''
Cleaned transcription: '.'
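To compare many runs, the log lines above can be parsed into records. The sketch below is based only on the two log formats shown in this report ("Audio loaded: ..." and "Raw transcription: '...'"). The 16 kHz sample rate is an assumption (it is the rate Whisper models normally take as input), and `parse_runs` is a hypothetical helper name.

```python
import re

LOAD_RE = re.compile(
    r"^Audio loaded: (\d+) samples, max level: ([\d.]+)$", re.MULTILINE
)
RAW_RE = re.compile(r"^Raw transcription: '(.*)'$", re.MULTILINE)


def parse_runs(log: str, sample_rate: int = 16000) -> list[dict]:
    """Pair each 'Audio loaded' line with the next 'Raw transcription' line."""
    loads = LOAD_RE.findall(log)
    raws = RAW_RE.findall(log)
    return [
        {
            # Duration assumes 16 kHz audio unless sample_rate is overridden.
            "seconds": round(int(n) / sample_rate, 2),
            "max_level": float(level),
            "raw": raw,
        }
        for (n, level), raw in zip(loads, raws)
    ]


# The two runs quoted above, concatenated.
log = """\
Audio loaded: 78674 samples, max level: 0.9793
After preprocessing: start_time=1.2, audio length: 78674 samples
Chunk offset: 1.00s
Raw transcription: ' is a 5 2'
Cleaned transcription: 'is a 5 2.'
Audio loaded: 78535 samples, max level: 0.3499
After preprocessing: start_time=1.0, audio length: 78535 samples
Chunk offset: 0.50s
Raw transcription: ''
Cleaned transcription: '.'
"""

for run in parse_runs(log):
    print(run)
```

Under the 16 kHz assumption both recordings come out at about 4.9 seconds, so neither was cut short before reaching the 5-second encoder.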
Tiny Model (10-second encoder)
Command: python3 -m app.app_hailo_whisper --hw-arch hailo8l --variant tiny --duration 10
Results:
- Garbled output with unicode characters and random tokens
- Example: '%。...............,......'
- Example: '..... other alert hurt�... other........�..'
- Example: '%,,, ", to [,,,, " w,," [ st, -- "告诉 ',,, a, w'
5. Configuration Variations Tested
- ✅ VAD enabled (default)
- ✅ VAD disabled (--no-vad flag)
- ✅ Different chunk offsets (0.2s, 0.5s buffer before detected speech)
- ✅ Different audio durations (5s, 10s)
- ✅ Reuse audio mode (--reuse-audio)
- ✅ Fresh recordings with various microphone qualities
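The variations listed above can be enumerated systematically. The sketch below only builds and prints the command lines and does not run them. It uses only flags that appear in this report, and it assumes that --no-vad and --reuse-audio can be combined freely, which has not been verified.

```python
import itertools
import shlex

BASE = ["python3", "-m", "app.app_hailo_whisper", "--hw-arch", "hailo8l"]


def build_commands() -> list[list[str]]:
    """Build one command per variant / VAD / reuse-audio combination."""
    variants = [("base", 5), ("tiny", 10)]  # (variant, duration in seconds)
    vad_flags = [[], ["--no-vad"]]
    reuse_flags = [[], ["--reuse-audio"]]
    commands = []
    for (variant, duration), vad, reuse in itertools.product(
        variants, vad_flags, reuse_flags
    ):
        commands.append(
            BASE + ["--variant", variant, "--duration", str(duration)] + vad + reuse
        )
    return commands


for cmd in build_commands():
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell from the speech_recognition directory, which keeps the tested matrix explicit when reporting results.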
Observed Behavior
What Works
- ✅ Hardware detection and initialization
- ✅ Audio recording and loading
- ✅ Voice Activity Detection (VAD)
- ✅ Audio preprocessing and gain adjustment
- ✅ Mel spectrogram generation
- ✅ Encoder processing (no errors)
- ✅ Pipeline completes without crashes
What Fails
- ❌ Decoder output is empty or corrupted in >90% of attempts
- ❌ Base model produces mostly empty strings
- ❌ Tiny model produces garbled unicode and random tokens
- ❌ Extremely inconsistent - same audio produces different results
Evidence
Successful Transcription (happened once)
Raw transcription: ' testing 123'
Cleaned transcription: 'testing 123.'
Typical Failed Output (common)
Raw transcription: ''
Cleaned transcription: '.'
Garbled Output (tiny model)
Raw transcription: '..... other alert hurt�... other........�..'
Raw transcription: '%,,, ", to [,,,, " w,," [ st, -- "告诉 ',,, a, w'
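To put a number on the failure rate, raw transcriptions can be sorted into rough buckets. The sketch below is a crude heuristic and not part of the app: the `classify` helper and the 0.8 threshold are arbitrary choices made for illustration. The sample strings are the ones quoted in this report.

```python
from collections import Counter


def classify(raw: str, threshold: float = 0.8) -> str:
    """Label a raw transcription as 'empty', 'garbled', or 'plausible'."""
    text = raw.strip()
    if not text:
        return "empty"
    # Share of characters that are ASCII letters, digits, or spaces.
    plain = sum(ch.isascii() and (ch.isalnum() or ch == " ") for ch in text)
    return "plausible" if plain / len(text) >= threshold else "garbled"


# Raw transcriptions quoted in this report.
samples = [
    " testing 123",
    " is a 5 2",
    "",
    "%。...............,......",
    "..... other alert hurt�... other........�..",
    "%,,, \", to [,,,, \" w,,\" [ st, -- \"告诉 ',,, a, w",
]

tally = Counter(classify(s) for s in samples)
print(dict(tally))
```

A "plausible" label only means the text looks like ordinary English characters. The partial result "is a 5 2" still counts as plausible even though it is an incomplete transcription.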
Reproduction Steps
- Set up Raspberry Pi 5 with Hailo-8L (AI Kit)
- Install HailoRT 4.20.0
- Clone latest Hailo-Application-Code-Examples repository
- Run python3 setup.py in speech_recognition directory
- Run: python3 -m app.app_hailo_whisper --hw-arch hailo8l --variant base --duration 5
- Record clear English speech
- Observe empty or garbled transcription output
Any ideas?
Should I try HailoRT 4.21?
Tested with 2 mics and different volumes.
Generated .wav file is good, voice clear.
Thanks in advance.