Whisperize is a high-performance Python application for real-time audio transcription and speaker diarization, powered by Faster-Whisper and PyAnnote. It accepts microphone and WAV file input and supports accelerated processing on Apple Silicon (MPS) and CUDA. By combining Faster-Whisper and PyAnnote, it identifies who said what with high accuracy and low latency.
- **Real-Time Processing:** Simultaneous transcription and speaker identification using thread-safe parallel processing.
- **Hardware Optimized:** Built-in support for Apple Silicon (MPS) and CUDA acceleration, with a "force CPU" fallback for compatibility.
- **Dual-Input Modes:** Process live audio directly from your microphone or analyze existing WAV files.
- **Advanced Diarization:** Uses PyAnnote 3.1 to distinguish between multiple speakers (up to 5).
- **Flexible Exports:** Generate human-readable `.txt` transcripts or structured `.json` files containing word-level timestamps and confidence scores.
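
To illustrate how transcription and diarization results can be combined into "who said what", here is a minimal sketch of one common approach: label each transcribed segment with the speaker whose diarization turn overlaps it the most. This is not taken from the Whisperize source; `Segment`, `Turn`, `overlap`, and `assign_speakers` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking over a time span (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Return the length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Pair each segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg))
    return labeled


if __name__ == "__main__":
    segments = [
        Segment(2.5, 5.3, "Hello, this is a test transcription."),
        Segment(6.1, 9.8, "Yes, I can hear you clearly."),
    ]
    turns = [Turn(2.0, 5.5, "SPEAKER_00"), Turn(5.9, 10.0, "SPEAKER_01")]
    for speaker, seg in assign_speakers(segments, turns):
        print(f"[{speaker}]: {seg.text}")
```

Segments that overlap no turn at all keep the `unknown` label rather than being attributed to the nearest speaker, which makes gaps in the diarization easy to spot.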
- **Python:** 3.10 or higher.
- **System Tool:** FFmpeg is required for audio stream handling.
- **HuggingFace Access:** An account and access token are required to download the PyAnnote diarization models.
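
If you want to verify the first two requirements before installing, a small standalone check along these lines can help. It is a sketch, not part of Whisperize: `check_prerequisites` is a hypothetical helper, and it assumes FFmpeg is exposed on your `PATH` as `ffmpeg`.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)


def check_prerequisites(min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare the running interpreter against the minimum required version.
    if sys.version_info < min_python:
        required = ".".join(str(part) for part in min_python)
        problems.append(f"Python {required} or higher is required.")
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH.")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print(f"Missing prerequisite: {issue}")
    if not issues:
        print("Python and FFmpeg look fine.")
```

The HuggingFace token is not checked here because it is supplied later through the configuration file.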
1. **Clone the Repository**
```bash
git clone https://github.com/revanthvijaychandra-creator/whisperize.git
cd whisperize
```
2. **Set Up Environment**
```bash
python -m venv .venv
source .venv/bin/activate  # Unix/macOS
# .venv\Scripts\activate   # Windows
```
3. **Install Dependencies**
```bash
pip install -r requirements.txt
```
Before running the application, update the config.json file with your credentials and preferences:
| Key | Description | Default |
|---|---|---|
| `huggingface_token` | Required: your HF access token | None |
| `model` | Whisper model size (`tiny` to `turbo`) | `base` |
| `language` | Language code (e.g., `it`, `en`, `es`) | `auto` |
| `output_format` | Output format: `text` or `json` | `text` |
| `whisper_force_cpu` | If `true`, bypasses GPU/MPS acceleration | `false` |
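
As an illustration of how these keys and defaults fit together, the sketch below reads a `config.json`, fills in the defaults from the table for any missing keys, and rejects a missing token or an unsupported output format. It is an assumption about reasonable behavior, not Whisperize's actual loader; `DEFAULTS` and `load_config` are hypothetical names, and the "None" default is assumed to correspond to JSON `null`.

```python
import json

# Defaults as listed in the table above.
DEFAULTS = {
    "huggingface_token": None,
    "model": "base",
    "language": "auto",
    "output_format": "text",
    "whisper_force_cpu": False,
}


def load_config(path="config.json"):
    """Load a config file and fall back to the documented defaults for missing keys."""
    with open(path, encoding="utf-8") as handle:
        user_settings = json.load(handle)

    # Values from the file override the defaults.
    config = {**DEFAULTS, **user_settings}

    if not config["huggingface_token"]:
        raise ValueError("huggingface_token is required to download the PyAnnote models")
    if config["output_format"] not in ("text", "json"):
        raise ValueError("output_format must be 'text' or 'json'")
    return config


if __name__ == "__main__":
    settings = load_config()
    print(f"model={settings['model']} language={settings['language']}")
```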
Simply run the script to start listening to your default input device:
```bash
python whisperize.py
```
Process a pre-recorded WAV file by providing the path as an argument:
```bash
python whisperize.py path/to/audio.wav
```
Note: Only 16-bit WAV files are currently supported for direct file processing.
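
To check a file before passing it in, you can inspect its sample width with Python's standard `wave` module: 16-bit audio has a sample width of 2 bytes. The helper below is a hypothetical pre-check, not part of Whisperize. It treats files that `wave` cannot parse (for example non-WAV data or unsupported encodings) as unsupported, while a missing file still raises `FileNotFoundError`.

```python
import wave


def is_16bit_wav(path):
    """Return True if the file is a WAV that the wave module can read with 2-byte samples."""
    try:
        with wave.open(str(path), "rb") as wav_file:
            # getsampwidth() reports bytes per sample; 2 bytes == 16 bits.
            return wav_file.getsampwidth() == 2
    except (wave.Error, EOFError):
        # Not a parsable WAV file (wrong container, unsupported format, or empty file).
        return False


if __name__ == "__main__":
    import sys

    for candidate in sys.argv[1:]:
        status = "OK" if is_16bit_wav(candidate) else "not a 16-bit WAV"
        print(f"{candidate}: {status}")
```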
Text Output:
```text
[00:00:02.500-00:00:05.300] [SPEAKER_00]: Hello, this is a test transcription.
[00:00:06.100-00:00:09.800] [SPEAKER_01]: Yes, I can hear you clearly.
```
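
If you need to post-process a text transcript, lines in the `[start-end] [SPEAKER]: text` layout shown above can be parsed with a regular expression. The sketch below assumes every line follows exactly that layout with `HH:MM:SS.mmm` timestamps; `LINE_RE`, `to_seconds`, and `parse_line` are hypothetical names, and lines that do not match return `None`.

```python
import re

# Matches: [HH:MM:SS.mmm-HH:MM:SS.mmm] [SPEAKER_XX]: text
LINE_RE = re.compile(
    r"^\[(?P<start>\d{2}:\d{2}:\d{2}\.\d{3})-(?P<end>\d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
    r"\[(?P<speaker>[^\]]+)\]:\s*(?P<text>.*)$"
)


def to_seconds(timestamp):
    """Convert an HH:MM:SS.mmm timestamp to seconds as a float."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_line(line):
    """Parse one transcript line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "start": to_seconds(match["start"]),
        "end": to_seconds(match["end"]),
        "speaker": match["speaker"],
        "text": match["text"],
    }


if __name__ == "__main__":
    sample = "[00:00:02.500-00:00:05.300] [SPEAKER_00]: Hello, this is a test transcription."
    print(parse_line(sample))
```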
- Faster-Whisper for the optimized transcription engine.
- PyAnnote for the state-of-the-art speaker diarization models.