revanthvijaychandra-creator/whisperize-master

Whisperize: Real-Time Diarization & Transcription

Whisperize is a high-performance Python application for real-time audio transcription and speaker diarization, powered by Faster-Whisper and PyAnnote. It accepts microphone and WAV file input and supports accelerated processing on Apple Silicon (MPS) and CUDA. By combining Faster-Whisper and PyAnnote, it identifies who said what with high accuracy and low latency.


🚀 Key Features

Real-Time Processing: Simultaneous transcription and speaker identification using thread-safe parallel processing.

Hardware Optimized: Built-in support for Apple Silicon (MPS) and CUDA acceleration, with a "force CPU" fallback for compatibility.

Dual-Input Modes: Process live audio directly from your microphone or analyze existing WAV files.

Advanced Diarization: Uses PyAnnote 3.1 to distinguish between multiple speakers (up to 5).

Flexible Exports: Generate human-readable .txt transcripts or structured .json files containing word-level timestamps and confidence scores.
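The "who said what" step can be illustrated with a minimal sketch. It assumes transcription yields timed text segments and diarization yields timed speaker turns, and it labels each segment with the speaker whose turn overlaps it the most. The `Segment` and `Turn` classes and the `assign_speakers` helper are hypothetical names for illustration; the source does not confirm that Whisperize merges results this way.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    start: float  # seconds
    end: float    # seconds
    speaker: str  # e.g. "SPEAKER_00"


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg))
    return labeled


# Illustrative data only (not real model output).
segments = [
    Segment(2.5, 5.3, "Hello, this is a test transcription."),
    Segment(6.1, 9.8, "Yes, I can hear you clearly."),
]
turns = [Turn(2.0, 5.5, "SPEAKER_00"), Turn(5.9, 10.0, "SPEAKER_01")]

for speaker, seg in assign_speakers(segments, turns):
    print(f"[{speaker}]: {seg.text}")
```

A segment that overlaps no speaker turn keeps the `unknown` label, which makes gaps in the diarization visible instead of silently guessing a speaker.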

📋 Requirements

Python: 3.10 or higher.

System Tool: FFmpeg is required for audio stream handling.

HuggingFace Access: An account and access token are required to download the PyAnnote diarization models.


βš™οΈ Installation

1. **Clone the Repository**

```bash
git clone https://github.com/revanthvijaychandra-creator/whisperize.git
cd whisperize
```

2. **Set Up Environment**

```bash
python -m venv .venv
source .venv/bin/activate  # Unix/macOS
# .venv\Scripts\activate   # Windows
```

3. **Install Dependencies**

```bash
pip install -r requirements.txt
```

🛠 Configuration

Before running the application, update the config.json file with your credentials and preferences:

| Key | Description | Default |
| --- | --- | --- |
| huggingface_token | Required: your HF access token | None |
| model | Whisper size (tiny to turbo) | base |
| language | Language code (e.g., it, en, es) | auto |
| output_format | Choose between text or json | text |
| whisper_force_cpu | If true, bypasses GPU/MPS acceleration | false |
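As a rough illustration of how these settings might be read, the sketch below merges a user's `config.json` over the documented defaults and rejects a missing token or an unknown output format. The `load_config` function and the `DEFAULTS` dictionary are hypothetical, and the sketch assumes `whisper_force_cpu` is stored as a JSON boolean; the application's own loader may differ.

```python
import json
from pathlib import Path

# Defaults taken from the configuration keys documented in this section.
# huggingface_token has no default and must be supplied by the user.
DEFAULTS = {
    "huggingface_token": None,
    "model": "base",
    "language": "auto",
    "output_format": "text",
    "whisper_force_cpu": False,
}


def load_config(path="config.json"):
    """Merge user settings over the documented defaults and validate them."""
    user_settings = json.loads(Path(path).read_text(encoding="utf-8"))
    config = {**DEFAULTS, **user_settings}
    if not config["huggingface_token"]:
        raise ValueError("huggingface_token is required")
    if config["output_format"] not in ("text", "json"):
        raise ValueError("output_format must be 'text' or 'json'")
    return config
```

Calling `load_config()` with a file that sets only `huggingface_token` and `language` would return those two values plus the defaults for the remaining keys.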


📖 Usage

Microphone Mode (Live)

Simply run the script to start listening to your default input device:

python whisperize.py

File Mode

Process a pre-recorded WAV file by providing the path as an argument:

python whisperize.py path/to/audio.wav

Note: Only 16-bit WAV files are currently supported for direct file processing.
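To check a file against this restriction before running Whisperize, a small sketch using Python's standard `wave` module can report the sample width. The `is_16bit_wav` helper is a hypothetical name and is not part of Whisperize itself.

```python
import wave


def is_16bit_wav(path):
    """Return True if `path` is a PCM WAV file with 16-bit (2-byte) samples."""
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getsampwidth() == 2
    except (wave.Error, EOFError):
        # Not a WAV file, or a format the wave module cannot read.
        return False
```

A file that fails this check could be converted with FFmpeg, which is already listed as a requirement, before being passed to the script.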


📄 Output Example

Text Output:

[00:00:02.500-00:00:05.300] [SPEAKER_00]: Hello, this is a test transcription.
[00:00:06.100-00:00:09.800] [SPEAKER_01]: Yes, I can hear you clearly.
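
The line layout shown here can be reproduced with a short sketch that turns start and end times in seconds into `HH:MM:SS.mmm` stamps. The helper names `format_timestamp` and `format_line` are hypothetical, and the layout is inferred from the sample lines only, not from the application's source.

```python
def format_timestamp(seconds):
    """Render a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_line(start, end, speaker, text):
    """Build one transcript line: [start-end] [SPEAKER]: text."""
    return f"[{format_timestamp(start)}-{format_timestamp(end)}] [{speaker}]: {text}"


print(format_line(2.5, 5.3, "SPEAKER_00", "Hello, this is a test transcription."))
# prints: [00:00:02.500-00:00:05.300] [SPEAKER_00]: Hello, this is a test transcription.
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the last digit.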


🤝 Acknowledgments

Faster-Whisper for the optimized transcription engine.

PyAnnote for the state-of-the-art speaker diarization models.
