GitHub - revanthvijaychandra-creator/whisperize-master: Whisperize: 📝 Repository Description 🎙️ A real-time audio transcription ✍️ and speaker diarization tool 👥 powered by Faster-Whisper ⚡ and PyAnnote 🤖. Supports 🎤 microphone and 📂 WAV file input with 🚀 high-performance processing for 🍏 Apple Silicon (MPS) and 🔌 CUDA.

Whisperize: Real-Time Diarization & Transcription

Whisperize is a high-performance Python application for real-time audio transcription and speaker diarization, powered by Faster-Whisper and PyAnnote. It accepts microphone and WAV file input and can use Apple Silicon (MPS) or CUDA acceleration. By combining the two models, it identifies "who said what" with low latency.

🚀 Key Features

Real-Time Processing: Simultaneous transcription and speaker identification using thread-safe parallel processing.

Hardware Optimized: Built-in support for Apple Silicon (MPS) and CUDA acceleration, with a "force CPU" fallback for compatibility.

Dual-Input Modes: Process live audio directly from your microphone or analyze existing WAV files.

Advanced Diarization: Uses PyAnnote 3.1 to distinguish between multiple speakers (up to 5).

Flexible Exports: Generate human-readable .txt transcripts or structured .json files containing word-level timestamps and confidence scores.
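To make "who said what" concrete, here is a minimal sketch of one common way to combine the two model outputs: each transcribed segment is labeled with the speaker whose diarization turns overlap it the most. The `Segment` and `Turn` classes and the `assign_speakers` function are hypothetical names used for illustration; they are not taken from `whisperize.py`, which may merge results differently.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical structure)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end (hypothetical)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Pair each segment with the speaker whose turns overlap it the most."""
    labeled = []
    for seg in segments:
        totals = {}  # speaker label -> total overlapping seconds
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg))
    return labeled
```

A segment that overlaps no diarization turn falls back to the `unknown` label rather than being dropped, so the transcript stays complete even when diarization misses a stretch of audio.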

📋 Requirements

Python: 3.10 or higher.

System Tool: FFmpeg is required for audio stream handling.

HuggingFace Access: An account and access token are required to download the PyAnnote diarization models.
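The first two requirements can be verified before installing anything. The sketch below uses only the standard library; `check_prerequisites` is a hypothetical helper and is not part of Whisperize.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or higher is required.")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH.")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```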

⚙️ Installation

1. **Clone the Repository**
```bash
git clone https://github.com/revanthvijaychandra-creator/whisperize.git
cd whisperize
```

2. **Set Up Environment**
```bash
python -m venv .venv
source .venv/bin/activate  # Unix/macOS
# .venv\Scripts\activate   # Windows
```

3. **Install Dependencies**
```bash
pip install -r requirements.txt
```

🛠 Configuration

Before running the application, update the config.json file with your credentials and preferences:

| Key | Description | Default |
| --- | --- | --- |
| `huggingface_token` | Required: Your HF access token | None |
| `model` | Whisper size (tiny to turbo) | base |
| `language` | Language code (e.g., it, en, es) | auto |
| `output_format` | Choose between text or json | text |
| `whisper_force_cpu` | If true, bypasses GPU/MPS acceleration | false |
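As an illustration of how these keys and defaults fit together, the following sketch merges a user's `config.json` over the documented defaults and rejects obviously invalid values. The `load_config` function and `DEFAULTS` dictionary are assumptions made for this example; the actual loading logic in `whisperize.py` may differ.

```python
import json
from pathlib import Path

# Defaults copied from the configuration table; None marks a required value.
DEFAULTS = {
    "huggingface_token": None,
    "model": "base",
    "language": "auto",
    "output_format": "text",
    "whisper_force_cpu": False,
}


def load_config(path="config.json"):
    """Merge the user's config file over the defaults and validate the result."""
    user = json.loads(Path(path).read_text(encoding="utf-8"))
    config = {**DEFAULTS, **user}  # user-supplied keys win over defaults
    if not config["huggingface_token"]:
        raise ValueError("huggingface_token is required")
    if config["output_format"] not in ("text", "json"):
        raise ValueError("output_format must be 'text' or 'json'")
    return config
```

With this approach a `config.json` containing only `huggingface_token` is enough to run with the default model, language, and output format.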

📖 Usage

Microphone Mode (Live)

Simply run the script to start listening to your default input device:

python whisperize.py

File Mode

Process a pre-recorded WAV file by providing the path as an argument:

python whisperize.py path/to/audio.wav

Note: Only 16-bit WAV files are currently supported for direct file processing.
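If you are unsure whether a file meets this constraint, you can check it with Python's standard `wave` module before running Whisperize. The helper below is a hypothetical sketch, not part of the project.

```python
import wave


def is_16bit_wav(path):
    """Return True if the file is a readable PCM WAV with 2-byte (16-bit) samples."""
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getsampwidth() == 2
    except (wave.Error, EOFError):
        # Not a WAV file, an unsupported encoding, or a truncated file.
        return False
```

A file that fails this check can usually be converted with FFmpeg, for example `ffmpeg -i input.mp3 -c:a pcm_s16le output.wav`.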

📄 Output Example

Text Output:

[00:00:02.500-00:00:05.300] [SPEAKER_00]: Hello, this is a test transcription.
[00:00:06.100-00:00:09.800] [SPEAKER_01]: Yes, I can hear you clearly.
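For readers post-processing transcripts, this sketch shows how a line in the text layout above can be produced from start and end times in seconds. `format_timestamp` and `format_line` are hypothetical helpers written to match the sample output; they are not the project's own functions.

```python
def format_timestamp(seconds):
    """Render a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)  # work in whole milliseconds to avoid float drift
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_line(start, end, speaker, text):
    """Build one transcript line: [start-end] [speaker]: text."""
    return f"[{format_timestamp(start)}-{format_timestamp(end)}] [{speaker}]: {text}"


print(format_line(2.5, 5.3, "SPEAKER_00", "Hello, this is a test transcription."))
# prints: [00:00:02.500-00:00:05.300] [SPEAKER_00]: Hello, this is a test transcription.
```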

🤝 Acknowledgments

Faster-Whisper for the optimized transcription engine.

PyAnnote for the state-of-the-art speaker diarization models.
