A high-performance, real-time desktop transcription application powered by faster-whisper. Sonix is designed for speed, leveraging GPU acceleration (CUDA) to provide near-instantaneous speech-to-text conversion directly on your machine.
This is a short GIF demonstrating the real-time transcription in action.
- Real-Time Transcription: See your speech converted to text almost instantly.
- GPU Accelerated: Utilizes NVIDIA CUDA via PyTorch and faster-whisper for maximum performance.
- High Performance: Tuned for speed with int8 quantization and optimized streaming logic.
- Voice Activity Detection (VAD): Intelligently filters out silence to improve accuracy and responsiveness.
- Easily Configurable: Tweak model size, performance settings, and streaming behavior in a single config.py file.
- Modular Codebase: A clean, multi-file structure makes the project easy to understand, maintain, and extend.
- Simple GUI: A clean and straightforward user interface built with Python's native Tkinter library.
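To make the VAD feature above more concrete, here is a deliberately simplified sketch of silence filtering. It is not what Sonix runs: faster-whisper relies on the Silero VAD model, whereas this example swaps in a plain energy threshold. The names `rms` and `drop_silence`, the frame size, and the threshold are all assumptions chosen for illustration.

```python
import math


def rms(frame):
    """Root-mean-square energy of a list of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def drop_silence(samples, frame_size=160, threshold=0.01):
    """Keep only the frames whose energy reaches the threshold.

    Illustrative energy gate, not the Silero VAD that faster-whisper uses.
    """
    voiced = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if rms(frame) >= threshold:
            voiced.extend(frame)
    return voiced
```

Dropping silent frames before transcription means the model spends no time decoding empty audio, which is the responsiveness gain the feature list refers to.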
- Core Engine: faster-whisper
- Python: 3.9+
- GPU Backend: PyTorch (CUDA enabled)
- Audio Input: sounddevice
- GUI: Tkinter (standard Python library)
- Numerical Processing: NumPy
Follow these steps carefully to get Sonix running on your local machine.
- An NVIDIA GPU: This application is built for CUDA acceleration.
- NVIDIA CUDA Toolkit: You must have the CUDA Toolkit installed. The application was tested with version 12.x. You can download it from the NVIDIA Developer website.
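A quick way to check these prerequisites from Python is sketched below. It is a hypothetical helper, not part of Sonix: `check_prerequisites` is an invented name, and it assumes `nvidia-smi` (shipped with the NVIDIA driver) and `nvcc` (shipped with the CUDA Toolkit) are on your PATH when installed, which is common but not guaranteed. The PyTorch check only reports a result once PyTorch has been installed in the steps that follow.

```python
import shutil


def check_prerequisites():
    """Report which GPU prerequisites are visible from this Python environment."""
    report = {
        # nvidia-smi ships with the NVIDIA driver.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # nvcc ships with the CUDA Toolkit; it may be installed but not on PATH.
        "nvcc_on_path": shutil.which("nvcc") is not None,
        "torch_sees_cuda": False,
    }
    try:
        import torch  # only importable after the PyTorch installation step
    except ImportError:
        return report
    report["torch_sees_cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A "no" for `nvcc_on_path` does not necessarily mean the toolkit is missing; it may simply not be on PATH.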
Open your terminal and clone this repository:
git clone https://github.com/Eng-M-Abdrabbou/Sonix.git
cd Sonix
Set Up a Python Virtual Environment
It is highly recommended to use a virtual environment to manage dependencies.
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
First, create a file named requirements.txt in the project root and paste the following content into it:
numpy
sounddevice
faster-whisper
Now, install PyTorch with CUDA support. This is the most critical step. Go to the official PyTorch website and select the correct command for your system (e.g., Stable, Windows, Pip, Python, CUDA 12.1).
The command will look something like this:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
After installing PyTorch, install the remaining dependencies from the requirements.txt file:
pip install -r requirements.txt
Once all dependencies are installed, you can run the application with a single command:
python main.py
The application window will appear.
▶ START: Click to begin capturing audio and transcribing.
■ STOP: Click to halt the transcription process.
CLEAR TEXT: Click to wipe all text from the display area.
Configuration
One of the main advantages of Sonix is its simple configuration. All major settings are located in config.py.
# config.py
MODEL_SIZE = "small.en"  # tiny.en, base.en, small.en, medium.en
COMPUTE_TYPE = "int8_float16"  # float16, int8_float16, int8 (fastest)
INITIAL_DEVICE = "cuda"
CHUNK_DURATION_S = 2.0
LANGUAGE = "en"
BEAM_SIZE = 1  # 1 is greedy decoding (fastest). >1 is slower but may be more accurate.
VAD_PARAMETERS = dict(...)
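As a rough illustration of how these settings could reach faster-whisper, the sketch below wires them into `WhisperModel` and `transcribe`. This is an assumption about the wiring, not the actual contents of transcription_engine.py. The helper names, the 16 kHz sample rate constant, and `EXAMPLE_VAD_PARAMETERS` are hypothetical; the real `VAD_PARAMETERS` value is elided above and is not reproduced here.

```python
# Values copied from the config.py excerpt above.
MODEL_SIZE = "small.en"
COMPUTE_TYPE = "int8_float16"
INITIAL_DEVICE = "cuda"
CHUNK_DURATION_S = 2.0
LANGUAGE = "en"
BEAM_SIZE = 1

# Hypothetical stand-in: the README does not show the real VAD_PARAMETERS.
EXAMPLE_VAD_PARAMETERS = dict(min_silence_duration_ms=500)

# Assumption: audio is captured as 16 kHz mono, the rate Whisper models expect.
SAMPLE_RATE = 16000


def chunk_samples(duration_s=CHUNK_DURATION_S, sample_rate=SAMPLE_RATE):
    """Number of audio samples in one streaming chunk."""
    return int(duration_s * sample_rate)


def transcribe_kwargs():
    """Keyword arguments for WhisperModel.transcribe built from the settings."""
    return dict(
        language=LANGUAGE,
        beam_size=BEAM_SIZE,
        vad_filter=True,
        vad_parameters=EXAMPLE_VAD_PARAMETERS,
    )


def load_model():
    """Create the model; imported lazily so this file loads without faster-whisper."""
    from faster_whisper import WhisperModel

    return WhisperModel(MODEL_SIZE, device=INITIAL_DEVICE, compute_type=COMPUTE_TYPE)


def transcribe_chunk(model, audio):
    """Transcribe one chunk (a float32 NumPy array at SAMPLE_RATE) to text."""
    segments, _info = model.transcribe(audio, **transcribe_kwargs())
    return " ".join(segment.text.strip() for segment in segments)
```

With the defaults shown, `chunk_samples()` gives 32000 samples per chunk. A larger `CHUNK_DURATION_S` gives the model more context per call at the cost of higher latency before text appears.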
The codebase is organized into logical modules for better readability and maintainability.
Sonix-Transcriber/
├── config.py # All user-configurable settings.
├── transcription_engine.py # Core logic for audio processing and transcription.
├── gui.py # All Tkinter UI components and layout.
├── utils.py # Helper functions (CUDA path fix, device init).
├── main.py # The main entry point to launch the application.
├── README.md # You are here!
└── requirements.txt # Project dependencies.
This project is licensed under the MIT License. See the LICENSE file for details.

