A Python desktop application for recording vocals over music tracks with automatic alignment, normalization, and mixing. Built with PySide6, sounddevice, Demucs, and pyloudnorm. Runs on Windows 10 (primary dev environment) and Linux.
Read PROJECT.md for full roadmap and phase details.
- OS: Windows 10 Pro (primary), Linux (secondary)
- Python: 3.12 via project venv (`.venv/Scripts/python.exe` on Windows — always use the venv; the default `python` points to 3.14)
- IDE: VS Code or command line
- Audio hardware: Behringer X-Air 16 mixer, Shure SM58 mic (not needed for development — any mic works)
- GPU: RTX 3060 6GB — available for Demucs inference but not required
VocalForge/
├── vocalforge/
│ ├── __init__.py
│ ├── __main__.py # Entry point: python -m vocalforge
│ ├── app.py # QApplication setup, main window
│ ├── ui/
│ │ ├── __init__.py # Shared widgets (JumpSlider)
│ │ ├── main_window.py # Main window layout, panel orchestration
│ │ ├── import_panel.py # Song loading, separation trigger
│ │ ├── record_panel.py # Recording controls, device selection
│ │ ├── mix_panel.py # Mixing controls, effects, presets, export
│ │ └── waveform.py # Waveform display widget (shared)
│ ├── audio/
│ │ ├── __init__.py
│ │ ├── engine.py # Playback + recording streams (sounddevice)
│ │ ├── alignment.py # Cross-correlation alignment (constrained)
│ │ ├── effects.py # 14-stage vocal processing pipeline
│ │ ├── mixer.py # LUFS normalization + mixing
│ │ └── noise_reduction.py # Spectral gating + high-pass filter
│ ├── separation/
│ │ ├── __init__.py
│ │ └── demucs_worker.py # Demucs separation (runs in QThread)
│ └── utils/
│ ├── __init__.py
│ └── audio_io.py # Load/save audio files (soundfile wrapper)
├── tests/
│ ├── test_alignment.py
│ ├── test_audio_io.py
│ ├── test_effects.py
│ ├── test_engine.py
│ ├── test_mixer.py
│ ├── test_noise_reduction.py
│ └── test_waveform.py
├── requirements.txt
├── README.md
├── CLAUDE.md
├── LICENSE
└── .gitignore
Audio is timing-critical. Follow these rules strictly:
- Main thread — PySide6 GUI only. Never do audio I/O or processing here.
- Audio callback thread — sounddevice's PortAudio callback. Must be lock-free, no allocations, no Python GIL-heavy work. Only copy samples to/from pre-allocated numpy buffers.
- Processing thread(s) — QThread for Demucs separation, alignment, mixing. Communicate with GUI via Qt signals/slots.
Never call sounddevice functions from inside the audio callback. The callback only reads/writes to shared numpy ring buffers.
- `ui/` imports from `audio/` and `separation/` — never the reverse
- `audio/` and `separation/` must not import PySide6 (they can be tested headlessly)
- `separation/` must not import `audio/` — they are independent
- `utils/` is a leaf — imports only the standard library and soundfile
- Internal format: numpy float32 arrays, shape `(samples, channels)` for stereo, `(samples,)` for mono
- Sample rate: 44100 Hz by default, but respect the source file's sample rate — resample only when mixing tracks with different rates
- File I/O: use `soundfile.read()`/`soundfile.write()` — always specify `dtype='float32'`
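The resample-only-when-needed rule might look like this in practice. A mono sketch under stated assumptions: `match_rate` is a hypothetical helper, and linear interpolation stands in for whatever resampler the mixer actually uses:

```python
import numpy as np

DEFAULT_SR = 44100

def match_rate(x: np.ndarray, sr_in: int, sr_out: int) -> np.ndarray:
    """Return x resampled from sr_in to sr_out; no-op when rates already match.

    Mono (samples,) float32 in, float32 out. Linear interpolation keeps this
    sketch dependency-free; production code would use a polyphase resampler.
    """
    if sr_in == sr_out:
        return x                      # respect the source rate: do nothing
    n_out = int(round(len(x) * sr_out / sr_in))
    t_in = np.arange(len(x)) / sr_in          # original sample times
    t_out = np.arange(n_out) / sr_out         # target sample times
    return np.interp(t_out, t_in, x).astype(np.float32)
```

Same-rate tracks pass through untouched, so a 48 kHz source stays 48 kHz until it has to be mixed against a 44.1 kHz one.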
- Use PySide6, not PyQt6 (LGPL vs GPL licensing)
- Import style: `from PySide6.QtWidgets import ...` (never `from PySide6 import *`)
- Connect signals with new-style syntax: `button.clicked.connect(self._on_click)`
- Long operations → QThread + signals, never `QApplication.processEvents()` hacks
- No QML — pure widgets (QMainWindow, QWidget, QVBoxLayout, etc.)
See docs/ARCHITECTURE.md for stable design decisions (alignment, LUFS, Demucs, state machine).
Development follows PROJECT.md phases. Each phase is self-contained:
- Phases 1–5: Core workflow — skeleton, audio loading, playback, recording, alignment + mixing [DONE]
- Phase 6 (a–d): Demucs separation, noise reduction, chain alignment, constrained alignment, HPF [DONE]
- Phase 7 (a–d): Interactive alignment, multi-track preview, offset sliders, mono waveforms [DONE]
- Phase 8a: Noise gate + de-reverb + preset system + UX improvements [DONE]
- Phase 8b: Parametric EQ + compressor + NR mode selection (v0.3.1) [DONE]
- Phase 8c: De-esser + reverb — completes 9-stage pipeline [DONE]
- Phase 8d: Gain rider, de-plosive, serial compression, soft clipper — 13-stage pipeline [v0.5.0] [DONE]
- Phase 8e: Two-pass NR, chain reorder — 14-stage pipeline [v0.5.1]
- Phase 11: Settings persistence, drag-and-drop, error handling [v0.6.0]
- Phase 12: Testing, PyInstaller exe, README screenshots [v0.7.0]
- Phase 10: Auto-tune research & prototyping [v0.8.0]
Do not pull in components from later phases.
- Unit tests: alignment math (known-offset synthetic signals), mixer output levels, audio I/O round-trip
- Integration tests: record silence → align → mix → verify output duration matches input
- Manual validation: record actual vocals, listen to output, verify alignment sounds correct
- No mocking of sounddevice in unit tests — audio/ module tests use synthetic numpy arrays only
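A known-offset synthetic test of the kind described above might look like this. `estimate_offset` is a hypothetical stand-in for the real `audio/alignment.py` API; a brute-force constrained cross-correlation keeps the sketch self-contained:

```python
import numpy as np

def estimate_offset(ref: np.ndarray, rec: np.ndarray, max_lag: int) -> int:
    """Delay of rec relative to ref in samples (positive = rec starts late),
    searched over a constrained ±max_lag window."""
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = ref[:len(ref) - lag], rec[lag:]
        else:
            a, b = ref[-lag:], rec[:len(rec) + lag]
        n = min(len(a), len(b))
        score = float(np.dot(a[:n], b[:n]))   # unnormalized correlation at this lag
        if score > best_score:
            best_score, best_lag = score, lag
    return best_lag

def test_known_offset():
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(8000).astype(np.float32)
    rec = np.concatenate([np.zeros(137, dtype=np.float32), ref])[:8000]  # 137-sample delay
    assert estimate_offset(ref, rec, max_lag=300) == 137
```

Because the signal is synthetic, the expected lag is exact — no audio hardware, no mocking.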
# Always use the project venv
# Windows: .venv\Scripts\activate (or invoke .venv/Scripts/python.exe directly)
# Linux: source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run application
python -m vocalforge
# Run tests
pytest tests/ -v

- Python modules: `snake_case.py`
- Classes: `PascalCase`
- Functions/methods: `snake_case`
- Constants: `UPPER_SNAKE_CASE`
- Private methods: `_leading_underscore`
- Test files: `test_<module>.py`