ChordiSpeak - AI Chord Vocal Generator

Version: 1.0.0

ChordiSpeak generates a new vocal track for your song that speaks the chord names in the original singer's voice, perfectly timed to the music.

Live Demo: https://chordispeak.com

Features

Vocal/Instrumental Separation: Uses Demucs AI (htdemucs model) to split uploaded audio into vocals and instrumental
Advanced Chord Detection: Uses Madmom's deep learning-based chord recognition with robust error handling
Voice Cloning: Uses the full vocal track with Coqui TTS XTTS v2 to synthesize spoken chord names in the original singer's voice
Audio Mixing: Overlays the synthesized chord vocals onto the instrumental track with precise timing
Real-time Progress Tracking: Detailed progress updates with demucs percentage tracking
Simple Web Interface: Upload a file and download the result
Robust Error Handling: Multiple fallback mechanisms for reliable processing

How It Works

Upload your audio file (MP3, WAV, FLAC, M4A) via the web interface (max 50MB)
Audio Preparation: Converts to WAV format using pydub
Vocal Separation: Demucs splits the file into vocal_track.wav and instrumental_track.wav (10-15 minutes)
Voice Sample Extraction: The full vocal track is used for voice cloning to capture complete voice characteristics
Chord Detection: Madmom analyzes the instrumental using high-resolution chroma features and deep learning (40-65% of processing time)
Synthesis: Coqui TTS XTTS v2 generates spoken chord names using voice cloning (15-18 seconds per unique chord)
Mixing: The synthesized vocals are overlaid onto the instrumental with -10dB reduction
Download your new track with chord vocals

Supported Chord Types

Major chords: A, B, C, D, E, F, G (and sharps/flats)
Minor chords: Am, Bm, Cm, Dm, Em, Fm, Gm (and sharps/flats)
Seventh chords: A7, B7, C7, D7, E7, F7, G7 (and sharps/flats)
Minor seventh chords: Am7, Bm7, Cm7, Dm7, Em7, Fm7, Gm7 (and sharps/flats)
Major seventh chords: Amaj7, Bmaj7, Cmaj7, Dmaj7, Emaj7, Fmaj7, Gmaj7 (and sharps/flats)
Diminished chords: Adim, Bdim, Cdim, Ddim, Edim, Fdim, Gdim (and sharps/flats)
Augmented chords: Aaug, Baug, Caug, Daug, Eaug, Faug, Gaug (and sharps/flats)
Suspended chords: Asus2, Asus4, Bsus2, Bsus4, etc. (and sharps/flats)

Chord Pronunciation

A: "AY"
B: "BEE"
C: "SEE"
D: "DEE"
E: "EE"
F: "EFF"
G: "GEE"

Chord types are pronounced as: "AY MINOR", "BEE SEVENTH", "SEE MAJOR SEVENTH", etc.

Requirements

Python 3.11
FFmpeg (for audio processing)
8GB+ RAM (for TTS models)
GPU recommended (for faster processing)

Installation

Clone the repository:

git clone <repository-url>
cd chordispeak

Create and activate a Python 3.11 virtual environment:
```
python3.11 -m venv venv
source venv/bin/activate
```
Install dependencies:
```
pip install -r requirements.txt
```
Start the server:
```
./start.sh
```
Open http://localhost:5001 in your browser

Processing Time

3-4 minute songs: ~5-10 minutes total processing time
Demucs separation: 10-15 minutes (most time-consuming step)
Chord detection: 2-3 minutes with high-sensitivity settings
TTS synthesis: 15-18 seconds per unique chord
Real-time factor: 12-15x (processing time vs. audio duration)

API Endpoints

POST /upload — Upload an audio file for processing
GET /status/<task_id> — Check processing status with detailed progress
GET /download/<task_id> — Download the processed audio
GET /chords/<task_id> — Get chord progression data
GET /health — Health check
GET /docs — API documentation
POST /cancel/<task_id> — Cancel running task
GET /logs/<task_id> — Get detailed processing logs
GET /debug/<task_id> — Debug task information

Project Structure

chordispeak/
├── app.py           # Flask backend API
├── run.py           # Dev server launcher (port 5001)
├── index.html       # Web interface
├── requirements.txt # Python dependencies
├── version.py       # Version management script
├── VERSION          # Current version file
├── start.sh         # Startup script
├── knowledge.md     # Technical documentation
├── uploads/         # Uploaded and processed files
├── test_chord_detection.py # Chord detection debugging
└── README.md        # This file

Recent Improvements (v1.0.0)

Robust Chord Detection

Madmom-native audio loading: Uses madmom.io.audio.load_audio_file() for better compatibility
Two-step approach: Chroma feature extraction followed by chord recognition
Conservative filtering: 0.5 confidence threshold, 0.5s minimum duration, 1.0s between changes
Multiple fallback mechanisms: 3 different approaches for reliable chord detection
Comprehensive data validation: Checks for string data, NaN/Inf values, and numeric types
Enhanced error handling: Detailed logging and graceful degradation

Web Interface Improvements

Static file serving: Proper routes for Logo.png, favicon.ico, and favicon.png
Transparent logo: Optimized 33KB transparent logo for better performance
Updated favicon: New favicon with proper ICO and PNG formats
Better error reporting: Specific error messages for each processing step

Enhanced Progress Tracking

Real-time demucs progress: Percentage tracking during vocal separation
Detailed step updates: 8 distinct processing steps with percentages
Chord synthesis progress: Individual chord synthesis tracking
Enhanced error reporting: Specific error messages for each step

Voice Cloning Enhancements

Full vocal track usage: Complete vocal track for better voice cloning
PyTorch compatibility: Fixed for PyTorch 2.6+ with monkey patch
TTS caching: Results cached to avoid re-synthesis

Version Management

ChordiSpeak uses semantic versioning (MAJOR.MINOR.PATCH). Use the version management script:

# Check current version
python version.py current

# Bump versions
python version.py bump-patch  # 1.0.0 -> 1.0.1
python version.py bump-minor  # 1.0.0 -> 1.1.0  
python version.py bump-major  # 1.0.0 -> 2.0.0

# Set specific version
python version.py set 1.2.3

Technical Details

Processing Pipeline

Audio Preparation (5%): Convert to WAV format
Vocal Separation (10-25%): Demucs AI separation with real-time progress
Voice Sample Extraction (30%): Extract full vocal track for cloning
Chord Detection (40-65%): Madmom deep learning analysis with robust error handling
TTS Synthesis (70-85%): Coqui TTS voice cloning per unique chord
Audio Mixing (85-100%): Overlay chord vocals onto instrumental

Dependencies

Flask: Web framework with hot-reloading
Madmom: Advanced chord detection with native audio loading
Demucs: High-quality vocal separation (htdemucs model)
Coqui TTS: Voice synthesis with XTTS v2
PyTorch: Machine learning backend
Librosa: Audio processing (fallback)
Pydub: Audio format conversion
Transformers: 4.49.0 (compatible version)

Known Limitations

Processing Time: TTS synthesis is the slowest step
Memory Usage: TTS models require significant RAM (8GB+ recommended)
Task Storage: In-memory only (not persistent across restarts)
Audio Quality: Depends on Demucs separation quality and vocal clarity

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 106 Commits
.gitignore		.gitignore
Dockerfile		Dockerfile
Dockerfile.quick		Dockerfile.quick
Dockerfile.simple		Dockerfile.simple
Icon-transparent.png		Icon-transparent.png
Icon.png		Icon.png
LICENSE		LICENSE
Logo-transparent-final.png		Logo-transparent-final.png
Logo-transparent-inverted.png		Logo-transparent-inverted.png
Logo-transparent.png		Logo-transparent.png
Logo.png		Logo.png
README.md		README.md
Screenshot_2025-07-12_20-44-56.png		Screenshot_2025-07-12_20-44-56.png
Screenshot_2025-07-12_21-03-19.png		Screenshot_2025-07-12_21-03-19.png
VERSION		VERSION
app.py		app.py
app_backup.py		app_backup.py
app_minimal.py		app_minimal.py
app_simple.py		app_simple.py
cloudbuild.yaml		cloudbuild.yaml
explore_madmom.py		explore_madmom.py
favicon-transparent-square.png		favicon-transparent-square.png
favicon-transparent.png		favicon-transparent.png
favicon.ico		favicon.ico
favicon.png		favicon.png
index.html		index.html
knowledge.md		knowledge.md
madmom_compat.py		madmom_compat.py
missing_routes.py		missing_routes.py
requirements.txt		requirements.txt
requirements_freeze.txt		requirements_freeze.txt
requirements_simple.txt		requirements_simple.txt
run.py		run.py
simple_test.py		simple_test.py
start.sh		start.sh
test_chord_detection.py		test_chord_detection.py
test_chord_fix.py		test_chord_fix.py
test_deployment.py		test_deployment.py
test_dots.py		test_dots.py
test_madmom_simple.py		test_madmom_simple.py
test_pronunciation.py		test_pronunciation.py
version.py		version.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ChordiSpeak - AI Chord Vocal Generator

Features

How It Works

Supported Chord Types

Chord Pronunciation

Requirements

Installation

Processing Time

API Endpoints

Project Structure

Recent Improvements (v1.0.0)

Robust Chord Detection

Web Interface Improvements

Enhanced Progress Tracking

Voice Cloning Enhancements

Version Management

Technical Details

Processing Pipeline

Dependencies

Known Limitations

License

About

Uh oh!

Releases

Packages

Languages

License

151henry151/chordispeak

Folders and files

Latest commit

History

Repository files navigation

ChordiSpeak - AI Chord Vocal Generator

Features

How It Works

Supported Chord Types

Chord Pronunciation

Requirements

Installation

Processing Time

API Endpoints

Project Structure

Recent Improvements (v1.0.0)

Robust Chord Detection

Web Interface Improvements

Enhanced Progress Tracking

Voice Cloning Enhancements

Version Management

Technical Details

Processing Pipeline

Dependencies

Known Limitations

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages