This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
RVC v2 voice conversion & AI cover system. Users upload a song, the pipeline separates vocals from accompaniment (Mel-Band Roformer), converts the vocal timbre via RVC v2 (HuBERT + RMVPE + FAISS), then mixes the result back with the accompaniment.
Platform Support: Windows / Linux / WSL2 / Google Colab
Key Features:
- AI song covers with automatic vocal separation and mixing
- 117 downloadable character models
- 4 mixing presets (universal, vocal-focused, accompaniment-focused, live)
- Karaoke mode (lead/backing vocal separation)
- 4 VC preprocessing modes (auto, direct, uvr_deecho, legacy)
- Dual VC pipeline (current implementation vs official RVC)
- Multi-backend GPU support (CUDA, ROCm, XPU, DirectML, MPS)
# Activate venv (Windows)
.\venv310\Scripts\Activate.ps1
# Activate venv (Linux/WSL2)
source venv310/bin/activate
# Install dependencies
python install.py # full install + launch
python install.py --check # check only
python install.py --cpu # CPU variant
# Run
python run.py # default: http://127.0.0.1:7860
python run.py --skip-check # skip env/model validation
python run.py --host 0.0.0.0 --port 8080 --share
# Download base models (HuBERT, RMVPE)
python tools/download_models.py
# Download character models
python -c "from tools.character_models import download_character_model; download_character_model('rin')"
# Quick CUDA check
python -c "import torch; print(torch.cuda.is_available())"
# Colab
# Open AI_RVC_Colab.ipynb in Google Colab, set runtime to GPU (T4), run cells sequentiallyEntry: run.py → env check → model check → ui/app.py:launch()
Pipeline flow (`infer/cover_pipeline.py:CoverPipeline.process`):
- Vocal separation (`infer/separator.py`) — Roformer (default), Demucs, or UVR5
- RVC voice conversion (`infer/pipeline.py`) — HuBERT features → RMVPE F0 → RVC v2 inference with FAISS retrieval
- Mixing (`lib/mixer.py`) — volume adjust + reverb via pedalboard
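The three stages can be pictured as a simple chain. This is a hedged sketch, not the real `CoverPipeline.process` signature: the stage functions, stub return values, and result dataclass are all assumptions for illustration.

```python
# Sketch of the separate → convert → mix chain; each stage stands in for
# the module named in the comment on its line.
from dataclasses import dataclass

@dataclass
class CoverResult:
    vocal_path: str
    accompaniment_path: str
    mixed_path: str

# Stub stages so the sketch runs end to end; real stages do audio I/O.
def separate(song_path):              # infer/separator.py
    return "vocal.wav", "inst.wav"

def convert_voice(vocal_path, name):  # infer/pipeline.py
    return f"{name}_vocal.wav"

def mix(vocal_path, accomp_path):     # lib/mixer.py
    return "mixed.wav"

def process_cover(song_path: str, character: str) -> CoverResult:
    vocal, accomp = separate(song_path)
    converted = convert_voice(vocal, character)
    return CoverResult(converted, accomp, mix(converted, accomp))
```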
Character model system (`tools/character_models.py`):
- 117 downloadable character models from HuggingFace (`trioskosmos/rvc_models`)
- Stored in `assets/weights/characters/`
- Version notes (epochs, sample rate) extracted from .pth metadata and cached in `_version_notes.json`
- Display name assembly: `_get_display_name()` appends `(500 epochs·40k)`-style training info
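The display-name assembly could look roughly like this. The cache filename and the `(500 epochs·40k)` format come from the description above; the JSON schema and field names are assumptions.

```python
# Sketch of loading the version-note cache and assembling display names.
import json
from pathlib import Path

def load_version_notes(characters_dir: Path) -> dict:
    """Read the cached notes; an empty dict if the cache doesn't exist yet."""
    cache = characters_dir / "_version_notes.json"
    if cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))
    return {}

def get_display_name(model_name: str, notes: dict) -> str:
    """Append '(500 epochs·40k)'-style info when cached notes exist."""
    info = notes.get(model_name)
    if not info:
        return model_name
    return f"{model_name} ({info['epochs']} epochs·{info['sample_rate']})"
```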
UI (`ui/app.py`):
- Gradio 3.50.2, single-file ~2000 lines
- i18n via `i18n/zh_CN.json`, accessed through the `t(key, section)` helper
- Three main tabs: song cover (full pipeline), model management, settings
- Cover tab features:
- Character model download/management with series filtering and keyword search
- 4 mixing presets (universal, vocal-focused, accompaniment-focused, live)
- Karaoke separation (lead/backing vocals)
- 4 VC preprocessing modes (auto, direct, uvr_deecho, legacy)
- Source constraint control (auto/off/on)
- Dual VC pipeline mode (current/official)
- Singing repair (official mode only)
- Real-time VC route status display
- Model management tab:
- Base model download (HuBERT, RMVPE)
- Mature DeEcho model download
- Model list table with refresh
- Settings tab:
- Device info display
- Backend selection (CUDA/ROCm/XPU/DirectML/MPS/CPU)
- Config save
Config: `configs/config.json` — device, F0 method, index rate, cover separator settings, path mappings
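A defaults-plus-overrides reader for that file might look like the following. The key names (`device`, `f0_method`, `index_rate`) follow the description above, but the real schema and default values are assumptions.

```python
# Sketch: merge configs/config.json over in-code defaults, and write it back
# UTF-8 with non-ASCII preserved (the UI is bilingual Chinese/English).
import json
from pathlib import Path

DEFAULTS = {"device": "cuda", "f0_method": "rmvpe", "index_rate": 0.75}

def load_config(path: Path) -> dict:
    cfg = dict(DEFAULTS)
    if path.exists():
        cfg.update(json.loads(path.read_text(encoding="utf-8")))
    return cfg

def save_config(path: Path, cfg: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False), encoding="utf-8")
```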
- Python 3.10, UTF-8, 4-space indent
- `snake_case` functions/variables, `PascalCase` classes, `UPPER_SNAKE_CASE` constants
- User-facing text is bilingual Chinese/English
- Commit messages: short imperative subjects, Chinese/English mixed (e.g. `infer: fix CUDA OOM`)
- No automated test suite; verify changes by running one voice conversion + one cover through the UI
- `_official_rvc/` is vendored upstream reference — don't modify unless syncing
- `configs/config.json` — all runtime settings
- `infer/cover_pipeline.py` — orchestrates the full cover workflow
- `infer/pipeline.py` — RVC v2 inference core
- `infer/separator.py` — Roformer/Demucs vocal separation wrappers
- `tools/character_models.py` — character model registry (117 entries) + download logic
- `tools/download_models.py` — base model (HuBERT/RMVPE) + mature DeEcho downloader
- `lib/mixer.py` — audio mixing with volume/reverb
- `ui/app.py` — entire Gradio UI (~2000 lines)
- `mcp/server.py` + `mcp/tools.py` — MCP server integration for Claude Code
- `AI_RVC_Colab.ipynb` — Google Colab notebook with full feature parity
- `install.py` — cross-platform installation script (Windows/Linux)
- `fairseq` is pinned to `0.12.2` — HuBERT loading breaks on other versions
- `audio-separator` must be installed with the `[gpu]` extra for CUDA support
- Roformer model auto-downloads on first use to `assets/separator_models/`
- Gradio is pinned to `3.50.2`; the UI code uses v3 API patterns (not v4)
- Model weights (.pt, .pth) and audio files are gitignored — never commit them
- Path handling uses `pathlib.Path` for cross-platform compatibility (Windows/Linux)
- Virtual environment activation differs by platform: `Scripts/Activate.ps1` (Windows) vs `bin/activate` (Linux)
- `install.py` has hardcoded Windows Python paths in `PYTHON310_CANDIDATES` but falls back to the `py -3.10` launcher
- Platform detection uses `os.name == "nt"` for Windows-specific logic (venv paths, etc.)
- All core functionality is platform-agnostic; audio libraries work better on Linux
- Colab notebook (`AI_RVC_Colab.ipynb`) provides full feature parity with Web UI
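The `os.name == "nt"` platform split applied to the venv paths can be sketched as follows; the `venv310` directory name matches the commands section, everything else is illustrative rather than the actual `install.py` code.

```python
# Sketch: pick the venv interpreter and activation command per platform.
import os
from pathlib import Path

def venv_python(venv_dir: str = "venv310") -> Path:
    """Interpreter path inside the venv for the current platform."""
    if os.name == "nt":  # Windows
        return Path(venv_dir) / "Scripts" / "python.exe"
    return Path(venv_dir) / "bin" / "python"

def activate_hint(venv_dir: str = "venv310") -> str:
    """Shell command a user would run to activate the venv."""
    if os.name == "nt":
        return rf".\{venv_dir}\Scripts\Activate.ps1"
    return f"source {venv_dir}/bin/activate"
```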