CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

RVC v2 voice conversion & AI cover system. Users upload a song, the pipeline separates vocals from accompaniment (Mel-Band Roformer), converts the vocal timbre via RVC v2 (HuBERT + RMVPE + FAISS), then mixes the result back with the accompaniment.

Platform Support: Windows / Linux / WSL2 / Google Colab

Key Features:

AI song covers with automatic vocal separation and mixing
117 downloadable character models
4 mixing presets (universal, vocal-focused, accompaniment-focused, live)
Karaoke mode (lead/backing vocal separation)
4 VC preprocessing modes (auto, direct, uvr_deecho, legacy)
Dual VC pipeline (current implementation vs official RVC)
Multi-backend GPU support (CUDA, ROCm, XPU, DirectML, MPS)

Commands

# Activate venv (Windows)
.\venv310\Scripts\Activate.ps1

# Activate venv (Linux/WSL2)
source venv310/bin/activate

# Install dependencies
python install.py              # full install + launch
python install.py --check      # check only
python install.py --cpu        # CPU variant

# Run
python run.py                          # default: http://127.0.0.1:7860
python run.py --skip-check             # skip env/model validation
python run.py --host 0.0.0.0 --port 8080 --share

# Download base models (HuBERT, RMVPE)
python tools/download_models.py

# Download character models
python -c "from tools.character_models import download_character_model; download_character_model('rin')"

# Quick CUDA check
python -c "import torch; print(torch.cuda.is_available())"

# Colab
# Open AI_RVC_Colab.ipynb in Google Colab, set runtime to GPU (T4), run cells sequentially

Architecture

Entry: run.py → env check → model check → ui/app.py:launch()

Pipeline flow (infer/cover_pipeline.py:CoverPipeline.process):

Vocal separation (infer/separator.py) — Roformer (default), Demucs, or UVR5
RVC voice conversion (infer/pipeline.py) — HuBERT features → RMVPE F0 → RVC v2 inference with FAISS retrieval
Mixing (lib/mixer.py) — volume adjust + reverb via pedalboard
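
The three stages above compose in a fixed order: separate, convert, mix. The following is a minimal sketch of that ordering only, not the code in infer/cover_pipeline.py. The names `Stems`, `separate`, `convert`, `mix`, and `process` are hypothetical stand-ins that operate on plain lists of floats so the example runs without any models, and reverb is left out.

```python
from dataclasses import dataclass
from typing import List

Audio = List[float]  # stand-in for a real sample buffer


@dataclass
class Stems:
    vocals: Audio
    accompaniment: Audio


def separate(song: Audio) -> Stems:
    """Hypothetical stand-in for stage 1 (vocal separation)."""
    # A real separator runs a model; here each stem is half the input signal.
    half = [s * 0.5 for s in song]
    return Stems(vocals=list(half), accompaniment=list(half))


def convert(vocals: Audio) -> Audio:
    """Hypothetical stand-in for stage 2 (voice conversion)."""
    # Identity transform: a real converter would change the timbre.
    return list(vocals)


def mix(vocals: Audio, accompaniment: Audio,
        vocal_gain: float = 1.0, accompaniment_gain: float = 1.0) -> Audio:
    """Hypothetical stand-in for stage 3 (volume adjust only, no reverb)."""
    return [v * vocal_gain + a * accompaniment_gain
            for v, a in zip(vocals, accompaniment)]


def process(song: Audio) -> Audio:
    """Run the three stages in the order the pipeline flow describes."""
    stems = separate(song)
    converted = convert(stems.vocals)
    return mix(converted, stems.accompaniment)
```

Because the accompaniment stem bypasses stage 2 entirely, a failure in voice conversion can be debugged without re-running separation, provided the stems are kept.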

Character model system (tools/character_models.py):

117 downloadable character models from HuggingFace (trioskosmos/rvc_models)
Stored in assets/weights/characters/
Version notes (epochs, sample rate) extracted from .pth metadata and cached in _version_notes.json
Display name assembly: _get_display_name() appends training info to the model name in the form (500 epochs·40k)
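
As an illustration of the display name format, here is a minimal sketch. It does not reproduce `_get_display_name()`, whose signature is not shown here. The function `format_display_name` and its parameters are assumptions made for this example, as is the choice to omit the suffix when no metadata is available.

```python
from typing import List, Optional


def format_display_name(name: str,
                        epochs: Optional[int],
                        sample_rate: Optional[int]) -> str:
    """Append training info such as "(500 epochs·40k)" to a model name.

    Hypothetical helper: the parameter names and the fallback behaviour
    are assumptions, not the repository's actual implementation.
    """
    parts: List[str] = []
    if epochs is not None:
        parts.append(f"{epochs} epochs")
    if sample_rate is not None:
        # 40000 Hz is shown as "40k"
        parts.append(f"{sample_rate // 1000}k")
    if not parts:
        return name  # no metadata known, leave the name unchanged
    suffix = "·".join(parts)
    return f"{name} ({suffix})"
```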

UI (ui/app.py):

Gradio 3.50.2, single-file ~2000 lines
i18n via i18n/zh_CN.json, accessed through t(key, section) helper
Three main tabs: song cover (full pipeline), model management, settings
Cover tab features:
- Character model download/management with series filtering and keyword search
- 4 mixing presets (universal, vocal-focused, accompaniment-focused, live)
- Karaoke separation (lead/backing vocals)
- 4 VC preprocessing modes (auto, direct, uvr_deecho, legacy)
- Source constraint control (auto/off/on)
- Dual VC pipeline mode (current/official)
- Singing repair (official mode only)
- Real-time VC route status display
Model management tab:
- Base model download (HuBERT, RMVPE)
- Mature DeEcho model download
- Model list table with refresh
Settings tab:
- Device info display
- Backend selection (CUDA/ROCm/XPU/DirectML/MPS/CPU)
- Config save
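
The `t(key, section)` lookup mentioned above can be pictured as a two-level dictionary access. This is a sketch under assumptions: the section-then-key layout, the sample entry, the default section name, and the fallback to the raw key are all hypothetical and may not match i18n/zh_CN.json or the real helper.

```python
from typing import Callable, Dict

# Hypothetical table layout: {"section": {"key": "translated text"}}
SAMPLE_TABLE: Dict[str, Dict[str, str]] = {
    "cover": {"start_button": "开始转换"},
}


def make_translator(table: Dict[str, Dict[str, str]]) -> Callable[..., str]:
    """Build a t(key, section) function bound to one translation table."""

    def t(key: str, section: str = "common") -> str:
        # Fall back to the key itself so a missing entry stays visible
        # in the UI instead of raising.
        return table.get(section, {}).get(key, key)

    return t


t = make_translator(SAMPLE_TABLE)
```

Falling back to the key rather than raising keeps the UI usable when a string has not been translated yet, at the cost of hiding missing entries unless someone looks for raw keys on screen.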

Config: configs/config.json — device, F0 method, index rate, cover separator settings, path mappings
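
When experimenting with settings outside the UI, a small loader that overlays the file on defaults can be handy. The following is a sketch only: the key names `device`, `f0_method`, and `index_rate`, their default values, and the functions `parse_config` and `load_config` are assumptions for illustration and may not match the actual schema of configs/config.json.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical defaults; the real config.json may use different keys.
DEFAULTS: Dict[str, Any] = {
    "device": "cpu",
    "f0_method": "rmvpe",
    "index_rate": 0.5,
}


def parse_config(text: str) -> Dict[str, Any]:
    """Overlay JSON text on top of the defaults (shallow merge)."""
    config = dict(DEFAULTS)
    config.update(json.loads(text))
    return config


def load_config(path: Path) -> Dict[str, Any]:
    """Read a config file, or return the defaults if it does not exist."""
    if not path.is_file():
        return dict(DEFAULTS)
    return parse_config(path.read_text(encoding="utf-8"))
```

The merge is shallow, so a nested section in the file replaces the whole default section of the same name rather than being merged key by key.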

Key Conventions

Python 3.10, UTF-8, 4-space indent
snake_case functions/variables, PascalCase classes, UPPER_SNAKE_CASE constants
User-facing text is bilingual Chinese/English
Commit messages: short imperative subjects, Chinese/English mixed (e.g. infer: fix CUDA OOM)
No automated test suite; verify changes by running one voice conversion + one cover through the UI
_official_rvc/ is vendored upstream reference — don't modify unless syncing

Important Paths

configs/config.json — all runtime settings
infer/cover_pipeline.py — orchestrates the full cover workflow
infer/pipeline.py — RVC v2 inference core
infer/separator.py — Roformer/Demucs vocal separation wrappers
tools/character_models.py — character model registry (117 entries) + download logic
tools/download_models.py — base model (HuBERT/RMVPE) + mature DeEcho downloader
lib/mixer.py — audio mixing with volume/reverb
ui/app.py — entire Gradio UI (~2000 lines)
mcp/server.py + mcp/tools.py — MCP server integration for Claude Code
AI_RVC_Colab.ipynb — Google Colab notebook with full feature parity
install.py — cross-platform installation script (Windows/Linux)

Things to Watch

fairseq is pinned to 0.12.2 — HuBERT loading breaks on other versions
audio-separator must be installed with [gpu] extra for CUDA support
Roformer model auto-downloads on first use to assets/separator_models/
Gradio is pinned to 3.50.2; the UI code uses v3 API patterns (not v4)
Model weights (.pt, .pth) and audio files are gitignored — never commit them
Path handling uses pathlib.Path for cross-platform compatibility (Windows/Linux)
Virtual environment activation differs by platform: Scripts/Activate.ps1 (Windows) vs bin/activate (Linux)
install.py has hardcoded Windows Python paths in PYTHON310_CANDIDATES but falls back to the py -3.10 launcher
Platform detection uses os.name == "nt" for Windows-specific logic (venv paths, etc.)
All core functionality is platform-agnostic; audio libraries work better on Linux
Colab notebook (AI_RVC_Colab.ipynb) provides full feature parity with Web UI
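
The `os.name == "nt"` check and the differing venv layouts noted above can be sketched as follows. The function `venv_python` is a hypothetical helper written for this example, not taken from install.py. It relies on the standard venv layout, which places the interpreter under Scripts on Windows and under bin elsewhere.

```python
import os
from pathlib import Path


def venv_python(venv_dir: Path, os_name: str = os.name) -> Path:
    """Return the interpreter path inside a virtual environment.

    Hypothetical helper: os_name is a parameter so both branches can be
    exercised on any platform.
    """
    if os_name == "nt":
        # Windows venvs keep executables in Scripts\
        return venv_dir / "Scripts" / "python.exe"
    # Linux, WSL2 and other POSIX systems use bin/
    return venv_dir / "bin" / "python"
```

For example, `venv_python(Path("venv310"))` resolves to the interpreter for the current platform, and passing `os_name` explicitly shows what the other platform would use.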
