CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

An RVC v2 voice conversion and AI cover system. Users upload a song; the pipeline separates vocals from accompaniment (Mel-Band Roformer), converts the vocal timbre with RVC v2 (HuBERT + RMVPE + FAISS), and mixes the converted vocals back with the accompaniment.

Platform Support: Windows / Linux / WSL2 / Google Colab

Key Features:

  • AI song covers with automatic vocal separation and mixing
  • 117 downloadable character models
  • 4 mixing presets (universal, vocal-focused, accompaniment-focused, live)
  • Karaoke mode (lead/backing vocal separation)
  • 4 VC preprocessing modes (auto, direct, uvr_deecho, legacy)
  • Dual VC pipeline (current implementation vs official RVC)
  • Multi-backend GPU support (CUDA, ROCm, XPU, DirectML, MPS)

Commands

# Activate venv (Windows)
.\venv310\Scripts\Activate.ps1

# Activate venv (Linux/WSL2)
source venv310/bin/activate

# Install dependencies
python install.py              # full install + launch
python install.py --check      # check only
python install.py --cpu        # CPU variant

# Run
python run.py                          # default: http://127.0.0.1:7860
python run.py --skip-check             # skip env/model validation
python run.py --host 0.0.0.0 --port 8080 --share

# Download base models (HuBERT, RMVPE)
python tools/download_models.py

# Download character models
python -c "from tools.character_models import download_character_model; download_character_model('rin')"

# Quick CUDA check
python -c "import torch; print(torch.cuda.is_available())"

# Colab
# Open AI_RVC_Colab.ipynb in Google Colab, set runtime to GPU (T4), run cells sequentially
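
The quick CUDA check above covers only one backend. As a rough sketch (not the project's actual device logic), the helper below probes PyTorch for CUDA and MPS and falls back to CPU. The function name `pick_device` is hypothetical, and XPU and DirectML are left out because their availability checks depend on the installed PyTorch build and extra packages.

```python
import importlib.util


def pick_device() -> str:
    """Return a torch device string, preferring CUDA, then MPS, then CPU.

    Hypothetical helper for illustration; the repository's own device
    selection may work differently.
    """
    # Avoid an ImportError when PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # ROCm builds of PyTorch also report availability through torch.cuda.
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists on newer PyTorch versions.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

Running the file inside the activated venv prints the device string that a simple preference order would select on the current machine.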

Architecture

Entry: run.py → env check → model check → ui/app.py:launch()

Pipeline flow (infer/cover_pipeline.py:CoverPipeline.process):

  1. Vocal separation (infer/separator.py) — Roformer (default), Demucs, or UVR5
  2. RVC voice conversion (infer/pipeline.py) — HuBERT features → RMVPE F0 → RVC v2 inference with FAISS retrieval
  3. Mixing (lib/mixer.py) — volume adjust + reverb via pedalboard
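
The three stages above can be sketched as a small orchestrator. This is only an illustration of the data flow: `CoverSketch` and its callables are hypothetical stand-ins, not the signatures of `CoverPipeline`, `infer/separator.py`, or `lib/mixer.py`, and plain lists stand in for audio buffers.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Audio = List[float]  # stand-in for an audio buffer; real code would use arrays


@dataclass
class CoverSketch:
    """Hypothetical three-stage cover flow: separate -> convert -> mix."""

    separate: Callable[[Audio], Tuple[Audio, Audio]]  # song -> (vocals, accompaniment)
    convert: Callable[[Audio], Audio]                 # vocals -> converted vocals
    mix: Callable[[Audio, Audio], Audio]              # (vocals, accompaniment) -> cover

    def process(self, song: Audio) -> Audio:
        vocals, accompaniment = self.separate(song)   # stage 1: vocal separation
        converted = self.convert(vocals)              # stage 2: voice conversion
        return self.mix(converted, accompaniment)     # stage 3: mixing


if __name__ == "__main__":
    sketch = CoverSketch(
        separate=lambda s: ([x * 0.5 for x in s], [x * 0.5 for x in s]),
        convert=lambda v: [x * 2 for x in v],
        mix=lambda v, a: [x + y for x, y in zip(v, a)],
    )
    print(sketch.process([1.0, 2.0]))
```

Keeping each stage behind a callable makes it easy to swap the separator (Roformer, Demucs, or UVR5) without touching the conversion or mixing steps.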

Character model system (tools/character_models.py):

  • 117 downloadable character models from HuggingFace (trioskosmos/rvc_models)
  • Stored in assets/weights/characters/
  • Version notes (epochs, sample rate) extracted from .pth metadata and cached in _version_notes.json
  • Display name assembly: _get_display_name() appends training info in the style (500 epochs·40k)
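
As an illustration of the display-name format described above, here is a minimal sketch. `format_display_name` is a hypothetical function, not the real `_get_display_name()`, and the rule of showing the sample rate in whole kHz is an assumption based on the `40k` example.

```python
from typing import Optional


def format_display_name(
    name: str,
    epochs: Optional[int] = None,
    sample_rate: Optional[int] = None,
) -> str:
    """Append training info such as "(500 epochs·40k)" to a model name.

    Hypothetical sketch; missing metadata is simply omitted.
    """
    parts = []
    if epochs:
        parts.append(f"{epochs} epochs")
    if sample_rate:
        # Assumption: the sample rate is displayed in whole kHz (40000 -> "40k").
        parts.append(f"{sample_rate // 1000}k")
    if not parts:
        return name
    suffix = "·".join(parts)
    return f"{name} ({suffix})"


if __name__ == "__main__":
    print(format_display_name("rin", 500, 40000))
```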

UI (ui/app.py):

  • Gradio 3.50.2, single-file ~2000 lines
  • i18n via i18n/zh_CN.json, accessed through t(key, section) helper
  • Three main tabs: song cover (full pipeline), model management, settings
  • Cover tab features:
    • Character model download/management with series filtering and keyword search
    • 4 mixing presets (universal, vocal-focused, accompaniment-focused, live)
    • Karaoke separation (lead/backing vocals)
    • 4 VC preprocessing modes (auto, direct, uvr_deecho, legacy)
    • Source constraint control (auto/off/on)
    • Dual VC pipeline mode (current/official)
    • Singing repair (official mode only)
    • Real-time VC route status display
  • Model management tab:
    • Base model download (HuBERT, RMVPE)
    • Mature DeEcho model download
    • Model list table with refresh
  • Settings tab:
    • Device info display
    • Backend selection (CUDA/ROCm/XPU/DirectML/MPS/CPU)
    • Config save
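
The `t(key, section)` lookup mentioned under UI could look roughly like the sketch below. The JSON layout (a section mapping to key/text pairs), the default section name, and the fallback to the raw key are all assumptions; the real helper in `ui/app.py` may differ.

```python
import json
from pathlib import Path


def load_translations(path="i18n/zh_CN.json") -> dict:
    """Load the translation table, or return an empty one if the file is missing."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))


def make_t(translations: dict):
    """Build a t(key, section) helper over a nested {section: {key: text}} dict.

    Assumed layout; unknown keys fall back to the key itself so the UI
    still shows something readable.
    """
    def t(key: str, section: str = "common") -> str:
        return translations.get(section, {}).get(key, key)

    return t


if __name__ == "__main__":
    t = make_t({"cover": {"start": "开始转换"}})
    print(t("start", "cover"))
    print(t("missing", "cover"))
```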

Config: configs/config.json — device, F0 method, index rate, cover separator settings, path mappings
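
A minimal sketch of reading this config with fallbacks follows. The key names and default values in `DEFAULTS` are hypothetical placeholders, not the real schema of `configs/config.json`; check the file itself for the actual keys.

```python
import json
from pathlib import Path

# Hypothetical keys and defaults, for illustration only.
DEFAULTS = {"device": "cuda", "f0_method": "rmvpe", "index_rate": 0.5}


def load_config(path="configs/config.json") -> dict:
    """Return DEFAULTS overlaid with whatever the JSON file provides."""
    config = dict(DEFAULTS)
    p = Path(path)
    if p.exists():
        # Values from the file win over the defaults.
        config.update(json.loads(p.read_text(encoding="utf-8")))
    return config


if __name__ == "__main__":
    print(load_config())
```

Overlaying the file on a defaults dict means a partially filled config still yields a complete settings object.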

Key Conventions

  • Python 3.10, UTF-8, 4-space indent
  • snake_case functions/variables, PascalCase classes, UPPER_SNAKE_CASE constants
  • User-facing text is bilingual Chinese/English
  • Commit messages: short imperative subjects, in mixed Chinese/English (e.g. infer: fix CUDA OOM)
  • No automated test suite; verify changes by running one voice conversion + one cover through the UI
  • _official_rvc/ is vendored upstream reference — don't modify unless syncing

Important Paths

  • configs/config.json — all runtime settings
  • infer/cover_pipeline.py — orchestrates the full cover workflow
  • infer/pipeline.py — RVC v2 inference core
  • infer/separator.py — Roformer/Demucs vocal separation wrappers
  • tools/character_models.py — character model registry (117 entries) + download logic
  • tools/download_models.py — base model (HuBERT/RMVPE) + mature DeEcho downloader
  • lib/mixer.py — audio mixing with volume/reverb
  • ui/app.py — entire Gradio UI (~2000 lines)
  • mcp/server.py + mcp/tools.py — MCP server integration for Claude Code
  • AI_RVC_Colab.ipynb — Google Colab notebook with full feature parity
  • install.py — cross-platform installation script (Windows/Linux)

Things to Watch

  • fairseq is pinned to 0.12.2 — HuBERT loading breaks on other versions
  • audio-separator must be installed with the [gpu] extra for CUDA support
  • Roformer model auto-downloads on first use to assets/separator_models/
  • Gradio is pinned to 3.50.2; the UI code uses v3 API patterns (not v4)
  • Model weights (.pt, .pth) and audio files are gitignored — never commit them
  • Path handling uses pathlib.Path for cross-platform compatibility (Windows/Linux)
  • Virtual environment activation differs by platform: Scripts/Activate.ps1 (Windows) vs bin/activate (Linux)
  • install.py has hardcoded Windows Python paths in PYTHON310_CANDIDATES but falls back to the py -3.10 launcher
  • Platform detection uses os.name == "nt" for Windows-specific logic (venv paths, etc.)
  • All core functionality is platform-agnostic; audio libraries work better on Linux
  • Colab notebook (AI_RVC_Colab.ipynb) provides full feature parity with Web UI
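
Two of the notes above (the version pins and the platform-specific venv paths) can be checked with a short script. This is a sketch under assumptions: `venv_activate_script` and `pin_mismatches` are hypothetical helpers, not part of `install.py` or `run.py`.

```python
import os
from importlib import metadata
from pathlib import Path

# Pins named in this document.
PINS = {"fairseq": "0.12.2", "gradio": "3.50.2"}


def venv_activate_script(venv_dir="venv310", os_name=os.name) -> Path:
    """Return the activation script path for the given platform name."""
    root = Path(venv_dir)
    if os_name == "nt":  # Windows
        return root / "Scripts" / "Activate.ps1"
    return root / "bin" / "activate"  # Linux / WSL2


def pin_mismatches(pins=PINS, get_version=metadata.version) -> dict:
    """Map each package whose installed version differs from its pin
    to the version found (None if the package is not installed)."""
    problems = {}
    for package, wanted in pins.items():
        try:
            found = get_version(package)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems[package] = found
    return problems


if __name__ == "__main__":
    print(venv_activate_script())
    print(pin_mismatches())
```

An empty dict from `pin_mismatches()` means both pinned packages are installed at the expected versions.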