Feature/add voxcpm engine by phonk2682 · Pull Request #9 · minhsaco99/VoiceCore

phonk2682 · 2026-01-22T07:57:23Z

Description

Add VoxCPM TTS engine integration with voice cloning support. This PR introduces a new TTS provider (VoxCPM) and updates the TTS API to support audio file uploads for zero-shot voice cloning.

Key changes:

Add VoxCPMEngine implementation with batch and streaming synthesis
Update TTS router to accept audio file uploads via multipart form
Add speaker_wav parameter to base TTS engine interface
Add base64 serialization for audio data in API responses
Refactor audio upload validation in deps.py

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
New Engine (STT/TTS provider)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Refactor (non-breaking code cleanup or optimization)
Documentation update
Performance improvement

Checklist

I have read the CONTRIBUTING guide
My code follows the project's code style (make format)
Linting passes (make lint)
Tests pass (make test)
Documentation updated (if needed)
No sensitive information (API keys, secrets) included

Related Issues

Closes #

Testing & Verification

Automated Tests

Unit tests added/updated
All existing tests pass

Manual Verification (if applicable)

Tested TTS synthesis via API with VoxCPM engine using reference audio for voice cloning. Verified both batch and streaming endpoints return valid audio output.

API Endpoints Tested (if applicable)

Batch endpoint (POST /api/v1/tts/synthesize)
SSE streaming (POST /api/v1/tts/synthesize/stream)
WebSocket (WS /api/v1/tts/synthesize/ws)
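A client of the SSE endpoint has to undo the base64 serialization added in this PR. The sketch below assumes each event carries one JSON-encoded chunk in its `data:` field, with the audio in an `audio_data` string. The field name comes from the PR; the exact event layout is an assumption. It uses only the standard library.

```python
import base64
import json
from collections.abc import Iterable, Iterator


def iter_sse_audio(lines: Iterable[str]) -> Iterator[bytes]:
    """Decode audio chunks from SSE lines (assumed `data: {"audio_data": "<base64>"}`)."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            # Per the SSE format, one leading space after the colon is dropped.
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line ends the event; multiple data lines are joined with "\n".
            payload = json.loads("\n".join(data_parts))
            data_parts = []
            if payload.get("audio_data"):
                yield base64.b64decode(payload["audio_data"])
        # Comment lines (":") and other fields (event:, id:) are ignored here.


sample = [
    'data: {"audio_data": "AAEC"}\n', "\n",
    ": keep-alive\n", "\n",
    'data: {"audio_data": "AwQF"}\n', "\n",
]
audio = b"".join(iter_sse_audio(sample))  # b"\x00\x01\x02\x03\x04\x05"
```

In practice `lines` would be the decoded lines of the streaming HTTP response body.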

Engine-Specific Tests (if applicable)

Engine type: TTS
Provider: VoxCPM (OpenBMB)
Model: openbmb/VoxCPM0.5
Device tested: cuda

Security Impact

No security implications
Security impact (please describe below)

- Add speaker_wav parameter to BaseTTSEngine.synthesize() and synthesize_stream()
- Add base64 field_serializer for audio_data in TTSChunk and TTSResponse
- Refactor validate_audio_upload to support both required and optional modes


Copilot

Pull request overview

This PR adds VoxCPM TTS engine integration with zero-shot voice cloning capabilities. The implementation includes batch and streaming synthesis modes, audio file upload support for voice cloning, and updates to the base TTS interface to support reference audio.

Changes:

Add VoxCPMEngine with support for batch and streaming synthesis
Update TTS API endpoints to accept multipart form data with audio file uploads
Add base64 serialization for audio data in API responses
Extend base TTS engine interface with speaker_wav parameter

Reviewed changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 23 comments.


File	Description
pyproject.toml	Add voxcpm dependency to optional dependencies
engines.yaml	Configure VoxCPM engine with model and device settings
app/models/engine.py	Add base64 serialization for audio data in TTSChunk and TTSResponse
app/engines/base.py	Add speaker_wav parameter to TTS engine interface methods
app/engines/tts/voxcpm/config.py	Define VoxCPM-specific configuration model
app/engines/tts/voxcpm/engine.py	Implement VoxCPM TTS engine with batch and streaming synthesis
app/api/deps.py	Refactor audio upload validation to support optional uploads
app/api/routers/tts.py	Implement TTS endpoints with multipart form support and voice cloning


Copilot · 2026-01-22T08:06:36Z

app/engines/tts/voxcpm/engine.py

+# This acts as a fallback/fix for environments where torchaudio 2.10 tries to use broken torchcodec
+def safe_load(filepath, **kwargs):
+    try:
+        data, sampler_rate = sf.read(filepath)


Variable name "sampler_rate" should be "sample_rate" for consistency with the rest of the codebase and standard audio processing terminology.
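For context on the `safe_load` patch: the general pattern swaps `torchaudio.load` for a wrapper that tries one backend and falls back to the other. The sketch below shows only that pattern. It uses stand-in objects instead of torchaudio and soundfile, and it assumes, as the snippet suggests, that the soundfile path is tried first. A real replacement must also match return conventions. `torchaudio.load` returns `(waveform, sample_rate)` with the waveform shaped `(channels, frames)`. `soundfile.read` returns `(data, samplerate)` with `data` shaped `(frames, channels)`, or 1-D for mono unless `always_2d=True`. So the data needs a transpose and a tensor conversion.

```python
import functools
import types


def install_fallback(module, attr, preferred):
    """Replace module.<attr> with a wrapper that tries `preferred` first and
    falls back to the original callable when `preferred` raises."""
    original = getattr(module, attr)

    @functools.wraps(original)
    def patched(*args, **kwargs):
        try:
            return preferred(*args, **kwargs)
        except Exception:
            # Preferred backend could not handle this input: defer to the original.
            return original(*args, **kwargs)

    setattr(module, attr, patched)
    return original  # keep a handle so the patch can be undone (e.g. in tests)


# Stand-ins: `fake_audio` plays the role of torchaudio, `soundfile_style_load` of sf.read.
fake_audio = types.SimpleNamespace(load=lambda path, **kwargs: ("original", path))


def soundfile_style_load(path, **kwargs):
    if not path.endswith(".wav"):
        raise RuntimeError("format not supported by this backend")
    return ("soundfile", path)


original_load = install_fallback(fake_audio, "load", soundfile_style_load)
print(fake_audio.load("ref.wav"))  # ('soundfile', 'ref.wav')
print(fake_audio.load("ref.mp3"))  # ('original', 'ref.mp3')
```

Catching bare `Exception` keeps the sketch short but can mask genuine errors. Narrowing it to the exceptions the preferred backend actually raises would be safer.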

Copilot · 2026-01-22T08:06:36Z

app/engines/tts/voxcpm/engine.py

+            # Create temp file, close it so other processes can open it if needed
+            fd, temp_wav_path = tempfile.mkstemp(suffix=".wav")
+            os.write(fd, speaker_wav)
+            os.close(fd)


There is a potential resource leak in the temporary file handling. If an exception occurs between creating the temp file and writing to it, or if os.write() fails, the file descriptor may remain open and the temporary file may not be cleaned up. Use a more robust pattern with try-finally or a context manager to ensure the file descriptor is always closed properly.

Suggested change

-            # Create temp file, close it so other processes can open it if needed
-            fd, temp_wav_path = tempfile.mkstemp(suffix=".wav")
-            os.write(fd, speaker_wav)
-            os.close(fd)
+            # Create temp file, ensure file descriptor is always closed
+            fd = None
+            try:
+                fd, temp_wav_path = tempfile.mkstemp(suffix=".wav")
+                os.write(fd, speaker_wav)
+            finally:
+                if fd is not None:
+                    try:
+                        os.close(fd)
+                    except OSError:
+                        # Best-effort close; log if needed by surrounding infrastructure
+                        pass
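The suggestion guarantees that the descriptor is closed, but it still leaves the temporary file on disk if a later step fails. Below is a minimal alternative sketch, assuming the engine call only needs a file path. `os.fdopen` hands the descriptor to a file object so the `with` block closes it, and a `finally` removes the file once the hypothetical `use(path)` callback returns or raises.

```python
import os
import tempfile
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def with_temp_wav(speaker_wav: bytes, use: Callable[[str], T]) -> T:
    """Write reference audio to a temp .wav, call `use(path)`, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        # os.fdopen takes ownership of fd: leaving the with-block closes the
        # descriptor even if write() raises. The buffered write() also loops
        # until all bytes are written, unlike a single os.write() call.
        with os.fdopen(fd, "wb") as f:
            f.write(speaker_wav)
        return use(path)
    finally:
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # already removed by the callback; nothing left to clean up


n = with_temp_wav(b"RIFF....", lambda path: os.path.getsize(path))  # 8
```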

Copilot · 2026-01-22T08:06:36Z

app/engines/tts/voxcpm/engine.py

+
+

Unnecessary blank lines. Remove the extra whitespace for better code readability and consistency.

Suggested change (delete both blank lines)

Copilot · 2026-01-22T08:06:37Z

app/api/routers/tts.py

+async def synthesize_text(
+    text: str = Query(..., description="Text to synthesize"),
+    prompt_text: str | None = Query(None, description="Text content of the reference audio (if provided)"),
+    voice: str | None = Form(None, description="Voice ID to use"),
+    audio: bytes | None = Depends(get_optional_audio_upload),
+    speed: float = Query(1.0, description="Speed multiplier"),
+    engine_params: str | None = Query(None, description="JSON engine parameters"),
+    tts_engine: BaseTTSEngine = Depends(get_tts_engine),


Mixing Form and Query parameters in a single endpoint can cause issues with FastAPI. The 'voice' parameter is declared as Form but 'text' and other parameters are Query. This inconsistency may lead to unexpected behavior. Consider using all Query parameters or restructuring to use a proper multipart/form-data request body with all form fields.

Copilot · 2026-01-22T08:06:37Z

app/api/routers/tts.py

+    # Validation: Must have either audio (speaker reference) or voice/default
+    # Actually, some engines might have defaults, but user requested explicit "one of them must be present" logic?
+    # User said: "When a user sends a request, one of the two must be present"
+
+    if not audio and not voice:
+        # Check if engine has a default voice? 
+        # But user rule is strict: "Must have 1 of 2"
+        raise HTTPException(
+            400, "Either 'audio' file or 'voice' ID must be provided"
+        )


The validation logic requires either audio or voice to be provided, but this may be too restrictive. Some TTS engines might have default voices configured in their settings that don't require explicit voice parameters. Consider checking if the engine has a default voice before raising this error, or document this requirement more clearly in the API specification.

Suggested change

-    # Validation: Must have either audio (speaker reference) or voice/default
-    # Actually, some engines might have defaults, but user requested explicit "one of them must be present" logic?
-    # User said: "When a user sends a request, one of the two must be present"
-
-    if not audio and not voice:
-        # Check if engine has a default voice?
-        # But user rule is strict: "Must have 1 of 2"
-        raise HTTPException(
-            400, "Either 'audio' file or 'voice' ID must be provided"
-        )
+    # Validation: Prefer explicit audio (speaker reference) or voice ID when provided.
+    # However, some engines may have a configured default voice and do not require either.
+    # If the engine does not expose a default voice, enforce that one of the two must be present.
+    if not audio and not voice:
+        # Allow engines that define a default voice attribute to proceed without explicit voice/audio.
+        default_voice = getattr(tts_engine, "default_voice", None)
+        if default_voice is None:
+            raise HTTPException(
+                400, "Either 'audio' file or 'voice' ID must be provided"
+            )

Copilot · 2026-01-22T08:06:40Z

app/engines/tts/voxcpm/engine.py

+            duration_seconds = len(wav) / self._model.tts_model.sample_rate
+
+        except Exception as e:
+            raise SynthesisError(f"Synthesis failed: {e}") from e


The error message exposes internal exception details which could leak sensitive information about the system implementation. Consider providing a more generic error message to the user while logging the detailed exception internally.
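One way to follow this advice is to log the full exception server-side, raise a fixed message to the caller, and keep the original exception chained with `from e` for debugging. The sketch uses a local stand-in for `SynthesisError` and a hypothetical `run_synthesis` helper.

```python
import logging

logger = logging.getLogger("tts.voxcpm")


class SynthesisError(Exception):
    """Local stand-in for the project's SynthesisError."""


def run_synthesis(generate, *args, **kwargs):
    """Call the model; on failure, log everything server-side and raise a generic error."""
    try:
        return generate(*args, **kwargs)
    except Exception as e:
        # logger.exception records the message and traceback, but only in server logs.
        logger.exception("VoxCPM synthesis failed")
        # The client-facing message is fixed; `from e` keeps the cause for debugging.
        raise SynthesisError("Synthesis failed") from e
```

This only helps if the API layer builds its response from the error message alone. A handler that serializes the chained `__cause__` would leak the details again.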

Copilot · 2026-01-22T08:06:40Z

pyproject.toml


 # TTS Models
+voxcpm = [
+    "voxcpm>=0.0.1", # Use latest available


The version constraint "voxcpm>=0.0.1" is too permissive and may pull in incompatible future versions. Consider specifying a more restrictive version range to ensure compatibility and prevent breaking changes from being automatically installed.

Suggested change

-    "voxcpm>=0.0.1", # Use latest available
+    "voxcpm>=0.0.1,<0.1.0", # Constrain to tested minor range

Copilot · 2026-01-22T08:06:41Z

pyproject.toml

+
+

Unnecessary blank lines. Remove for better code consistency.

Suggested change (delete both blank lines)

Copilot · 2026-01-22T08:06:41Z

app/engines/tts/voxcpm/engine.py

+        self._model = None
+


The cleanup method only sets the model to None without explicitly releasing GPU memory or other resources. If the model uses CUDA tensors, consider explicitly calling torch.cuda.empty_cache() or moving the model to CPU before deletion to ensure proper resource cleanup.

Suggested change

-        self._model = None
+        # Explicitly release model resources, especially GPU memory if used
+        if self._model is not None:
+            model = self._model
+            # Move model to CPU if possible to free GPU memory before dropping reference
+            if hasattr(model, "to"):
+                try:
+                    model.to("cpu")
+                except Exception:
+                    # Best-effort; ignore failures during cleanup
+                    pass
+            # Drop local reference
+            del model
+            self._model = None
+        # Clear CUDA cache if CUDA is available to ensure GPU memory is released
+        if torch.cuda.is_available():
+            try:
+                torch.cuda.empty_cache()
+            except Exception:
+                # Ignore CUDA cache cleanup errors during shutdown
+                pass

Copilot · 2026-01-22T08:06:41Z

app/api/routers/tts.py

+
+        await websocket.close()
+
+    except WebSocketDisconnect:


'except' clause does nothing but pass and there is no explanatory comment.
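An explicit handler might look like the sketch below, which uses local stand-ins for Starlette's `WebSocketDisconnect` and for the socket. It assumes the disconnect surfaces as that exception while sending. Whether a real `send_bytes` on a closed connection raises it depends on the Starlette and server versions in use. The function reports the outcome instead of silently passing, and the comment records why a disconnect is not treated as an error.

```python
import asyncio
import logging

logger = logging.getLogger("tts.websocket")


class WebSocketDisconnect(Exception):
    """Local stand-in for starlette.websockets.WebSocketDisconnect."""


async def stream_to_socket(websocket, chunks) -> bool:
    """Forward audio chunks; return True if the stream finished, False if the client left."""
    try:
        async for chunk in chunks:
            await websocket.send_bytes(chunk)
        await websocket.close()
        return True
    except WebSocketDisconnect:
        # The client hung up mid-stream (e.g. the user stopped playback). That is an
        # expected outcome, not a server error: log it quietly and do not try to
        # send anything else on a connection that is already gone.
        logger.debug("TTS websocket client disconnected before the stream finished")
        return False


class FakeSocket:
    """Test double: raises WebSocketDisconnect once `fail_after` chunks have been sent."""

    def __init__(self, fail_after: int):
        self.sent: list[bytes] = []
        self.closed = False
        self.fail_after = fail_after

    async def send_bytes(self, data: bytes) -> None:
        if len(self.sent) >= self.fail_after:
            raise WebSocketDisconnect()
        self.sent.append(data)

    async def close(self) -> None:
        self.closed = True


async def three_chunks():
    for chunk in (b"a", b"b", b"c"):
        yield chunk


dropped = FakeSocket(fail_after=1)
print(asyncio.run(stream_to_socket(dropped, three_chunks())), dropped.sent)  # False [b'a']
```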


phonk2682 added 6 commits January 22, 2026 07:35

feat(tts): add VoxCPM engine implementation

5c999f9

- Add VoxCPMEngine with synthesize and synthesize_stream methods
- Add VoxCPMConfig for engine configuration
- Implement torchaudio.load monkey patch for soundfile fallback

feat(api): update TTS router for voice cloning support

62fa69f

- Change synthesize endpoints from TTSRequest body to Query/Form params
- Add audio file upload support for voice cloning
- Add validation: require either audio or voice parameter
- Add prompt_text parameter for VoxCPM

chore: add VoxCPM dependency group

1a78c20

- Add voxcpm dependency group to pyproject.toml

chore: configure VoxCPM engine with public model path

d11ae14

- Add VoxCPM TTS engine configuration
- Use openbmb/VoxCPM0.5 as default model (HuggingFace)

Merge remote-tracking branch 'origin/main' into feature/add_voxcpm_engine

3a93d0a

Copilot AI review requested due to automatic review settings January 22, 2026 07:57

Copilot started reviewing on behalf of phonk2682 January 22, 2026 07:57

Copilot AI reviewed Jan 22, 2026


phonk2682 added 5 commits January 22, 2026 09:45

tests(voxcpm): rename tests; lint fixes for whitespace and unused vars

93726de

chore: remove local debug/manual files from repo and ignore them

d4e0c0d

Merge remote-tracking branch 'origin/main' into feature/add_voxcpm_engine

0e13cf6

Merge remote-tracking branch 'origin/main' into feature/add_voxcpm_engine

42fbd44

tests(voxcpm): use asyncio.run() in edge-case tests to avoid missing event loop on Python 3.12

9c39144

phonk2682 closed this Jan 23, 2026

phonk2682 deleted the feature/add_voxcpm_engine branch January 23, 2026 17:35

phonk2682 wants to merge 11 commits into main from feature/add_voxcpm_engine

2 participants