feat(tts): implement CosyVoice3 engine with HTTP client and dynamic p… by phonk2682 · Pull Request #17 · minhsaco99/VoiceCore

phonk2682 · 2026-02-13T10:07:45Z

Description

Implemented the CosyVoice3 TTS engine integration using an HTTP client approach to communicate with an external CosyVoice3 FastAPI server. This implementation supports:

Zero-shot voice cloning using reference audio.
Dynamic prompt configuration via API engine_params (prompt_wav_path and prompt_text).
Streaming synthesis support.
Configurable default voices (mapped to fin_demo samples).
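
As an illustration of the dynamic prompt configuration listed above, here is a minimal sketch of how a client might assemble a request body. Only `engine_params`, `prompt_wav_path`, and `prompt_text` come from this PR; the `text` and `voice` field names and the `build_synthesize_payload` helper are assumptions made for the example, not the confirmed VoiceCore request schema.

```python
import json


def build_synthesize_payload(text, voice=None, prompt_wav_path=None, prompt_text=None):
    """Build a JSON-serialisable body for a TTS request.

    The "text" and "voice" keys are hypothetical; only the
    engine_params keys are named in the PR description.
    """
    payload = {"text": text}
    if voice is not None:
        payload["voice"] = voice

    # Collect only the prompt overrides the caller actually supplied.
    engine_params = {}
    if prompt_wav_path is not None:
        engine_params["prompt_wav_path"] = prompt_wav_path
    if prompt_text is not None:
        engine_params["prompt_text"] = prompt_text
    if engine_params:
        payload["engine_params"] = engine_params
    return payload


if __name__ == "__main__":
    body = build_synthesize_payload(
        "Hello from CosyVoice3.",
        prompt_wav_path="/path/to/voice_sample.wav",
        prompt_text="Transcript of the voice sample audio",
    )
    print(json.dumps(body, indent=2))
```

Omitting both prompt arguments leaves `engine_params` out of the body entirely, which would let the server fall back to its configured default voice.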

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
New Engine (STT/TTS provider)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Refactor (non-breaking code cleanup or optimization)
Documentation update
Performance improvement

Checklist

I have read the CONTRIBUTING guide
My code follows the project's code style (make format)
Linting passes (make lint)
Tests pass (make test)
Documentation updated (if needed)
No sensitive information (API keys, secrets) included

Related Issues

Closes #

Testing & Verification

Automated Tests

Unit tests added/updated
All existing tests pass

Manual Verification (if applicable)

Verified functionality by running the CosyVoice3 server locally and making curl requests to the VoiceCore API:

Checked default voice synthesis.
Verified zero-shot cloning with dynamic prompt_wav_path provided in engine_params.
Validated streaming response.

API Endpoints Tested (if applicable)

Batch endpoint (POST /api/v1/stt/transcribe or /tts/synthesize)
SSE streaming (POST .../stream)
WebSocket (WS .../ws)
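
When checking the SSE streaming endpoint by hand, a small parser for the standard Server-Sent Events framing can help confirm that events arrive incrementally. The sketch below handles only the generic `data:` framing; what VoiceCore puts inside each event (for example, encoded audio chunks) is not stated in this PR and is left uninterpreted. The `iter_sse_data` name is hypothetical.

```python
def iter_sse_data(lines):
    """Yield the data payload of each complete SSE event.

    `lines` is any iterable of text lines, e.g. from a streamed
    HTTP response. Multi-line data fields are joined with "\n".
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "data":
            buffer.append(value)
    # An event not terminated by a blank line is discarded.


if __name__ == "__main__":
    sample = ["data: chunk-1\n", "\n", ": ping\n", "data: chunk-2\n", "\n"]
    for payload in iter_sse_data(sample):
        print(payload)
```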

Engine-Specific Tests (if applicable)

Engine type: [TTS]
Provider: [CosyVoice3 (Self-hosted)]
Model: [Fun-CosyVoice3-0.5B]
Device tested: [cuda]

Security Impact

No security implications
Security impact (please describe below)


Copilot

Pull request overview

Adds a new CosyVoice3 TTS engine implementation that talks to an external FastAPI server via an HTTP client, including configuration models and unit tests, plus dependency/config updates to wire it into the project.

Changes:

Introduce CosyVoice3Engine (HTTP client, batch + streaming) and CosyVoice3Config / VoiceConfig.
Add unit tests covering config defaults, lifecycle, voice resolution, batch synthesis, and streaming behavior with mocked HTTP.
Update pyproject.toml optional dependency groups and engines.yaml engine definitions.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 9 comments.


| File | Description |
| --- | --- |
| `app/engines/tts/cosyvoice3/engine.py` | Implements CosyVoice3 HTTP client TTS engine (batch + streaming, voice resolution, metrics). |
| `app/engines/tts/cosyvoice3/config.py` | Adds Pydantic config models for CosyVoice3 and per-voice prompt settings. |
| `app/engines/tts/cosyvoice3/__init__.py` | Exposes CosyVoice3 config/engine symbols for import. |
| `tests/unit/engines/tts/cosyvoice3/test_cosyvoice3_engine.py` | Adds unit tests for the new engine using mocked `httpx` streaming responses. |
| `tests/unit/engines/tts/cosyvoice3/__init__.py` | Marks the new test package. |
| `pyproject.toml` | Adds `cosyvoice3` optional dependency group (httpx) and also adds a `qwen3-asr` group. |
| `engines.yaml` | Adds CosyVoice3 TTS config block and also enables a `qwen3-asr` STT block. |


Copilot · 2026-02-13T10:18:13Z

engines.yaml

+  # Qwen3-ASR: State-of-the-art multilingual ASR (52 languages)
+  # Install: uv sync --group qwen3-asr
+  qwen3-asr:
+    enabled: true


qwen3-asr is enabled and points to app.engines.stt.qwen3asr.engine.Qwen3ASREngine, but there is no such module/class in the repository. With the default config, engine loading will fail at startup. Either add the missing engine implementation, or set enabled: false / remove this block until the engine exists.

Suggested change

-    enabled: true
+    enabled: false # Disabled until Qwen3ASREngine is implemented
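
The startup failure described in this comment could also be caught by a pre-flight check that tries to import every enabled engine's class path. This is a sketch only: the `engine_class` key and both helper names are assumptions for illustration, since the actual `engines.yaml` schema and VoiceCore loader are not shown here.

```python
import importlib


def resolve_engine_class(dotted_path):
    """Return the object named by 'pkg.module.ClassName' or raise ImportError."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"Not a dotted path: {dotted_path!r}")
    # Raises ModuleNotFoundError (a subclass of ImportError) if the module is missing.
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{module_path} has no attribute {class_name!r}") from exc


def find_unloadable_engines(engines):
    """Map engine name -> error message for enabled engines that cannot be imported.

    `engines` mirrors a parsed config: {name: {"enabled": bool, "engine_class": str}}.
    The "engine_class" key is hypothetical.
    """
    problems = {}
    for name, cfg in engines.items():
        if not cfg.get("enabled", False):
            continue  # disabled engines are never loaded, so skip them
        try:
            resolve_engine_class(cfg["engine_class"])
        except ImportError as exc:
            problems[name] = str(exc)
    return problems
```

Running such a check in CI against the default config would flag an enabled block whose module does not exist before it reaches a deployment.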

Copilot · 2026-02-13T10:18:14Z

engines.yaml

+      default_voice: "fin_default"
+      voices:
+        default:
+          prompt_wav_path: "/path/to/voice_sample.wav"
+          prompt_text: "Transcript of the voice sample audio"


default_voice is set to fin_default, but the configured voices map only defines default. As written, requests without an explicit voice will raise Voice 'fin_default' not found. Align default_voice with an existing key (or rename the voice key).
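
One way to surface this mismatch at load time rather than on the first request is a cross-field check in the config model. The sketch below uses standard-library dataclasses so that it runs without extra dependencies; the PR's real models are Pydantic, where the same check would live in a model-level validator. The class names `VoiceSketch` and `CosyVoice3ConfigSketch` are hypothetical stand-ins, not the PR's classes.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceSketch:
    prompt_wav_path: str
    prompt_text: str


@dataclass
class CosyVoice3ConfigSketch:
    default_voice: str
    voices: dict = field(default_factory=dict)

    def __post_init__(self):
        # Fail fast if default_voice does not name a configured voice.
        if self.default_voice not in self.voices:
            known = ", ".join(sorted(self.voices)) or "<none>"
            raise ValueError(
                f"default_voice {self.default_voice!r} is not defined in voices "
                f"(known: {known})"
            )


if __name__ == "__main__":
    voices = {
        "default": VoiceSketch(
            prompt_wav_path="/path/to/voice_sample.wav",
            prompt_text="Transcript of the voice sample audio",
        )
    }
    try:
        CosyVoice3ConfigSketch(default_voice="fin_default", voices=voices)
    except ValueError as err:
        print(err)
```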

Copilot · 2026-02-13T10:18:14Z