From b2a56527131f662333d6998b4f24f74d556f33d7 Mon Sep 17 00:00:00 2001
From: martin
Date: Sat, 24 Jan 2026 22:30:22 +0700
Subject: [PATCH] update voxcpm docs

---
 README.md   |  5 +++--
 docs/api.md | 33 ++++++++++++++++++++++++++-------
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 70af96e..2f91a14 100644
--- a/README.md
+++ b/README.md
@@ -146,13 +146,13 @@ curl -N -X POST "http://localhost:8000/api/v1/stt/transcribe/stream?engine=whisp
 **Batch Synthesis**
 
 ```bash
-curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=coqui&text=Hello%20world&voice=en-US-1&speed=1.0"
+curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=voxcpm&text=Hello%20world"
 ```
 
 **Streaming Synthesis**
 
 ```bash
-curl -N -X POST "http://localhost:8000/api/v1/tts/synthesize/stream?engine=coqui&text=Hello%20world"
+curl -N -X POST "http://localhost:8000/api/v1/tts/synthesize/stream?engine=voxcpm&text=Hello%20world"
 ```
 
 ---
@@ -182,6 +182,7 @@ Detailed documentation is available in the `docs/` directory:
 
 | Engine | Backend | Status | Features |
 | :--- | :--- | :---: | :--- |
+| **VoxCPM** | `voxcpm` | ✅ Ready | Zero-shot voice cloning, streaming, 24kHz |
 | **Coqui TTS** | `TTS` | 🚧 Planned | High-quality open source voices |
 | **OpenAI TTS** | OpenAI API | 🚧 Planned | Natural sounding commercial voices |
 
diff --git a/docs/api.md b/docs/api.md
index 31cd72e..28fd72e 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -248,7 +248,7 @@ Batch synthesis - convert text to speech audio.
 
 | Parameter | Type | Required | Description |
 |-----------|------|----------|-------------|
-| `engine` | string | Yes | Engine name (e.g., "coqui") |
+| `engine` | string | Yes | Engine name (e.g., "voxcpm") |
 | `text` | string | Yes | Text to synthesize |
 | `voice` | string | No | Voice name/ID to use |
 | `speed` | float | No | Speech speed multiplier (0 < speed <= 3.0, default: 1.0) |
@@ -257,7 +257,7 @@ Batch synthesis - convert text to speech audio.
 **Example:**
 
 ```bash
-curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=coqui&text=Hello%20world&voice=en-US-1&speed=1.0"
+curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=voxcpm&text=Hello%20world"
 ```
 
 **Response:**
@@ -307,17 +307,15 @@ data: {"audio_data": "", "sample_rate": 22050, "duration_seco
 **Example:**
 
 ```bash
-curl -N -X POST "http://localhost:8000/api/v1/tts/synthesize/stream?engine=coqui&text=Hello%20world"
+curl -N -X POST "http://localhost:8000/api/v1/tts/synthesize/stream?engine=voxcpm&text=Hello%20world"
 ```
 
 **JavaScript Client Example:**
 
 ```javascript
 const params = new URLSearchParams({
-  engine: 'coqui',
-  text: 'Hello, how are you today?',
-  voice: 'en-US-1',
-  speed: '1.0'
+  engine: 'voxcpm',
+  text: 'Hello, how are you today?'
 });
 
 const eventSource = new EventSource(
@@ -495,6 +493,27 @@ curl -X POST "http://localhost:8000/api/v1/stt/transcribe?engine=whisper&engine_
   -F "file=@audio.wav"
 ```
 
+### VoxCPM (Text-to-Speech)
+
+Pass these via `engine_params` query parameter as JSON string.
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `prompt_wav_path` | string | null | Path to reference audio for zero-shot voice cloning |
+| `prompt_text` | string | null | Transcript of the reference audio (required with prompt_wav_path) |
+
+**Basic Example:**
+
+```bash
+curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=voxcpm&text=Hello%20world"
+```
+
+**Voice Cloning Example:**
+
+```bash
+curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=voxcpm&text=Hello%20world&engine_params={\"prompt_wav_path\":\"/path/to/reference.wav\",\"prompt_text\":\"This%20is%20the%20reference%20transcript\"}"
+```
+
 ---
 
 ## Rate Limiting
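
Note on the voice cloning example: it places raw JSON in the query string, and curl treats `{}` in a URL as a glob pattern unless `-g`/`--globoff` is passed, so percent-encoding the value is the safer route. A minimal sketch of building such a URL follows, in the same style as the existing JavaScript client example. The helper name `buildSynthesizeUrl` is hypothetical and not part of this patch, and the sketch assumes the server decodes `engine_params` like any other form-encoded query value (`URLSearchParams` writes spaces as `+`).

```javascript
// Hypothetical helper (not part of the patch): builds a batch synthesis URL
// with engine_params serialized to JSON and percent-encoded.
function buildSynthesizeUrl(baseUrl, text, engineParams) {
  const params = new URLSearchParams({ engine: 'voxcpm', text });

  // Only attach engine_params when there is something to send.
  if (engineParams && Object.keys(engineParams).length > 0) {
    // URLSearchParams escapes braces, quotes, colons and commas on output.
    params.set('engine_params', JSON.stringify(engineParams));
  }

  return `${baseUrl}/api/v1/tts/synthesize?${params.toString()}`;
}

// Placeholder path and transcript, copied from the docs example above.
const url = buildSynthesizeUrl('http://localhost:8000', 'Hello world', {
  prompt_wav_path: '/path/to/reference.wav',
  prompt_text: 'This is the reference transcript',
});

console.log(url);
```

The resulting string contains no literal braces or quotes, so it can be passed to `curl -X POST` or `fetch` without extra shell escaping.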