Merged
5 changes: 3 additions & 2 deletions README.md
@@ -146,13 +146,13 @@ curl -N -X POST "http://localhost:8000/api/v1/stt/transcribe/stream?engine=whisp
**Batch Synthesis**

```bash
-curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=coqui&text=Hello%20world&voice=en-US-1&speed=1.0"
+curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=voxcpm&text=Hello%20world"
```

**Streaming Synthesis**

```bash
-curl -N -X POST "http://localhost:8000/api/v1/tts/synthesize/stream?engine=coqui&text=Hello%20world"
+curl -N -X POST "http://localhost:8000/api/v1/tts/synthesize/stream?engine=voxcpm&text=Hello%20world"
```

---
@@ -182,6 +182,7 @@ Detailed documentation is available in the `docs/` directory:

| Engine | Backend | Status | Features |
| :--- | :--- | :---: | :--- |
| **VoxCPM** | `voxcpm` | ✅ Ready | Zero-shot voice cloning, streaming, 24 kHz |
| **Coqui TTS** | `TTS` | 🚧 Planned | High-quality open-source voices |
| **OpenAI TTS** | OpenAI API | 🚧 Planned | Natural sounding commercial voices |

33 changes: 26 additions & 7 deletions docs/api.md
@@ -248,7 +248,7 @@ Batch synthesis - convert text to speech audio.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
-| `engine` | string | Yes | Engine name (e.g., "coqui") |
+| `engine` | string | Yes | Engine name (e.g., "voxcpm") |
| `text` | string | Yes | Text to synthesize |
| `voice` | string | No | Voice name/ID to use |
| `speed` | float | No | Speech speed multiplier (0 < speed <= 3.0, default: 1.0) |
@@ -257,7 +257,7 @@ Batch synthesis - convert text to speech audio.
**Example:**

```bash
-curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=coqui&text=Hello%20world&voice=en-US-1&speed=1.0"
+curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=voxcpm&text=Hello%20world"
```

**Response:**
@@ -307,17 +307,15 @@ data: {"audio_data": "<base64-full-audio>", "sample_rate": 22050, "duration_seco
**Example:**

```bash
-curl -N -X POST "http://localhost:8000/api/v1/tts/synthesize/stream?engine=coqui&text=Hello%20world"
+curl -N -X POST "http://localhost:8000/api/v1/tts/synthesize/stream?engine=voxcpm&text=Hello%20world"
```

**JavaScript Client Example:**

```javascript
const params = new URLSearchParams({
-  engine: 'coqui',
-  text: 'Hello, how are you today?',
-  voice: 'en-US-1',
-  speed: '1.0'
+  engine: 'voxcpm',
+  text: 'Hello, how are you today?'
});

const eventSource = new EventSource(
@@ -495,6 +493,27 @@ curl -X POST "http://localhost:8000/api/v1/stt/transcribe?engine=whisper&engine_
-F "file=@audio.wav"
```

### VoxCPM (Text-to-Speech)

Pass these via the `engine_params` query parameter as a JSON string.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt_wav_path` | string | null | Path to reference audio for zero-shot voice cloning |
| `prompt_text` | string | null | Transcript of the reference audio (required when `prompt_wav_path` is set) |

**Basic Example:**

```bash
curl -X POST "http://localhost:8000/api/v1/tts/synthesize?engine=voxcpm&text=Hello%20world"
```

**Voice Cloning Example:**

```bash
curl -g -X POST 'http://localhost:8000/api/v1/tts/synthesize?engine=voxcpm&text=Hello%20world&engine_params={"prompt_wav_path":"/path/to/reference.wav","prompt_text":"This%20is%20the%20reference%20transcript"}'
```
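
Writing this URL by hand means escaping the JSON quotes and percent-encoding the spaces yourself. The sketch below lets `JSON.stringify` and `URLSearchParams` handle both steps instead. It is a minimal illustration only: the helper name `buildVoxcpmSynthesizeUrl` is hypothetical and is not part of this project's client code.

```javascript
// Hypothetical helper (not part of this API's codebase): builds the VoxCPM
// batch-synthesis URL, serializing engine_params with JSON.stringify and
// letting URLSearchParams percent-encode the quotes, braces and spaces.
function buildVoxcpmSynthesizeUrl(baseUrl, text, engineParams) {
  const url = new URL('/api/v1/tts/synthesize', baseUrl);
  url.searchParams.set('engine', 'voxcpm');
  url.searchParams.set('text', text);
  if (engineParams) {
    // engine_params travels as a single JSON string in the query.
    url.searchParams.set('engine_params', JSON.stringify(engineParams));
  }
  return url.toString();
}

const url = buildVoxcpmSynthesizeUrl('http://localhost:8000', 'Hello world', {
  prompt_wav_path: '/path/to/reference.wav',
  prompt_text: 'This is the reference transcript',
});

// All parameters are in the query string, so the POST needs no body, e.g.:
// fetch(url, { method: 'POST' }).then((res) => { /* handle the response */ });
console.log(url);
```

`URLSearchParams` encodes spaces as `+` rather than `%20`. This assumes the server decodes the query string as `application/x-www-form-urlencoded`, which most web frameworks do.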

---

## Rate Limiting