feat: add MiniMax Cloud TTS as a built-in engine by octo-patch · Pull Request #331 · jamiepine/voicebox

octo-patch · 2026-03-20T16:25:10Z

Summary

Add MiniMax as a cloud-based TTS engine alongside existing local engines (Qwen, LuxTTS, Chatterbox, TADA, Kokoro).

- No model downloads needed; only a MINIMAX_API_KEY environment variable is required
- Default model: speech-2.8-hd (high-quality, maximized timbre similarity)
- 12 preset voice IDs (e.g. English_Graceful_Lady, Deep_Voice_Man, Wise_Woman)
- 24 kHz PCM audio output, compatible with the existing audio pipeline

Changes

Backend (3 files modified, 2 new):

- backend/backends/minimax_backend.py: new MiniMaxTTSBackend implementation
  - Calls the MiniMax TTS API (POST https://api.minimax.io/v1/t2a_v2)
  - Uses stdlib urllib.request, so it adds no new dependencies
  - Requests PCM at 24 kHz for direct conversion to a numpy array
- backend/backends/__init__.py: registered in the TTS_ENGINES dict and the factory function
- backend/models.py: added "minimax" to the engine validation pattern
- backend/routes/profiles.py: added MiniMax voices to the preset voice API endpoint
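
The request described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: the helper name `build_tts_request`, the Bearer-token header, and the `audio_setting` keys are assumptions. Only the endpoint, the default model, the `voice_setting.voice_id` field, and the 24 kHz PCM format are taken from this PR's description and tests.

```python
import json
import urllib.request

API_URL = "https://api.minimax.io/v1/t2a_v2"  # endpoint named in the PR description
DEFAULT_MODEL = "speech-2.8-hd"               # default model named in the PR description
SAMPLE_RATE = 24000                           # 24 kHz PCM, per the PR description


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the TTS endpoint.

    Hypothetical helper: the auth header scheme and the audio_setting
    keys are assumptions made for illustration.
    """
    payload = {
        "model": DEFAULT_MODEL,
        "text": text,
        "voice_setting": {"voice_id": voice_id},
        "audio_setting": {"format": "pcm", "sample_rate": SAMPLE_RATE},  # assumed keys
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Building the request separately from sending it keeps the payload easy to inspect in unit tests; passing the result to `urllib.request.urlopen` would perform the actual call.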

Frontend (5 files modified):

- Engine selector dropdown, profile form, zod validation, TypeScript types, language support
- Listed as preset-only engine (cloud-based, no voice cloning from reference audio)

Tests (1 new file):

backend/tests/test_minimax_backend.py — 16 unit tests covering:
- Backend lifecycle (load/unload/state)
- Voice prompt creation (preset mode)
- API payload verification with mocked HTTP
- Error handling (missing API key, API errors)
- Engine registration in factory
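
As a rough illustration of the missing-API-key case listed above, a key lookup could fail fast before any HTTP call is made. The function name `read_api_key` and the exception class `MissingAPIKeyError` are hypothetical; the PR does not state which error type the backend raises.

```python
import os
from typing import Mapping, Optional


class MissingAPIKeyError(RuntimeError):
    """Hypothetical error type for an unset or empty MINIMAX_API_KEY."""


def read_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is absent.

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to exercise in tests without patching the real environment.
    """
    source = os.environ if env is None else env
    key = source.get("MINIMAX_API_KEY", "").strip()
    if not key:
        raise MissingAPIKeyError("MINIMAX_API_KEY is not set")
    return key
```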

Test Results

All 38 tests pass (22 existing + 16 new), no regressions.

Integration tested against real MiniMax API — generates audio with correct sample rate and duration.

Test plan

- Set the MINIMAX_API_KEY environment variable
- Create a profile with "Built-in voice" → "MiniMax Cloud TTS" engine
- Select a preset voice (e.g. English_Graceful_Lady)
- Generate speech and verify that the audio plays correctly
- Run pytest backend/tests/test_minimax_backend.py; all 16 tests should pass

Summary by CodeRabbit

New Features
- Added MiniMax Cloud TTS engine option, a cloud-based text-to-speech service supporting 10 languages (English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian) without requiring local model downloads.

Add MiniMax as a cloud-based TTS engine alongside existing local engines (Qwen, LuxTTS, Chatterbox, TADA, Kokoro). MiniMax requires only an API key (MINIMAX_API_KEY); no model downloads are needed.

Backend:
- New MiniMaxTTSBackend in backend/backends/minimax_backend.py
- Calls MiniMax TTS API (POST https://api.minimax.io/v1/t2a_v2)
- Default model: speech-2.8-hd, 24kHz PCM output
- 12 preset voice IDs (English_Graceful_Lady, Deep_Voice_Man, etc.)
- Registered in TTS_ENGINES and backend factory
- Preset voice API endpoint returns MiniMax voices

Frontend:
- Added to engine selector dropdown, profile form, and type definitions
- Language support for 10 languages
- Listed as preset-only engine (no voice cloning)

Tests:
- 16 unit tests covering backend lifecycle, voice prompts, API mocking, payload verification, error handling, and engine registration
- All 38 tests pass (22 existing + 16 new)

Co-Authored-By: Octopus <liyuan851277048@icloud.com>

coderabbitai · 2026-03-20T16:25:26Z

Walkthrough

This pull request adds a new MiniMax Cloud TTS engine to the application, extending frontend engine selectors, API type definitions, language support configurations, and backend infrastructure to support a cloud-based text-to-speech backend with preset voice support.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Frontend Engine Selectors: `app/src/components/Generation/EngineModelSelector.tsx`, `app/src/components/VoiceProfiles/ProfileForm.tsx` | Added 'minimax' engine option to UI selectors and extended preset-only engine restrictions to include 'minimax' alongside 'kokoro'. |
| Frontend Type & Form Logic: `app/src/lib/api/types.ts`, `app/src/lib/hooks/useGenerationForm.ts` | Extended `GenerationRequest.engine` type union to include 'minimax'; updated form submission handler to map 'minimax' engine to model name 'minimax-cloud-tts' with display name 'MiniMax Cloud TTS'. |
| Language & Schema Configuration: `app/src/lib/constants/languages.ts`, `backend/models.py` | Added 'minimax' language support mapping (en, zh, ja, ko, de, fr, ru, pt, es, it) and extended API validation regex to permit 'minimax' engine value. |
| Backend Engine Registry & Implementation: `backend/backends/__init__.py`, `backend/backends/minimax_backend.py` | Added 'minimax' to TTS engine registry; implemented complete `MiniMaxTTSBackend` class with HTTP API integration, preset-based voice prompts, and PCM hex audio decoding (210 lines). |
| Backend Voice Profile Routes: `backend/routes/profiles.py` | Extended `list_preset_voices()` to recognize 'minimax' engine and dynamically return preset voices from `MINIMAX_VOICES` constant. |
| Backend Tests: `backend/tests/test_minimax_backend.py` | Added comprehensive test suite (230 lines) covering backend initialization, API key validation, preset voice handling, audio synthesis mocking, error cases, and engine registration. |
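
The engine validation change summarized in the table can be pictured with a small sketch. The actual pattern in `backend/models.py` is not shown on this page, so everything here is an assumption: the pattern shape, the helper names, and the identifier spellings for every engine other than 'kokoro' and 'minimax', which are the only two quoted in the review.

```python
import re

# Assumed identifiers: only "kokoro" and "minimax" appear verbatim in this PR.
ENGINE_PATTERN = re.compile(r"^(qwen|luxtts|chatterbox|tada|kokoro|minimax)$")

# Engines that offer preset voices only (no cloning from reference audio),
# per the walkthrough: 'minimax' alongside 'kokoro'.
PRESET_ONLY_ENGINES = frozenset({"kokoro", "minimax"})


def is_valid_engine(name: str) -> bool:
    """Return True if `name` matches the (assumed) engine pattern exactly."""
    return ENGINE_PATTERN.fullmatch(name) is not None


def is_preset_only(name: str) -> bool:
    """Return True for engines restricted to built-in preset voices."""
    return name in PRESET_ONLY_ENGINES
```

Anchoring the pattern and using `fullmatch` rejects near misses such as a trailing space or a differently cased name, which is the usual reason to validate the engine field with a closed list.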

Sequence Diagram

sequenceDiagram
    participant Client as Client (Browser)
    participant Frontend as Frontend App
    participant Server as Backend Server
    participant MiniMax as MiniMax API

    Client->>Frontend: Submit TTS generation request<br/>(engine: 'minimax', text, voice_prompt)
    Frontend->>Frontend: Validate form<br/>(generationSchema includes minimax)
    Frontend->>Frontend: Map engine to modelName<br/>('minimax' → 'minimax-cloud-tts')
    Frontend->>Server: POST /generate<br/>(GenerationRequest with engine: 'minimax')
    
    Server->>Server: Route to MiniMaxTTSBackend.generate()
    Server->>Server: Extract voice_id from voice_prompt<br/>(default: 'English_Graceful_Lady')
    Server->>Server: Construct JSON payload<br/>(model, text, voice_id, audio format)
    Server->>MiniMax: POST /t2a_v2<br/>(JSON request with TTS parameters)
    
    MiniMax->>MiniMax: Process TTS synthesis
    MiniMax-->>Server: HTTP Response<br/>(status_code, data.audio [hex PCM])
    
    Server->>Server: Validate response status
    Server->>Server: Decode hex PCM to int16<br/>Normalize to float32 [-1.0, 1.0]
    Server->>Server: Return (audio_array, 24000 Hz)
    
    Server-->>Frontend: Audio stream / file
    Frontend->>Client: Play/download audio
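
The decode step near the end of the diagram (hex PCM to int16, then normalization to [-1.0, 1.0]) can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: the function name is hypothetical, the samples are assumed to be little-endian, and the stdlib `array` module stands in for numpy so that the sketch runs without dependencies.

```python
import sys
from array import array
from typing import List, Tuple

SAMPLE_RATE = 24000  # 24 kHz, per the PR description


def decode_hex_pcm(hex_audio: str) -> Tuple[List[float], int]:
    """Decode hex-encoded 16-bit PCM into floats in [-1.0, 1.0].

    Hypothetical helper; little-endian sample order is an assumption.
    """
    raw = bytes.fromhex(hex_audio)
    samples = array("h")    # signed 16-bit integers on common platforms
    samples.frombytes(raw)  # raises ValueError on an odd number of bytes
    if sys.byteorder == "big":
        samples.byteswap()  # payload assumed little-endian
    # Dividing by 32768 maps the int16 range onto [-1.0, 1.0).
    return [s / 32768.0 for s in samples], SAMPLE_RATE
```

For example, `decode_hex_pcm("0000ff7f0080")` yields three samples: silence, the largest positive value, and the most negative value. With numpy, the same conversion is commonly written as `np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0`.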

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat: Kokoro 82M TTS engine + voice profile type system #325 — Adds a new TTS engine with overlapping frontend and backend file modifications (engine selectors, profile forms, type unions, language constants, and backend registry).
feat: Chatterbox Turbo engine + per-engine language lists #258 — Introduces a new TTS engine with code-level changes to the same engine infrastructure (type definitions, generation form hooks, and backends registry).
Backend refactor: modular architecture, style guide, tooling #285 — Refactors the backend/backends/__init__.py model and engine registry that this PR extends with the 'minimax' engine entry.

Poem

🐰 A fluffy rabbit whispers, "Hip-hip-hooray!
MiniMax voices now brighten the day,
No downloads to bother, just clouds up on high,
Ten languages singing, voices multiply!" 🎵✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.53% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding MiniMax Cloud TTS as a built-in engine to the application. |

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

app/src/components/VoiceProfiles/ProfileForm.tsx (1)

842-845: ⚠️ Potential issue | 🟠 Major

Reset selected voice when preset engine changes.

Line 844 updates the engine but keeps the previous selectedPresetVoiceId. With two engines now available, a stale voice ID can be submitted against the wrong engine.

💡 Proposed fix

-<Select
-  value={selectedPresetEngine}
-  onValueChange={setSelectedPresetEngine}
->
+<Select
+  value={selectedPresetEngine}
+  onValueChange={(engine) => {
+    setSelectedPresetEngine(engine);
+    setSelectedPresetVoiceId('');
+  }}
+>

Also applies to: 853-853

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@app/src/components/VoiceProfiles/ProfileForm.tsx` around lines 842 - 845,
When the preset engine selection changes, reset the selected preset voice to
avoid submitting a stale voice ID: update the onValueChange handler for the
engine Select (where selectedPresetEngine and setSelectedPresetEngine are used)
to also call setSelectedPresetVoiceId(null or undefined) so
selectedPresetVoiceId is cleared; do the same update for the other engine Select
instance (the second occurrence that currently mirrors lines ~853) to ensure
both engine changes clear the voice selection.

🧹 Nitpick comments (1)

backend/tests/test_minimax_backend.py (1)

189-206: Avoid hard-coded default voice ID in the assertion.

Use DEFAULT_VOICE_ID here to prevent brittle tests if the backend default changes.

Proposed tweak

 @patch.dict(os.environ, {"MINIMAX_API_KEY": "test-key"})
 @patch("urllib.request.urlopen")
 def test_generate_uses_default_voice_id(self, mock_urlopen):
     import asyncio
+    from backend.backends.minimax_backend import DEFAULT_VOICE_ID

     mock_resp = MagicMock()
     mock_resp.read.return_value = self._make_mock_response()
     mock_resp.__enter__ = lambda s: s
     mock_resp.__exit__ = MagicMock(return_value=False)
     mock_urlopen.return_value = mock_resp

     backend = self._make_backend()
     asyncio.get_event_loop().run_until_complete(
         backend.generate("test", {})
     )

     call_args = mock_urlopen.call_args
     request = call_args[0][0]
     payload = json.loads(request.data.decode("utf-8"))
-    self.assertEqual(payload["voice_setting"]["voice_id"], "English_Graceful_Lady")
+    self.assertEqual(payload["voice_setting"]["voice_id"], DEFAULT_VOICE_ID)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_minimax_backend.py` around lines 189 - 206, The test
test_generate_uses_default_voice_id currently asserts a hard-coded string
"English_Graceful_Lady"; update it to import and use DEFAULT_VOICE_ID so the
assertion checks payload["voice_setting"]["voice_id"] == DEFAULT_VOICE_ID.
Locate the test function test_generate_uses_default_voice_id and replace the
hard-coded expected value with the module-level constant DEFAULT_VOICE_ID
(imported from the backend module that defines it) to keep the test resilient to
default changes while still verifying backend.generate uses the default when
none is provided.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5c0b9975-446e-49a7-8b99-6f83422bc9ff

📥 Commits

Reviewing files that changed from the base of the PR and between d70b878 and c11d38e.

📒 Files selected for processing (10)

app/src/components/Generation/EngineModelSelector.tsx
app/src/components/VoiceProfiles/ProfileForm.tsx
app/src/lib/api/types.ts
app/src/lib/constants/languages.ts
app/src/lib/hooks/useGenerationForm.ts
backend/backends/__init__.py
backend/backends/minimax_backend.py
backend/models.py
backend/routes/profiles.py
backend/tests/test_minimax_backend.py

octo-patch wants to merge 1 commit into jamiepine:main from octo-patch:feature/add-minimax-tts
