feat: add MiniMax Cloud TTS as a built-in engine #331

Open
octo-patch wants to merge 1 commit into jamiepine:main from octo-patch:feature/add-minimax-tts

@octo-patch octo-patch commented Mar 20, 2026

Summary

Add MiniMax as a cloud-based TTS engine alongside existing local engines (Qwen, LuxTTS, Chatterbox, TADA, Kokoro).

  • No model downloads needed — only requires a MINIMAX_API_KEY environment variable
  • Default model: speech-2.8-hd (high-quality, maximized timbre similarity)
  • 12 preset voice IDs (e.g. English_Graceful_Lady, Deep_Voice_Man, Wise_Woman)
  • 24kHz PCM audio output, compatible with existing audio pipeline

Changes

Backend (3 files modified, 1 new):

  • backend/backends/minimax_backend.py — New MiniMaxTTSBackend implementation
    • Calls MiniMax TTS API (POST https://api.minimax.io/v1/t2a_v2)
    • Uses stdlib urllib.request — zero new dependencies
    • PCM format at 24kHz for direct numpy array conversion
  • backend/backends/__init__.py — Registered in TTS_ENGINES dict and factory function
  • backend/models.py — Added "minimax" to engine validation pattern
  • backend/routes/profiles.py — Added MiniMax voices to preset voice API endpoint
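
For reviewers who want to see the shape of the HTTP call, here is a minimal sketch of how a request to the endpoint above could be built with stdlib `urllib.request`. It is not the code from `minimax_backend.py`. The `audio_setting` field names, the Bearer-token header, and the helper name `build_tts_request` are assumptions. Only the URL, the default model, and the `voice_setting.voice_id` field are confirmed elsewhere in this PR.

```python
import json
import urllib.request

API_URL = "https://api.minimax.io/v1/t2a_v2"  # endpoint named in the PR description
DEFAULT_MODEL = "speech-2.8-hd"               # default model named in the PR description


def build_tts_request(text, voice_id, api_key, sample_rate=24000):
    """Build (but do not send) a POST request for the MiniMax TTS endpoint.

    Hypothetical helper for illustration only.
    """
    payload = {
        "model": DEFAULT_MODEL,
        "text": text,
        "voice_setting": {"voice_id": voice_id},
        # Assumed field names: raw PCM, 24 kHz, mono.
        "audio_setting": {
            "format": "pcm",
            "sample_rate": sample_rate,
            "channel": 1,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed auth scheme: API key sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request would then be a matter of `urllib.request.urlopen(req, timeout=...)` and parsing the JSON body. Keeping request construction separate from sending is what lets the unit tests inspect the payload with a mocked `urlopen`.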

Frontend (5 files modified):

  • Engine selector dropdown, profile form, zod validation, TypeScript types, language support
  • Listed as preset-only engine (cloud-based, no voice cloning from reference audio)

Tests (1 new file):

  • backend/tests/test_minimax_backend.py — 16 unit tests covering:
    • Backend lifecycle (load/unload/state)
    • Voice prompt creation (preset mode)
    • API payload verification with mocked HTTP
    • Error handling (missing API key, API errors)
    • Engine registration in factory
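
As an illustration of the "missing API key" case listed above, the following sketch shows one way such a check could be tested with `unittest` and `patch.dict`. The helper `get_api_key` and the choice of `RuntimeError` are hypothetical. The actual backend may raise a different exception or perform the check elsewhere.

```python
import os
import unittest
from unittest.mock import patch


def get_api_key():
    """Hypothetical helper: read the key or fail with a clear error."""
    key = os.environ.get("MINIMAX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("MINIMAX_API_KEY is not set")
    return key


class ApiKeyTests(unittest.TestCase):
    @patch.dict(os.environ, {}, clear=True)
    def test_missing_key_raises(self):
        # With an empty environment the helper must refuse to continue.
        with self.assertRaises(RuntimeError):
            get_api_key()

    @patch.dict(os.environ, {"MINIMAX_API_KEY": "test-key"})
    def test_key_is_read_from_environment(self):
        self.assertEqual(get_api_key(), "test-key")
```

Patching `os.environ` per test keeps the cases independent of whatever key is set on the developer's machine.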

Test Results

All 38 tests pass (22 existing + 16 new), no regressions.

Integration tested against real MiniMax API — generates audio with correct sample rate and duration.

Test plan

  • Set MINIMAX_API_KEY environment variable
  • Create a profile with "Built-in voice" → "MiniMax Cloud TTS" engine
  • Select a preset voice (e.g. English_Graceful_Lady)
  • Generate speech — verify audio plays correctly
  • Run pytest backend/tests/test_minimax_backend.py — all 16 tests should pass

Summary by CodeRabbit

  • New Features
    • Added MiniMax Cloud TTS engine option, a cloud-based text-to-speech service supporting 10 languages (English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian) without requiring local model downloads.

Add MiniMax as a cloud-based TTS engine alongside existing local engines
(Qwen, LuxTTS, Chatterbox, TADA, Kokoro). MiniMax requires only an API
key (MINIMAX_API_KEY) — no model downloads needed.

Backend:
- New MiniMaxTTSBackend in backend/backends/minimax_backend.py
- Calls MiniMax TTS API (POST https://api.minimax.io/v1/t2a_v2)
- Default model: speech-2.8-hd, 24kHz PCM output
- 12 preset voice IDs (English_Graceful_Lady, Deep_Voice_Man, etc.)
- Registered in TTS_ENGINES and backend factory
- Preset voice API endpoint returns MiniMax voices

Frontend:
- Added to engine selector dropdown, profile form, and type definitions
- Language support for 10 languages
- Listed as preset-only engine (no voice cloning)

Tests:
- 16 unit tests covering backend lifecycle, voice prompts, API mocking,
  payload verification, error handling, and engine registration
- All 38 tests pass (22 existing + 16 new)

Co-Authored-By: Octopus <liyuan851277048@icloud.com>

coderabbitai bot commented Mar 20, 2026

Walkthrough

This pull request adds a new MiniMax Cloud TTS engine to the application, extending frontend engine selectors, API type definitions, language support configurations, and backend infrastructure to support a cloud-based text-to-speech backend with preset voice support.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Frontend Engine Selectors: app/src/components/Generation/EngineModelSelector.tsx, app/src/components/VoiceProfiles/ProfileForm.tsx | Added 'minimax' engine option to UI selectors and extended preset-only engine restrictions to include 'minimax' alongside 'kokoro'. |
| Frontend Type & Form Logic: app/src/lib/api/types.ts, app/src/lib/hooks/useGenerationForm.ts | Extended GenerationRequest.engine type union to include 'minimax'; updated form submission handler to map 'minimax' engine to model name 'minimax-cloud-tts' with display name 'MiniMax Cloud TTS'. |
| Language & Schema Configuration: app/src/lib/constants/languages.ts, backend/models.py | Added 'minimax' language support mapping (en, zh, ja, ko, de, fr, ru, pt, es, it) and extended API validation regex to permit 'minimax' engine value. |
| Backend Engine Registry & Implementation: backend/backends/__init__.py, backend/backends/minimax_backend.py | Added 'minimax' to TTS engine registry; implemented complete MiniMaxTTSBackend class with HTTP API integration, preset-based voice prompts, and PCM hex audio decoding (210 lines). |
| Backend Voice Profile Routes: backend/routes/profiles.py | Extended list_preset_voices() to recognize 'minimax' engine and dynamically return preset voices from MINIMAX_VOICES constant. |
| Backend Tests: backend/tests/test_minimax_backend.py | Added comprehensive test suite (230 lines) covering backend initialization, API key validation, preset voice handling, audio synthesis mocking, error cases, and engine registration. |
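
To make the validation and preset-voice rows above concrete, here is a small sketch of regex-based engine validation and a per-engine voice lookup. It does not reproduce `backend/models.py` or `backend/routes/profiles.py`. Only `kokoro` and `minimax` are confirmed engine identifiers in this PR, so the other names in the pattern are guesses based on the engine list. `MINIMAX_VOICES` here holds just the three voice IDs quoted in the description, and `preset_voices_for` is a hypothetical stand-in for the real route logic.

```python
import re

# Assumed identifiers; only "kokoro" and "minimax" appear verbatim in this PR.
ENGINE_PATTERN = re.compile(r"qwen|luxtts|chatterbox|tada|kokoro|minimax")

# Three of the twelve preset voice IDs named in the PR description.
MINIMAX_VOICES = ["English_Graceful_Lady", "Deep_Voice_Man", "Wise_Woman"]


def is_valid_engine(engine: str) -> bool:
    """Accept only exact engine names (fullmatch rejects trailing junk)."""
    return ENGINE_PATTERN.fullmatch(engine) is not None


def preset_voices_for(engine: str) -> list[str]:
    """Hypothetical lookup: return preset voice IDs for a preset-only engine."""
    if engine == "minimax":
        return list(MINIMAX_VOICES)
    return []
```

Using `fullmatch` rather than `match` avoids accepting values such as `"minimax-extra"` or a name followed by a newline.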

Sequence Diagram

sequenceDiagram
    participant Client as Client (Browser)
    participant Frontend as Frontend App
    participant Server as Backend Server
    participant MiniMax as MiniMax API

    Client->>Frontend: Submit TTS generation request<br/>(engine: 'minimax', text, voice_prompt)
    Frontend->>Frontend: Validate form<br/>(generationSchema includes minimax)
    Frontend->>Frontend: Map engine to modelName<br/>('minimax' → 'minimax-cloud-tts')
    Frontend->>Server: POST /generate<br/>(GenerationRequest with engine: 'minimax')
    
    Server->>Server: Route to MiniMaxTTSBackend.generate()
    Server->>Server: Extract voice_id from voice_prompt<br/>(default: 'English_Graceful_Lady')
    Server->>Server: Construct JSON payload<br/>(model, text, voice_id, audio format)
    Server->>MiniMax: POST /t2a_v2<br/>(JSON request with TTS parameters)
    
    MiniMax->>MiniMax: Process TTS synthesis
    MiniMax-->>Server: HTTP Response<br/>(status_code, data.audio [hex PCM])
    
    Server->>Server: Validate response status
    Server->>Server: Decode hex PCM to int16<br/>Normalize to float32 [-1.0, 1.0]
    Server->>Server: Return (audio_array, 24000 Hz)
    
    Server-->>Frontend: Audio stream / file
    Frontend->>Client: Play/download audio
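
The decode step in the diagram (hex PCM to int16, then normalized floats) can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: it assumes signed 16-bit little-endian mono samples, divides by 32768.0 as the normalization constant, and uses only the standard library so it runs without NumPy. With NumPy, the same steps would typically be `np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0`. The function name `decode_hex_pcm` is hypothetical.

```python
import sys
from array import array

SAMPLE_RATE = 24000  # Hz, the output rate stated in the PR


def decode_hex_pcm(hex_audio: str) -> list[float]:
    """Decode a hex string of 16-bit PCM into floats in [-1.0, 1.0)."""
    raw = bytes.fromhex(hex_audio)
    if len(raw) % 2:
        raise ValueError("PCM payload has an odd number of bytes")
    samples = array("h")      # signed 16-bit integers, native byte order
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()    # payload is assumed to be little-endian
    return [s / 32768.0 for s in samples]


def duration_seconds(samples: list[float]) -> float:
    """Clip length implied by the sample count at 24 kHz mono."""
    return len(samples) / SAMPLE_RATE
```

For example, the six bytes `0000ff7f0080` decode to three samples: silence, the largest positive value, and the most negative value.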

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A fluffy rabbit whispers, "Hip-hip-hooray!
MiniMax voices now brighten the day,
No downloads to bother, just clouds up on high,
Ten languages singing, voices multiply!" 🎵✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.53% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding MiniMax Cloud TTS as a built-in engine to the application. |

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/src/components/VoiceProfiles/ProfileForm.tsx (1)

842-845: ⚠️ Potential issue | 🟠 Major

Reset selected voice when preset engine changes.

Line 844 updates the engine but keeps the previous selectedPresetVoiceId. With two engines now available, a stale voice ID can be submitted against the wrong engine.

💡 Proposed fix
-<Select
-  value={selectedPresetEngine}
-  onValueChange={setSelectedPresetEngine}
->
+<Select
+  value={selectedPresetEngine}
+  onValueChange={(engine) => {
+    setSelectedPresetEngine(engine);
+    setSelectedPresetVoiceId('');
+  }}
+>

Also applies to: 853-853

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/VoiceProfiles/ProfileForm.tsx` around lines 842 - 845,
When the preset engine selection changes, reset the selected preset voice to
avoid submitting a stale voice ID: update the onValueChange handler for the
engine Select (where selectedPresetEngine and setSelectedPresetEngine are used)
to also call setSelectedPresetVoiceId(null or undefined) so
selectedPresetVoiceId is cleared; do the same update for the other engine Select
instance (the second occurrence that currently mirrors lines ~853) to ensure
both engine changes clear the voice selection.
🧹 Nitpick comments (1)
backend/tests/test_minimax_backend.py (1)

189-206: Avoid hard-coded default voice ID in the assertion.

Use DEFAULT_VOICE_ID here to prevent brittle tests if the backend default changes.

Proposed tweak
 @patch.dict(os.environ, {"MINIMAX_API_KEY": "test-key"})
 @patch("urllib.request.urlopen")
 def test_generate_uses_default_voice_id(self, mock_urlopen):
     import asyncio
+    from backend.backends.minimax_backend import DEFAULT_VOICE_ID

     mock_resp = MagicMock()
     mock_resp.read.return_value = self._make_mock_response()
     mock_resp.__enter__ = lambda s: s
     mock_resp.__exit__ = MagicMock(return_value=False)
     mock_urlopen.return_value = mock_resp

     backend = self._make_backend()
     asyncio.get_event_loop().run_until_complete(
         backend.generate("test", {})
     )

     call_args = mock_urlopen.call_args
     request = call_args[0][0]
     payload = json.loads(request.data.decode("utf-8"))
-    self.assertEqual(payload["voice_setting"]["voice_id"], "English_Graceful_Lady")
+    self.assertEqual(payload["voice_setting"]["voice_id"], DEFAULT_VOICE_ID)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_minimax_backend.py` around lines 189 - 206, The test
test_generate_uses_default_voice_id currently asserts a hard-coded string
"English_Graceful_Lady"; update it to import and use DEFAULT_VOICE_ID so the
assertion checks payload["voice_setting"]["voice_id"] == DEFAULT_VOICE_ID.
Locate the test function test_generate_uses_default_voice_id and replace the
hard-coded expected value with the module-level constant DEFAULT_VOICE_ID
(imported from the backend module that defines it) to keep the test resilient to
default changes while still verifying backend.generate uses the default when
none is provided.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5c0b9975-446e-49a7-8b99-6f83422bc9ff

📥 Commits

Reviewing files that changed from the base of the PR and between d70b878 and c11d38e.

📒 Files selected for processing (10)
  • app/src/components/Generation/EngineModelSelector.tsx
  • app/src/components/VoiceProfiles/ProfileForm.tsx
  • app/src/lib/api/types.ts
  • app/src/lib/constants/languages.ts
  • app/src/lib/hooks/useGenerationForm.ts
  • backend/backends/__init__.py
  • backend/backends/minimax_backend.py
  • backend/models.py
  • backend/routes/profiles.py
  • backend/tests/test_minimax_backend.py
