feat: add TTS Model ID configuration UI #24

Closed
YizukiAme wants to merge 3 commits into THU-MAIC:main from YizukiAme:feat/tts-model-id-config
Conversation

YizukiAme commented Mar 16, 2026

Summary

Add a Model ID input field to the TTS provider settings dialog, allowing users to customize the model used for text-to-speech generation (e.g., switching from gpt-4o-mini-tts to tts-1-hd for OpenAI).

Previously, model IDs were hardcoded and users had no way to change them through the UI.

Closes #14

Changes

UI (components/settings/tts-settings.tsx)

  • Add Model ID input field with conditional rendering (only shown for providers that support model IDs: OpenAI, GLM, Qwen)
  • Import DEFAULT_TTS_MODELS constant for placeholder display and conditional logic
  • Default model shown as placeholder text (e.g., gpt-4o-mini-tts for OpenAI)
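
The visibility and placeholder rules above can be sketched as a small helper. This is only an illustration, not the code from the PR: the provider ID strings and the `getModelIdFieldState` function are hypothetical, and only the OpenAI and GLM defaults are taken from the test plan. The Qwen default is not stated in this PR, so its entry is left out here; in the real map it would make the field visible for Qwen as well.

```typescript
// Hypothetical provider identifiers; the real IDs in the codebase may differ.
type TTSProviderId =
  | 'openai-tts'
  | 'glm-tts'
  | 'azure-tts'
  | 'browser-native-tts';

// Defaults quoted in the PR test plan. Providers without an entry
// (Azure, Browser Native) are treated as not supporting model IDs.
const DEFAULT_TTS_MODELS: Partial<Record<TTSProviderId, string>> = {
  'openai-tts': 'gpt-4o-mini-tts',
  'glm-tts': 'emoti-voice',
};

interface ModelIdFieldState {
  visible: boolean; // whether the Model ID input should be rendered
  placeholder: string; // default model shown as placeholder text
}

// Derive the input's state from the selected provider.
function getModelIdFieldState(providerId: TTSProviderId): ModelIdFieldState {
  const defaultModel = DEFAULT_TTS_MODELS[providerId];
  return {
    visible: defaultModel !== undefined,
    placeholder: defaultModel ?? '',
  };
}
```

In a component, the input would be rendered only when `visible` is true, with `placeholder` passed straight to the input element.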

Constants & Types (lib/audio/)

  • Add DEFAULT_TTS_MODELS map in constants.ts with default model IDs per provider
  • Add modelId to TTSProviderConfig type definition
  • Update TTS providers to read modelId from config with fallback to defaults
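
As a rough sketch of the fallback described above, a provider could resolve its model like this. The `resolveModelId` helper, the `apiKey` field, and the provider ID keys are assumptions made for illustration; only `modelId`, `TTSProviderConfig`, `DEFAULT_TTS_MODELS`, and the two default values come from this PR.

```typescript
// Assumed shape: the PR only states that modelId is added to this type.
interface TTSProviderConfig {
  apiKey?: string; // hypothetical existing field
  modelId?: string; // new optional field from this PR
}

// Keys are hypothetical provider IDs; values are the defaults named in the PR.
const DEFAULT_TTS_MODELS: Record<string, string> = {
  'openai-tts': 'gpt-4o-mini-tts',
  'glm-tts': 'emoti-voice',
};

// Prefer the user's model ID; fall back to the provider default when the
// field is missing, empty, or whitespace only.
function resolveModelId(
  providerId: string,
  config: TTSProviderConfig,
): string | undefined {
  const custom = config.modelId?.trim();
  if (custom) return custom;
  return DEFAULT_TTS_MODELS[providerId];
}
```

Treating a whitespace-only value as empty is a design choice in this sketch, so that a cleared input field behaves the same as one that was never filled in.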

API & Integration

  • Wire modelId through the TTS API route (app/api/generate/tts/route.ts)
  • Pass modelId from scene generator (lib/hooks/use-scene-generator.ts)
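
One way the client side of this wiring might look is sketched below. The request body shape and the `buildTTSRequestBody` helper are hypothetical; the PR text does not show the actual payload of `app/api/generate/tts/route.ts`. The sketch omits `modelId` when it is blank so the server side can apply its default.

```typescript
// Hypothetical payload for the TTS route; field names are assumptions.
interface TTSRequestBody {
  text: string;
  providerId: string;
  modelId?: string;
}

// Build the payload, including modelId only when the user set a value.
function buildTTSRequestBody(
  text: string,
  providerId: string,
  modelId?: string,
): TTSRequestBody {
  const body: TTSRequestBody = { text, providerId };
  const trimmed = modelId?.trim();
  if (trimmed) {
    body.modelId = trimmed;
  }
  return body;
}
```

The result can be serialized with `JSON.stringify` and sent as the body of a POST request to the route.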

i18n (lib/i18n/settings.ts)

  • Add ttsModelId translation key for Chinese ("模型 ID") and English ("Model ID")
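
For illustration, the new key could sit in a translation table like the one below. The locale codes, the table layout, and the `translate` helper are assumptions; only the `ttsModelId` key and its two strings come from this PR.

```typescript
// Hypothetical locale codes; the project's actual locale keys may differ.
type Locale = 'zh-CN' | 'en-US';

interface SettingsTranslations {
  ttsModelId: string;
}

// Strings as given in the PR description.
const settingsTranslations: Record<Locale, SettingsTranslations> = {
  'zh-CN': { ttsModelId: '模型 ID' },
  'en-US': { ttsModelId: 'Model ID' },
};

// Look up a settings label for the active locale.
function translate(locale: Locale, key: keyof SettingsTranslations): string {
  return settingsTranslations[locale][key];
}
```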

Test Plan

  • Select OpenAI TTS → Model ID field visible with gpt-4o-mini-tts placeholder
  • Select GLM TTS → Model ID field visible with emoti-voice placeholder
  • Select Azure TTS → Model ID field hidden (Azure doesn't use model IDs)
  • Select Browser Native TTS → Model ID field hidden
  • Custom model ID persists after page reload
  • Empty model ID falls back to default

Add a Model ID input field to the TTS provider settings dialog,
allowing users to customize the model used for text-to-speech generation.

Changes:
- Add Model ID input to tts-settings.tsx with conditional rendering
  (only shown for providers that support model IDs: OpenAI, GLM, Qwen)
- Import DEFAULT_TTS_MODELS constant for placeholder and conditional logic
- Add ttsModelId i18n keys for Chinese and English locales
- Add modelId field support in audio constants (DEFAULT_TTS_MODELS map)
- Add modelId to TTSProviderConfig type and settings store
- Wire modelId through TTS API route and provider implementations

The field shows the default model as placeholder text and persists
user-specified model IDs to the settings store.
YizukiAme force-pushed the feat/tts-model-id-config branch from f5efce6 to 8e2f2c3 on March 17, 2026 10:15
YizukiAme closed this Mar 17, 2026
YizukiAme (Author) commented

Closing in favor of #50 which provides a more comprehensive solution covering both TTS and ASR model configuration, with a UI pattern consistent with image-generation model management.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Add flexible configuration support for TTS provider Model IDs

2 participants