fix: add configurable models for tts and asr#50

Open
ShaojieLiu wants to merge 1 commit into THU-MAIC:main from ShaojieLiu:lsj/fix-configurable-tts-asr

ShaojieLiu commented Mar 17, 2026

Summary

This PR adds configurable model support for TTS and ASR.

Previously, the TTS and ASR provider implementations used hardcoded server-side models, so users could not choose a model from the settings UI. This change introduces persisted ttsModelId and asrModelId settings, updates the TTS/ASR configuration pages to follow the image-generation model-management pattern, and propagates the selected model through the preview, generation, and transcription flows.

Related Issues

Fixes the issue where TTS and ASR models could not be selected independently in settings.
Closes #14

Changes

  • Added ttsModelId and asrModelId to persisted settings state
  • Added TTS/ASR provider model definitions to audio provider metadata
  • Reworked TTS and ASR settings pages to match the image-generation model section pattern
  • Added bottom-positioned model lists for TTS/ASR with a selectable active model
  • Added create/edit/delete support for custom TTS/ASR models
  • Updated TTS preview, scene generation, generation preview, and ASR recording/transcription flows to send selected model IDs
  • Replaced hardcoded server-side TTS/ASR model selection with configurable model IDs from settings
  • Added migration/defaulting behavior so existing users get valid default TTS/ASR models
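
The migration/defaulting behavior listed above could look roughly like the following. This is a minimal sketch, not the code in this PR: the type names, the `EXAMPLE_*` model lists, and the `migrateAudioSettings` helper are all hypothetical, and only the `ttsModelId` and `asrModelId` field names come from the description. It assumes that a missing or unknown stored ID falls back to the first available model.

```typescript
// Hypothetical shapes for illustration; the PR's real types live in
// lib/audio/types.ts and lib/store/settings.ts and may differ.
interface AudioModel {
  id: string;
  name: string;
}

interface PersistedAudioSettings {
  ttsModelId?: string;
  asrModelId?: string;
}

// Placeholder model lists (assumption), not the provider metadata from the PR.
const EXAMPLE_TTS_MODELS: AudioModel[] = [
  { id: "example-tts-a", name: "Example TTS A" },
  { id: "example-tts-b", name: "Example TTS B" },
];
const EXAMPLE_ASR_MODELS: AudioModel[] = [
  { id: "example-asr-a", name: "Example ASR A" },
];

// Keep the stored ID if it still names an available model,
// otherwise fall back to the first available model (if any).
function resolveModelId(
  stored: string | undefined,
  available: AudioModel[],
): string | undefined {
  if (stored !== undefined && available.some((m) => m.id === stored)) {
    return stored;
  }
  return available[0]?.id;
}

// Fill in valid model IDs for settings persisted before the fields existed.
function migrateAudioSettings(
  persisted: PersistedAudioSettings,
): PersistedAudioSettings {
  return {
    ttsModelId: resolveModelId(persisted.ttsModelId, EXAMPLE_TTS_MODELS),
    asrModelId: resolveModelId(persisted.asrModelId, EXAMPLE_ASR_MODELS),
  };
}
```

With this shape, settings saved by an older version (no model fields at all) and settings that point at a model that no longer exists both resolve to a usable default, while a still-valid selection is left unchanged.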

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • CI/CD or build changes

Verification

Steps to reproduce / test

  1. Open TTS settings and confirm the model section appears at the bottom of the page
  2. Select a built-in TTS model, add a custom TTS model, switch selection, and verify preview requests use the selected model
  3. Open ASR settings and confirm the model section appears at the bottom of the page
  4. Select a built-in ASR model, add a custom ASR model, switch selection, and verify transcription requests use the selected model
  5. Delete a selected custom TTS/ASR model and confirm the UI falls back to another available model
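
The fallback expected in step 5 can be expressed as a small pure function. This is only a sketch under assumptions: `ModelListState`, `ModelListEntry`, and `deleteCustomModel` are hypothetical names, built-in models are assumed to be non-deletable, and "another available model" is assumed to mean the first remaining entry. The actual store logic in the PR may choose differently.

```typescript
// Hypothetical state shape for one model list (TTS or ASR).
interface ModelListEntry {
  id: string;
  custom: boolean; // true for user-created models, false for built-in ones
}

interface ModelListState {
  models: ModelListEntry[];
  selectedId: string | null;
}

// Remove a custom model; if it was the selected one, select the first
// remaining model instead. Built-in or unknown IDs leave the state untouched.
function deleteCustomModel(state: ModelListState, id: string): ModelListState {
  const target = state.models.find((m) => m.id === id);
  if (target === undefined || !target.custom) {
    return state;
  }
  const models = state.models.filter((m) => m.id !== id);
  const selectedId =
    state.selectedId === id ? (models[0]?.id ?? null) : state.selectedId;
  return { models, selectedId };
}
```

A manual check of step 5 would then correspond to asserting that, after deleting the selected custom model, `selectedId` is neither the deleted ID nor empty while other models remain.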

What you personally verified

  • Ran targeted eslint checks on modified files: pnpm exec eslint lib/audio/types.ts lib/audio/constants.ts lib/store/settings.ts lib/audio/tts-providers.ts lib/audio/asr-providers.ts app/api/generate/tts/route.ts app/api/transcription/route.ts components/settings/tts-settings.tsx components/settings/asr-settings.tsx components/generation/media-popover.tsx components/audio/tts-config-popover.tsx lib/hooks/use-audio-recorder.ts lib/hooks/use-scene-generator.ts app/generation-preview/page.tsx

  • Ran pnpm exec tsc -p tsconfig.json --noEmit

  • Verified the TTS/ASR settings UI structure now mirrors the image-generation model section

  • Verified selected TTS/ASR model IDs are threaded through client requests into server handlers

  • Did not run full manual browser interaction testing or the full CI suite in this session

Evidence

  • CI passes (pnpm check && pnpm lint && npx tsc --noEmit)
  • Manually tested locally
  • Screenshots / recordings attached (if UI changes)

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have added/updated documentation as needed
  • My changes do not introduce new warnings


Development

Successfully merging this pull request may close these issues.

Feature Request: Add flexible configuration support for TTS provider Model IDs
