TTS Adapter follows the capability adapter pattern: one repo per capability (TTS), with multiple engine implementations inside.
```
tts-adapter/
├── tts_adapter/
│   ├── engine.py          # TTSEngine protocol
│   ├── contract.py        # Request/response models
│   ├── config.py          # Environment config
│   ├── engines/
│   │   └── qwen3.py       # Qwen3-TTS implementation
│   └── api/
│       └── routes.py      # FastAPI endpoints
```
- SOLID: Single responsibility (TTS), open for extension (add engines)
- Shared contract: All engines use the same request/response models
- Simple deployment: One container/process per capability
Split into separate repos only if:
- Dependency conflicts (different CUDA/torch versions)
- Different runtime requirements (GPU vs CPU)
- Separate scaling needs
All engines implement the `TTSEngine` protocol:

```python
from typing import Protocol

class TTSEngine(Protocol):
    def warmup(self) -> None: ...
    def synthesize(self, text: str, language: str, speaker: str, instruct: str | None) -> bytes: ...
    def synthesize_batch(self, texts: list[str], language: str, speaker: str, instruct: str | None) -> list[bytes]: ...
```

This allows swapping engines without changing API code.
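Because `TTSEngine` is a `typing.Protocol`, engines conform structurally: any class with matching methods satisfies it, no inheritance required. A minimal sketch with a hypothetical `SilenceEngine` (not part of the repo, signatures trimmed for brevity):

```python
from typing import Protocol

class TTSEngine(Protocol):
    def warmup(self) -> None: ...
    def synthesize(self, text: str, language: str, speaker: str) -> bytes: ...

class SilenceEngine:
    """Hypothetical stand-in engine: emits placeholder bytes instead of real audio."""

    def warmup(self) -> None:
        pass  # a real engine would load model weights onto the GPU here

    def synthesize(self, text: str, language: str, speaker: str) -> bytes:
        # One zero byte per character: a placeholder for synthesized PCM audio.
        return b"\x00" * len(text)

engine: TTSEngine = SilenceEngine()  # type-checks via structural typing
audio = engine.synthesize("hello", language="en", speaker="default")
```

A type checker (mypy/pyright) verifies the assignment to `engine: TTSEngine` even though `SilenceEngine` never subclasses it.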
12-factor style via environment variables:
Global (all engines):
- `TTS_ENGINE` - Engine name (e.g., `qwen3`)
- `TTS_DEFAULT_SPEAKER` - Default speaker
- `TTS_DEFAULT_LANGUAGE` - Default language
Engine-specific (namespaced):
- `TTS_QWEN3_MODEL_ID` - Qwen3 model identifier
- `TTS_QWEN3_DEVICE` - CUDA device
- `TTS_QWEN3_DTYPE` - Data type
Each engine owns its defaults. Global config is engine-agnostic (SOLID).
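The namespacing keeps engine knobs out of the shared config module. A sketch of what an engine-owned loader might look like (the `Qwen3Config` class, `load_qwen3_config` function, and the default values are illustrative, not the repo's actual code):

```python
from dataclasses import dataclass

@dataclass
class Qwen3Config:
    model_id: str
    device: str
    dtype: str

def load_qwen3_config(env: dict[str, str]) -> Qwen3Config:
    """Read TTS_QWEN3_* variables; defaults below are placeholders for illustration."""
    return Qwen3Config(
        model_id=env.get("TTS_QWEN3_MODEL_ID", "some-default-model"),
        device=env.get("TTS_QWEN3_DEVICE", "cuda:0"),
        dtype=env.get("TTS_QWEN3_DTYPE", "bfloat16"),
    )

# Only the overridden variable changes; engine defaults fill the rest.
cfg = load_qwen3_config({"TTS_QWEN3_DEVICE": "cuda:1"})
```

Passing the environment as a plain dict (rather than reading `os.environ` directly) keeps the loader trivially testable.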
GPU inference requires serialization. Each engine uses a threading.Lock to prevent concurrent GPU access:
```python
with self._lock:
    wavs, sr = self._model.generate_custom_voice(...)
```

Single-worker uvicorn (`--workers 1`) is recommended.
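The lock pattern can be demonstrated without a GPU. In this sketch, `_fake_generate` stands in for the model call and contains a tripwire that would fire if two threads ever entered it concurrently (all names here are hypothetical):

```python
import threading

class LockedEngine:
    """Serializes all 'inference' behind one lock, as each real engine does."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._busy = False  # tripwire: True while a 'generate' is in flight

    def _fake_generate(self, text: str) -> bytes:
        assert not self._busy, "concurrent access detected"
        self._busy = True
        result = text.encode()  # placeholder for model inference
        self._busy = False
        return result

    def synthesize(self, text: str) -> bytes:
        with self._lock:  # at most one thread touches the 'GPU' at a time
            return self._fake_generate(text)

engine = LockedEngine()
results: list[bytes] = []
threads = [
    threading.Thread(target=lambda: results.append(engine.synthesize("x")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the `with self._lock:` line, overlapping requests could reach the model simultaneously; with it, they queue, which is also why a single uvicorn worker is enough.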
- Create `tts_adapter/engines/new_engine.py`
- Implement the `TTSEngine` protocol
- Register it in `engines/__init__.py` (the `_ENGINES` dict)
- Set `TTS_ENGINE=new_engine` to use it
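The registration step can be sketched as a simple name-to-factory mapping. This mirrors the `_ENGINES` dict idea; the concrete classes and `get_engine` helper are assumptions for illustration:

```python
from typing import Callable

class Qwen3Engine:
    name = "qwen3"

class NewEngine:
    name = "new_engine"

# Maps the TTS_ENGINE value to a zero-argument factory for that engine.
_ENGINES: dict[str, Callable[[], object]] = {
    "qwen3": Qwen3Engine,
    "new_engine": NewEngine,
}

def get_engine(name: str) -> object:
    try:
        return _ENGINES[name]()
    except KeyError:
        raise ValueError(f"Unknown TTS_ENGINE {name!r}; known: {sorted(_ENGINES)}")
```

Selecting an engine then reduces to `get_engine(os.environ["TTS_ENGINE"])`, and adding one is a single dict entry.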