Voice cloning using Qwen3-TTS-12Hz-1.7B-Base with combined reference samples.
```bash
uv init --python 3.12
uv add qwen-tts soundfile torch numpy
uv run python clone_voice.py
```

Generates speech using 2 combined reference samples for improved voice quality.
Performance: ~111s for 3 outputs on MPS (Apple Silicon)
Reference files follow the pattern `samples/ref_N` (without extension):

- `ref_1.wav` + `ref_1.txt`
- `ref_2.wav` + `ref_2.txt`
Add or remove references by editing the `ref_paths` list in `clone_voice.py`. References are concatenated with 1 s silence gaps.
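The concatenation step can be sketched as follows. This is an illustrative NumPy version, not the actual code in `clone_voice.py`; the sample rate and function name are assumptions.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed; use the actual rate of your reference WAVs


def concat_with_silence(clips, sr=SAMPLE_RATE, gap_s=1.0):
    """Join reference clips with gap_s seconds of silence between them
    (a sketch of how combined references could be built)."""
    silence = np.zeros(int(sr * gap_s), dtype=np.float32)
    parts = []
    for i, clip in enumerate(clips):
        if i > 0:
            parts.append(silence)  # 1 s gap between consecutive references
        parts.append(np.asarray(clip, dtype=np.float32))
    return np.concatenate(parts)
```

With `soundfile`, the clips could be loaded as `clips = [sf.read(f"samples/{p}.wav")[0] for p in ref_paths]` before combining.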
Generated audio samples:
- `output_0.mp4`: "I'm not the pheasant plucker, I'm the pheasant plucker's mate. I'm only plucking pheasants 'cause the pheasant plucker's running late."
- `output_1.mp4`: "Extremely accurate and stunningly beautiful bespoke printer profiles transform creative print making to an extraordinary extent."
- `output_2.mp4`: "Generating code from AI prompts can lead to verbose code, or duplication of existing code instead of using an abstraction. But there are times when this is perfectly acceptable, such as when building proof of concepts, or when topics like program efficiency are unimportant."
For fast realtime generation without re-processing reference audio:

```bash
# 1. Create voice model once
uv run python create_voice_prompt.py

# 2. Use for realtime generation
uv run python realtime_tts.py
```

Launch the Gradio web interface:

```bash
uv run python demo_app.py
```

Then open http://localhost:8000 in your browser.
See docs/realtime.md for details.
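The two-step flow above amounts to caching the expensive reference-audio processing to disk once and reloading it on later runs. A minimal sketch of that pattern, with an illustrative cache path and a `build_fn` standing in for whatever `create_voice_prompt.py` computes:

```python
import pickle
from pathlib import Path

PROMPT_PATH = Path("voice_prompt.pkl")  # hypothetical cache location


def load_or_build_prompt(build_fn, path=PROMPT_PATH):
    """Return the cached voice prompt if present; otherwise build it once
    and persist it so realtime runs can skip reference processing."""
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    prompt = build_fn()  # expensive: processes the reference audio
    with path.open("wb") as f:
        pickle.dump(prompt, f)
    return prompt
```

On the second call the build function is never invoked, which is what makes the realtime path fast.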
See docs/configuration.md for generation parameters.
Edit `syn_texts` in `clone_voice.py` to customize synthesis text.
Location: ~/LLMs/Qwen3-TTS/Qwen3-TTS-12Hz-1.7B-Base
Device: Auto-detects MPS or CPU
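The device auto-detection boils down to a single preference check; in the scripts this is driven by `torch.backends.mps.is_available()`. A dependency-free sketch of that selection logic (function name is illustrative):

```python
def pick_device(mps_available: bool) -> str:
    """Prefer Apple Silicon's MPS backend when present, else fall back to CPU.
    In practice: pick_device(torch.backends.mps.is_available())."""
    return "mps" if mps_available else "cpu"
```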