Signed-off-by: weedge <weege007@gmail.com>
…de wav code
This was referenced Feb 21, 2025
feat:
support TTS mode:
text + ref audio waveform -> tokenizer -> text + audio token ids -> step1 LM -> audio token ids (wav_code) -> flow (CFM) -> mel -> vocoder (HiFT) -> waveform
src + ref audio waveform -> speech tokenizer -> audio token ids (wav_code) -> flow (CFM) -> mel -> vocoder (HiFT) -> cloned ref audio waveform
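The two data flows above can be sketched as plain function composition. This is a minimal illustration, not the code in this PR: the `Stages` container, the `synthesize` and `voice_clone` helpers, and the tracing stubs are all hypothetical names, and the stubs return placeholder data instead of running any model.

```python
from dataclasses import dataclass
from typing import Callable, List

Waveform = List[float]
TokenIds = List[int]
Mel = List[List[float]]


@dataclass
class Stages:
    """Hypothetical container for the pipeline stages named in the description."""
    tokenizer: Callable[[str, Waveform], TokenIds]             # text + ref audio -> token ids
    speech_tokenizer: Callable[[Waveform, Waveform], TokenIds]  # src + ref audio -> wav_code
    step1_lm: Callable[[TokenIds], TokenIds]                    # prompt ids -> wav_code
    flow: Callable[[TokenIds], Mel]                             # wav_code -> mel (CFM)
    vocoder: Callable[[Mel], Waveform]                          # mel -> waveform (HiFT)


def synthesize(stages: Stages, text: str, ref_wav: Waveform) -> Waveform:
    """TTS mode: the LM generates the audio token ids."""
    prompt_ids = stages.tokenizer(text, ref_wav)
    wav_code = stages.step1_lm(prompt_ids)
    return stages.vocoder(stages.flow(wav_code))


def voice_clone(stages: Stages, src_wav: Waveform, ref_wav: Waveform) -> Waveform:
    """Voice clone mode: the LM is skipped, tokens come from the speech tokenizer."""
    wav_code = stages.speech_tokenizer(src_wav, ref_wav)
    return stages.vocoder(stages.flow(wav_code))


def make_tracing_stages(trace: List[str]) -> Stages:
    """Build stub stages that only record the order in which they are called."""
    def record(name, result):
        def stage(*_args):
            trace.append(name)
            return result
        return stage

    return Stages(
        tokenizer=record("tokenizer", [1, 2, 3]),
        speech_tokenizer=record("speech_tokenizer", [4, 5, 6]),
        step1_lm=record("step1_lm", [7, 8, 9]),
        flow=record("flow", [[0.0]]),
        vocoder=record("vocoder", [0.0]),
    )


if __name__ == "__main__":
    trace: List[str] = []
    stages = make_tracing_stages(trace)
    synthesize(stages, "hello", [0.0])
    print("tts:", " -> ".join(trace))
    trace.clear()
    voice_clone(stages, [0.0], [0.0])
    print("voice_clone:", " -> ".join(trace))
```

The only structural difference between the two modes in this sketch is where `wav_code` comes from; the flow and vocoder stages are shared.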
    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_get_voices

    REF_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
    REF_TEXT="欢迎大家来体验达摩院推出的语音识别模型" \
    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_set_voice

    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_synthesize
    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_synthesize_speak

    # ref audio
    TTS_STREAM_FACTOR=4 \
    REF_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
    REF_TEXT="欢迎大家来体验达摩院推出的语音识别模型" \
    TTS_TEXT="万物之始,大道至简,衍化至繁。君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。" \
    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_synthesize

    TTS_STREAM_FACTOR=4 \
    REF_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
    REF_TEXT="欢迎大家来体验达摩院推出的语音识别模型" \
    TTS_TEXT="万物之始,大道至简,衍化至繁。君不见黄河之水天上来,奔流到海不复回。君不见高堂明镜悲白发,朝如青丝暮成雪。人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。" \
    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_synthesize_speak

    # ---- TTS_MODE: voice_clone ----
    # src audio + default ref audio
    SRC_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
    python -m unittest test.modules.speech.tts.test_step.TestStepTTS.test_synthesize

Colab notes:
Step-Audio TTS, from Step-Audio (speech decoder)
step1 LM 3B + flow (code from CosyVoice) + HiFT (code from CosyVoice)
speech tokenizer
a dual-codebook speech tokenizer framework, similar to ARCON (from the StepFun team);
the linguistic tokenizer uses the FunASR Paraformer (NAR) model;
the semantic tokenizer uses the CosyVoice speech tokenizer (from SenseVoice)
step1 LM 3B, distilled from Step-Audio 130B
flow (CFM)
see:
HiFT vocoder
see:
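As a rough illustration of the dual-codebook tokenizer listed above, the sketch below merges a linguistic and a semantic token stream into one sequence by alternating fixed-size groups. This is a hypothetical scheme, not the code in this PR: the function name, the group sizes (2 linguistic tokens, then 3 semantic tokens), and the ID offset that keeps the two codebooks apart are all illustrative assumptions.

```python
from typing import List, Sequence


def interleave_dual_codebook(
    linguistic: Sequence[int],
    semantic: Sequence[int],
    ling_group: int = 2,     # assumed group size, not taken from this PR
    sem_group: int = 3,      # assumed group size, not taken from this PR
    sem_offset: int = 1024,  # assumed offset so semantic ids do not collide with linguistic ids
) -> List[int]:
    """Alternate groups of linguistic and semantic token ids.

    Trailing tokens that do not fill a complete pair of groups are dropped.
    """
    merged: List[int] = []
    li = si = 0
    while li + ling_group <= len(linguistic) and si + sem_group <= len(semantic):
        merged.extend(linguistic[li:li + ling_group])
        merged.extend(t + sem_offset for t in semantic[si:si + sem_group])
        li += ling_group
        si += sem_group
    return merged


if __name__ == "__main__":
    print(interleave_dual_codebook([1, 2, 3, 4], [10, 11, 12, 13, 14, 15]))
```

Offsetting one stream is one simple way to let a single LM vocabulary cover both codebooks; the actual ID layout used by the model may differ.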