Fix speech editing boundary artifacts by working in mel domain #1242

acadarmeria · 2025-12-26T08:51:20Z

Previously, speech_edit.py worked in the wav domain: it inserted zeros into the waveform before computing the mel spectrogram. This caused boundary artifacts wherever a mel spectrogram analysis window straddled inserted zeros and real audio.

This commit refactors the approach to work in the mel domain:
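To illustrate the straddling problem, here is a minimal NumPy sketch. It is not the code from speech_edit.py: `frame_energy` is a hypothetical stand-in for a mel spectrogram (it only measures the energy of each Hann-windowed frame), and the sample rate, `n_fft`, and `hop` values are assumptions chosen for the example.

```python
import numpy as np

def frame_energy(wav, n_fft=1024, hop=256):
    """Energy of each Hann-windowed analysis frame (toy stand-in for a mel spectrogram)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    return np.array([
        np.sum((wav[i * hop : i * hop + n_fft] * window) ** 2)
        for i in range(n_frames)
    ])

rng = np.random.default_rng(0)
audio = rng.standard_normal(24000)   # noise standing in for real speech (assumed 24 kHz)
gap = np.zeros(4096)                 # zeros inserted in the wav domain for the edited region
cut = 12000
edited_wav = np.concatenate([audio[:cut], gap, audio[cut:]])

energy = frame_energy(edited_wav)
clean_level = np.median(frame_energy(audio))

# Frames whose window overlaps both real audio and zeros are neither
# fully silent nor at the level of a clean frame: these are the boundary frames.
partial = (energy > 0) & (energy < 0.5 * clean_level)
print("frames fully inside the gap:", int((energy == 0).sum()))
print("partially attenuated boundary frames:", int(partial.sum()))
```

The partially attenuated frames are the ones a model would receive as conditioning in the wav-domain approach, even though they represent neither the original audio nor a clean gap.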

- Compute the mel spectrogram on the clean original audio first
- Insert zero frames in the mel domain instead of zero samples in the wav domain
- Use frame-level granularity throughout for consistency
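The mel-domain insertion can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `insert_zero_frames` and `generated_mask` are hypothetical names, the `(n_frames, n_mels)` layout and the 100 mel channels are assumed, and random data stands in for a real spectrogram.

```python
import numpy as np

def insert_zero_frames(mel, start_frame, end_frame, new_len):
    """Replace mel[start_frame:end_frame] with new_len all-zero frames.

    mel is assumed to have shape (n_frames, n_mels).
    """
    gap = np.zeros((new_len, mel.shape[1]), dtype=mel.dtype)
    mel_cond = np.concatenate([mel[:start_frame], gap, mel[end_frame:]], axis=0)
    # Hypothetical mask: True for frames that have to be generated
    generated_mask = np.zeros(mel_cond.shape[0], dtype=bool)
    generated_mask[start_frame:start_frame + new_len] = True
    return mel_cond, generated_mask

# Random values standing in for a mel spectrogram of the clean audio
mel = np.random.default_rng(0).standard_normal((200, 100)).astype(np.float32)
mel_cond, generated_mask = insert_zero_frames(mel, start_frame=80, end_frame=120, new_len=50)
```

Because the spectrogram is computed before any zeros are introduced, every retained frame on either side of the gap is identical to the corresponding frame of the original audio.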

Benefits:

- Eliminates boundary artifacts
- More consistent behavior regardless of small float variations in input times
- Cleaner edit boundaries

Changes to speech_edit.py (lines 148-220):

- Convert audio to mel using model.mel_spec() before editing
- Build mel_cond by concatenating original mel frames + zero frames
- Calculate all time-based values at frame level first, then convert to samples
- Pass mel_cond directly to model.sample() instead of raw audio
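The "frame level first, then samples" ordering can be sketched like this. The constants and helper names are assumptions for illustration (`SAMPLE_RATE`, `HOP_LENGTH`, `seconds_to_frames`, and `edit_span` do not come from the PR), and the use of `round` is also an assumption; the actual rounding rule in speech_edit.py may differ.

```python
SAMPLE_RATE = 24_000  # assumed sampling rate in Hz
HOP_LENGTH = 256      # assumed mel hop length in samples

def seconds_to_frames(t):
    """Snap a time in seconds to the nearest mel frame index."""
    return round(t * SAMPLE_RATE / HOP_LENGTH)

def edit_span(start_s, end_s, new_duration_s):
    """Quantize an edit region to frames first, then derive sample offsets."""
    start_f = seconds_to_frames(start_s)
    end_f = seconds_to_frames(end_s)
    new_f = seconds_to_frames(new_duration_s)
    return {
        "start_frame": start_f,
        "end_frame": end_f,
        "new_frames": new_f,
        # Sample offsets are derived from frames, so they are always
        # whole multiples of HOP_LENGTH
        "start_sample": start_f * HOP_LENGTH,
        "end_sample": end_f * HOP_LENGTH,
    }

a = edit_span(1.0, 2.1, 1.5)
b = edit_span(1.0004, 2.1003, 1.5002)  # slightly perturbed input times
print(a == b)
```

Deriving sample offsets from frame indices, rather than the other way around, keeps every boundary aligned to the mel grid, so small differences in the input times that fall within the same frame produce the same edit.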

SWivid merged commit 27e20fc into SWivid:main Dec 26, 2025
1 check passed
