Open
Description
In meldataset.py, all wav files are resampled to 24000sps. However, the MelSpectrogram() transform is called without a sample_rate argument, so it defaults to 16000sps.
to_mel = torchaudio.transforms.MelSpectrogram(
    n_mels=80, n_fft=2048, win_length=1200, hop_length=300)
mean, std = -4, 4

def preprocess(wave):
    wave_tensor = torch.from_numpy(wave).float()
    mel_tensor = to_mel(wave_tensor)
    mel_tensor = (torch.log(1e-5 + mel_tensor.unsqueeze(0)) - mean) / std
    return mel_tensor
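To make the mismatch concrete, here is a rough standalone sketch (my own, not code from the repo). It assumes the filterbank follows what I understand to be the torchaudio defaults: HTK mel scale, f_min=0 and f_max=sample_rate//2. The helper names are hypothetical. It compares where the 80 mel band centers are nominally placed for 16000sps, where they actually land when the audio is 24000sps, and where they would be if sample_rate=24000 were passed.

```python
import math

def hz_to_mel(f):
    # HTK mel formula (assumed to match the library default mel_scale="htk")
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of the HTK mel formula
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(sample_rate, n_mels=80, f_min=0.0):
    # Center frequencies of n_mels triangular filters spaced evenly in mel
    # between f_min and sample_rate // 2 (assumed default for f_max)
    f_max = sample_rate // 2
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * i) for i in range(1, n_mels + 1)]

ASSUMED_SR = 16000  # what the transform believes (default sample_rate)
ACTUAL_SR = 24000   # what the resampled audio really is

# Centers as computed by the transform for its assumed rate
nominal = mel_centers(ASSUMED_SR)
# FFT bin k really sits at k * ACTUAL_SR / n_fft, so every nominal
# frequency is stretched by ACTUAL_SR / ASSUMED_SR on real audio
effective = [f * ACTUAL_SR / ASSUMED_SR for f in nominal]
# Centers that sample_rate=24000 would have produced
intended = mel_centers(ACTUAL_SR)

for i in (0, 39, 79):
    print(f"band {i:2d}: nominal {nominal[i]:8.1f} Hz, "
          f"effective {effective[i]:8.1f} Hz, intended {intended[i]:8.1f} Hz")
```

If these assumptions hold, the filters still span the full 0 to 12000 Hz band of the audio, but their spacing is no longer a true mel warp. Passing sample_rate=24000 to MelSpectrogram() would remove the discrepancy, though it would also change the features that existing checkpoints were trained on.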
Questions:
- Was the mismatch between the 24000sps audio and the 16000sps default of MelSpectrogram() an oversight?
- Also, how were the mean/std values of -4 and 4 arrived at?
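For the second question, one way to sanity-check the constants would be to measure the mean and standard deviation of the log-mel values over the training set. The sketch below is only my guess at such a check, not the repo's method. The helper names are hypothetical and the input values are toy numbers standing in for real mel energies.

```python
import math
import statistics

EPS = 1e-5          # same floor as in preprocess()
MEAN, STD = -4, 4   # constants from meldataset.py

def log_mel_stats(mel_values, eps=EPS):
    # Mean and population std of log(eps + x) over a flat list of mel energies
    logs = [math.log(eps + v) for v in mel_values]
    return statistics.fmean(logs), statistics.pstdev(logs)

def normalize(mel_value, mean=MEAN, std=STD, eps=EPS):
    # Scalar version of the normalization applied in preprocess()
    return (math.log(eps + mel_value) - mean) / std

# Toy stand-in for mel energies; real values would come from to_mel(wave)
toy_values = [0.0, 1e-3, 1e-2, 0.1, 1.0, 10.0]
est_mean, est_std = log_mel_stats(toy_values)
print(f"estimated mean {est_mean:.3f}, std {est_std:.3f}")

# Digital silence maps to the lowest possible normalized value
print(f"normalized floor: {normalize(0.0):.3f}")
```

Running log_mel_stats over actual dataset mels would show whether -4 and 4 are close to the empirical statistics or just hand-picked round numbers.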