How to build the validation data? #62
Hello,
Thank you so much for the code and the paper! I'm trying to train the model on speech command data. I've built the training and validation sets with two scripts, `make_spect_f0.py` and `make_metadata.py`, but the model fails at the validation step, on this line:
```python
x_identic_val = self.G(x_f0, x_real_pad, emb_org_val)
```
The error is:
```
RuntimeError: The expanded size of the tensor (192) must match the existing size (1085) at non-singleton dimension 1. Target sizes: [-1, 192, -1]. Tensor sizes: [1085, 1].
```
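To see where the numbers in this message could come from, here is a plain-Python re-implementation of the shape rule behind PyTorch's `Tensor.expand`. It assumes (this issue does not confirm it) that the generator broadcasts the speaker embedding over time with something like `c_trg.unsqueeze(1).expand(-1, T, -1)`, with T = 192 here. Under that assumption, a 1-D embedding of shape (1085,) becomes (1085, 1), so 1085 lands in the slot where 192 time frames are expected, while an embedding that keeps a batch axis, shape (1, 1085), expands cleanly. The helpers `unsqueeze_shape` and `expand_shape` are hypothetical names used only for illustration.

```python
def unsqueeze_shape(shape, dim):
    """Shape after torch.Tensor.unsqueeze(dim), for a non-negative dim."""
    return list(shape[:dim]) + [1] + list(shape[dim:])


def expand_shape(tensor_sizes, target_sizes):
    """Shape after torch.Tensor.expand(*target_sizes), or raise ValueError.

    Mirrors PyTorch's rule: sizes are aligned from the right, -1 keeps the
    existing size, and only singleton (size-1) dims may be stretched.
    """
    ndim, tdim = len(target_sizes), len(tensor_sizes)
    if ndim < tdim:
        raise ValueError("fewer target sizes than tensor dimensions")
    result = [0] * ndim
    for i in range(ndim - 1, -1, -1):           # walk from the last dimension
        dim = tdim - (ndim - i)                  # matching tensor dim, < 0 if new
        size = tensor_sizes[dim] if dim >= 0 else 1
        target = target_sizes[i]
        if target == -1:
            if dim < 0:
                raise ValueError(f"-1 is not allowed in new leading dimension {i}")
            target = size
        if size != target and size != 1:
            raise ValueError(
                f"The expanded size of the tensor ({target}) must match the "
                f"existing size ({size}) at non-singleton dimension {i}"
            )
        result[i] = target
    return result


T, N_SPK = 192, 1085  # max_len_pad, dim_spk_emb

# Embedding stored with a batch axis: (1, 1085) -> (1, 1, 1085) -> expands fine
print(expand_shape(unsqueeze_shape([1, N_SPK], 1), [-1, T, -1]))  # [1, 192, 1085]

# Embedding stored without a batch axis: (1085,) -> (1085, 1) -> mismatch
try:
    expand_shape(unsqueeze_shape([N_SPK], 1), [-1, T, -1])
except ValueError as err:
    print(err)
```

If that assumption about the generator holds, the 192 in the message is only the padded time length, and the mismatch points at the embedding's missing batch axis rather than at `max_len_pad`.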
I'm not sure why the shapes mismatch, since the call to `self.G` worked during training. That said, the "G identity mapping loss" step preprocesses the input before feeding it to `self.G`. Do I need to do the same with the validation data? Also, 192 is `max_len_pad` (`max_len_pad = 192`), while 1085 is the number of speakers (`dim_spk_emb = 1085`). Do I need to change `max_len_pad`?
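On the preprocessing question, here is a sketch of building one validation example so that every input to `self.G` carries a leading batch axis of size 1, like a training batch does. `make_val_inputs` is a hypothetical helper, not code from this repository, and it assumes: mels of shape (T, 80) to match `dim_freq = 80`; a normalized log-F0 track in [0, 1] where non-positive values mark unvoiced frames; `x_f0` being the mel concatenated with the F0 one-hot along the last axis; and a 257-way F0 one-hot (bin 0 unvoiced, bins 1 to 256 voiced) to match `dim_f0 = 257`. It only fixes shapes and does not reproduce whatever augmentation the training step applies.

```python
import numpy as np


def make_val_inputs(mel, f0_norm, spk_emb, max_len_pad=192, num_bins=256):
    """Hypothetical helper: one validation example as a batch of size 1.

    mel:     (T, 80) mel-spectrogram frames
    f0_norm: (T,) normalized log-F0 in [0, 1]; values <= 0 mean unvoiced
    spk_emb: (dim_spk_emb,) speaker vector
    """
    T = mel.shape[0]
    if T > max_len_pad:
        raise ValueError(f"utterance has {T} frames, more than max_len_pad={max_len_pad}")
    pad = max_len_pad - T
    # Zero-pad along time only, then add a batch axis: (1, max_len_pad, 80)
    mel_pad = np.pad(mel, ((0, pad), (0, 0))).astype(np.float32)[np.newaxis]
    # Padded F0 frames are 0, so they fall into the unvoiced bin below.
    f0_pad = np.pad(f0_norm, (0, pad))
    voiced = f0_pad > 0
    idx = np.zeros(max_len_pad, dtype=np.int64)  # bin 0 = unvoiced
    idx[voiced] = np.round(np.clip(f0_pad[voiced], 0.0, 1.0) * (num_bins - 1)).astype(np.int64) + 1
    f0_onehot = np.zeros((1, max_len_pad, num_bins + 1), dtype=np.float32)
    f0_onehot[0, np.arange(max_len_pad), idx] = 1.0        # (1, max_len_pad, 257)
    x_f0 = np.concatenate([mel_pad, f0_onehot], axis=-1)   # (1, max_len_pad, 337)
    # The speaker vector needs the batch axis too: (1085,) -> (1, 1085)
    emb = np.asarray(spk_emb, dtype=np.float32).reshape(1, -1)
    return x_f0, mel_pad, emb


rng = np.random.default_rng(0)
mel = rng.random((150, 80))
f0 = rng.random(150)
x_f0, x_real_pad, emb = make_val_inputs(mel, f0, np.eye(1085)[3])
print(x_f0.shape, x_real_pad.shape, emb.shape)  # (1, 192, 337) (1, 192, 80) (1, 1085)
```

Padding instead of cropping keeps every frame of the utterance; the `reshape(1, -1)` on the embedding is the step that would matter for the error above, if the generator indeed expects a (batch, dim_spk_emb) speaker input.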
I'd appreciate any help or direction!
My `hparams.py` is below:

```python
hparams = HParams(
    # model
    freq = 8,
    dim_neck = 8,
    freq_2 = 8,
    dim_neck_2 = 1,
    freq_3 = 8,
    dim_neck_3 = 32,
    out_channels = 10 * 3,
    layers = 24,
    stacks = 4,
    residual_channels = 512,
    gate_channels = 512,  # split into 2 groups internally for gated activation
    skip_out_channels = 256,
    cin_channels = 80,
    gin_channels = -1,  # i.e., speaker embedding dim
    weight_normalization = True,
    n_speakers = -1,
    dropout = 1 - 0.95,
    kernel_size = 3,
    upsample_conditional_features = True,
    upsample_scales = [4, 4, 4, 4],
    freq_axis_kernel_size = 3,
    legacy = True,
    dim_enc = 512,
    dim_enc_2 = 128,
    dim_enc_3 = 256,
    dim_freq = 80,
    dim_spk_emb = 1085,
    dim_f0 = 257,
    dim_dec = 512,
    len_raw = 128,
    chs_grp = 16,
    # interp
    min_len_seg = 19,
    max_len_seg = 32,
    # min_len_seq = 64,
    min_len_seq = 0,
    # max_len_seq = 128,
    max_len_seq = 10,
    max_len_pad = 192,
    # data loader
    root_dir = 'assets/spmel',
    feat_dir = 'assets/raptf0',
    batch_size = 16,
    mode = 'train',
    shuffle = True,
    num_workers = 0,
    samplier = 8,
    # Convenient model builder
    builder = "wavenet",
    hop_size = 256,
    log_scale_min = float(-32.23619130191664),
)
```
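A quick consistency check on the values above that touch the validation pass, written as a hypothetical sketch (`check_val_hparams` is not part of the repository). It assumes the encoders downsample time by `freq`, `freq_2`, and `freq_3` and the codes are stretched back by repeating each one that many times, so the padded length must be a multiple of each. With these settings 192 = 24 * 8, so `max_len_pad` itself looks consistent. It also checks that the stored speaker vector's length equals `dim_spk_emb`.

```python
def check_val_hparams(hp, emb_len):
    """Hypothetical sanity check; returns a list of problems (empty if none)."""
    problems = []
    # Assumed: time is downsampled by each freq and restored by repetition,
    # so the padded length must divide evenly by all of them.
    for key in ("freq", "freq_2", "freq_3"):
        if hp["max_len_pad"] % hp[key] != 0:
            problems.append(
                f"max_len_pad={hp['max_len_pad']} is not a multiple of {key}={hp[key]}"
            )
    # The stored speaker vector must match the model's embedding width.
    if emb_len != hp["dim_spk_emb"]:
        problems.append(f"embedding length {emb_len} != dim_spk_emb={hp['dim_spk_emb']}")
    return problems


# Values copied from the hparams.py above.
hp = {"freq": 8, "freq_2": 8, "freq_3": 8, "max_len_pad": 192, "dim_spk_emb": 1085}
print(check_val_hparams(hp, emb_len=1085))  # []
print(check_val_hparams({**hp, "max_len_pad": 190}, emb_len=1085))
```

This only checks sizes, not the number of axes: a (1085,) vector passes the length test yet may still need a leading batch axis before it reaches the generator.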