How to build the validation data? #62

@AShoydokova

Hello,

Thank you so much for the code and paper! I'm trying to train the model on speech command data. I built the training and validation sets with two scripts, make_spect_f0.py and make_metadata.py, but the model fails at the validation step, on this line:
x_identic_val = self.G(x_f0, x_real_pad, emb_org_val)

The error is:
RuntimeError: The expanded size of the tensor (192) must match the existing size (1085) at non-singleton dimension 1. Target sizes: [-1, 192, -1]. Tensor sizes: [1085, 1].

I'm not sure why there is a mismatch, since self.G worked in the training step. However, the "G identity mapping loss" step preprocesses the input before feeding it to self.G. Do I need to do the same with the validation data? Also, 192 is max_len_pad, while 1085 is the number of speakers (dim_spk_emb = 1085). Do I need to change max_len_pad?
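
The shapes in the error message can be reproduced in isolation, which may help narrow down the cause. The following is a minimal sketch, not the repository's code: it assumes that self.G tiles the speaker embedding over time with something like `c_trg.unsqueeze(1).expand(-1, T, -1)`, and that the validation embedding arrives as a 1-D tensor without a batch dimension. The helper `broadcast_speaker` is hypothetical and exists only for illustration.

```python
import torch

max_len_pad = 192    # padded utterance length, from hparams
dim_spk_emb = 1085   # speaker embedding size, from hparams


def broadcast_speaker(c_trg: torch.Tensor, num_frames: int) -> torch.Tensor:
    """Hypothetical stand-in for the step inside G that repeats the
    speaker embedding along the time axis (an assumption, see above)."""
    return c_trg.unsqueeze(1).expand(-1, num_frames, -1)


# Case 1: embedding stored without a batch dimension, shape (1085,).
emb_1d = torch.zeros(dim_spk_emb)
try:
    broadcast_speaker(emb_1d, max_len_pad)
    failed = False
except RuntimeError as err:
    # unsqueeze(1) gives shape (1085, 1), which cannot expand to (-1, 192, -1)
    failed = True
    print(err)

# Case 2: same embedding with a leading batch dimension, shape (1, 1085).
emb_2d = emb_1d.unsqueeze(0)
tiled = broadcast_speaker(emb_2d, max_len_pad)
print(tiled.shape)  # torch.Size([1, 192, 1085])
```

If this assumption holds, the mismatch would come from the missing batch dimension on the validation embedding rather than from max_len_pad, so checking `emb_org_val.shape` just before the call to self.G would be a reasonable first step.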

I'd appreciate any help or direction!

My hparams.py is below:

hparams = HParams(
    # model   
    freq = 8,
    dim_neck = 8,
    freq_2 = 8,
    dim_neck_2 = 1,
    freq_3 = 8,
    dim_neck_3 = 32,
    out_channels = 10 * 3,
    layers = 24,
    stacks = 4,
    residual_channels = 512,
    gate_channels = 512,  # split into 2 groups internally for gated activation
    skip_out_channels = 256,
    cin_channels = 80,
    gin_channels = -1,  # i.e., speaker embedding dim
    weight_normalization = True,
    n_speakers = -1,
    dropout = 1 - 0.95,
    kernel_size = 3,
    upsample_conditional_features = True,
    upsample_scales = [4, 4, 4, 4],
    freq_axis_kernel_size = 3,
    legacy = True,
    
    dim_enc = 512,
    dim_enc_2 = 128,
    dim_enc_3 = 256,
    
    dim_freq = 80,
    dim_spk_emb = 1085,
    dim_f0 = 257,
    dim_dec = 512,
    len_raw = 128,
    chs_grp = 16,
    
    # interp
    min_len_seg = 19,
    max_len_seg = 32,
    # min_len_seq = 64,
    min_len_seq = 0,
    # max_len_seq = 128,
    max_len_seq = 10,
    max_len_pad = 192,
    
    # data loader
    root_dir = 'assets/spmel',
    feat_dir = 'assets/raptf0',
    batch_size = 16,
    mode = 'train',
    shuffle = True,
    num_workers = 0,
    samplier = 8,

    # Convenient model builder
    builder = "wavenet",

    hop_size = 256,
    log_scale_min = float(-32.23619130191664),
    
)
