
Questions about training process #8

@isjwdu

Description


Hi,

Thank you for your excellent work!

I am currently trying to reproduce ESC-base-no-adv but have encountered some issues.

  1. According to the README, training a base ESC model on 4×RTX 4090 GPUs takes approximately 12 hours for 250k steps using 3-second speech clips with a batch size of 36. Based on the provided config, the batch size per GPU should be 9, totaling 36 across 4 GPUs.
  • 1.a. I am using 2×A6000 (48 GB) to train ESC-base-no-adv. To match the total batch size, I use 18 per GPU × 2 GPUs = 36, equivalent to your 9 per GPU × 4 GPUs. However, training appears significantly slower (~11x):
<<<<Experimental Setup: esc-base-non-adv>>>>
   BatchSize_per_Device: Train 18 / Test 4
   LearningRate: 0.0001
   Total_Training_Steps: 5000*50 = 250,000
   Pre-Training_Steps: 5000*15 = 75,000
   Optimizer: AdamW
   Scheduler: constant
   Quantization_Dropout: 0.75
   Model #Parameters: 8.74M
tqdm: 23:29 elapsed < 129:24:34 remaining (≈130 hours projected total vs. the README's ~12)

I know the A6000's performance differs from the 4090's, but I would not expect hardware alone to account for an ~11x slowdown (a rough sketch for checking whether data loading is the bottleneck follows this list).

  • 1.b. I noticed that the train_data_path in the config differs from the dns_training dataset you provided (it matches the one for ESC-large instead). Did you use a different dataset for ESC-base-no-adv?
  2. I anticipate doing some research on top of your ESC repo and may have follow-up questions. It would be even better if you were willing to email me your personal contact information (e.g., WeChat). My email is isjiawei.du@gmail.com.
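For 1.a, a minimal timing sketch, assuming a generic PyTorch training loop (the dataset, the 3 s / 16 kHz clip length, the model, and the dummy loss below are placeholders, not the ESC code). It separates time spent blocked on the DataLoader from total step time; when the same effective batch size runs an order of magnitude slower on comparable GPUs, the cause is often data loading or I/O rather than compute:

    import time
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Placeholder data/model: swap in the ESC dataset and codec.
    # 1024 random "clips" of 48,000 samples (3 s at 16 kHz is an assumption).
    dataset = TensorDataset(torch.randn(1024, 1, 48000))
    loader = DataLoader(dataset, batch_size=18, num_workers=4, pin_memory=True)
    model = torch.nn.Conv1d(1, 16, kernel_size=9).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    n_steps, data_wait, step_total = 50, 0.0, 0.0
    t = time.perf_counter()
    for i, (batch,) in enumerate(loader):
        data_wait += time.perf_counter() - t    # time blocked on the dataloader
        batch = batch.cuda(non_blocking=True)
        loss = model(batch).pow(2).mean()       # dummy loss standing in for the codec loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()                # flush async CUDA work so the timer is honest
        step_total += time.perf_counter() - t
        t = time.perf_counter()
        if i + 1 == n_steps:                    # note: the first few steps include CUDA warmup
            break

    print(f"avg data wait: {data_wait / n_steps:.3f} s/step, "
          f"avg step total: {step_total / n_steps:.3f} s/step")

If data wait dominates, the fix is on the input side (num_workers, storage speed, on-the-fly decoding); if compute dominates, it is worth confirming that both runs use the same precision and that each GPU really receives 18 clips per step.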

Thanks again for your work, and I look forward to your reply.
