Thank you for your research. I am trying to reproduce your experiment, but a few points are unclear to me. Did you remove the questioner's voice during data preprocessing? Did you split each full recording into multiple shorter segments? If you did segment the speech, could you share the relevant code?