Thanks for releasing this dataset (https://huggingface.co/datasets/zhifeixie/Audio-Reasoner-CoTA). I encountered some issues while using it:
- QA mismatch in complex_audio subset:
There appear to be noticeable mismatches between the questions and answers in the complex_audio subset. In several cases, the provided answers do not seem to correspond accurately to the audio content or the questions themselves.
- Missing source dataset identifiers:
Would it be possible to provide the original source dataset identifiers (e.g., YouTube IDs, file names, or other unique references) for all samples in the dataset? This would make it much easier for users to trace samples back to their sources and organize the data.
- Corrupted archive in audiocap subset:
The compressed file for the audiocap subset appears to be corrupted, and I’m unable to extract its contents. Could you please check and re-upload the file?
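In case it helps to reproduce the archive problem, a check along these lines might be useful. It is only a rough sketch: the archive format is assumed to be zip or tar (possibly compressed), and `check_archive` and the file name in the usage line are hypothetical names, not anything from the dataset repository.

```python
import tarfile
import zipfile
import zlib
from pathlib import Path


def check_archive(path):
    """Return (ok, message) saying whether the archive could be read end to end."""
    path = Path(path)
    if not path.is_file():
        return False, f"{path} does not exist"

    # Assumption: the archive is either a zip file or a tar file.
    if zipfile.is_zipfile(path):
        try:
            with zipfile.ZipFile(path) as zf:
                # testzip() returns the first member with a bad CRC, or None.
                bad_member = zf.testzip()
        except (zipfile.BadZipFile, zlib.error, OSError, EOFError) as exc:
            return False, f"zip error: {exc}"
        if bad_member is not None:
            return False, f"corrupt zip member: {bad_member}"
        return True, "zip archive is readable"

    try:
        with tarfile.open(path) as tf:
            # Walking every member forces the whole (possibly compressed)
            # stream to be read, so truncation tends to surface here.
            for _ in tf:
                pass
    except (tarfile.TarError, zlib.error, OSError, EOFError) as exc:
        return False, f"tar error: {exc}"
    return True, "tar archive is readable"
```

For example, `ok, message = check_archive("audiocap.zip")` (placeholder file name) returns a flag plus a short description that can be pasted into a reply.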
Thanks again for your work, and looking forward to your response!