EM²LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning
This study introduces EM²LDL, a novel multilingual speech corpus designed to advance mixed emotion recognition through label distribution learning. Addressing the limitations of predominantly monolingual and single-label emotion corpora, which restrict linguistic diversity, cannot model mixed emotions, and lack ecological validity, EM²LDL comprises expressive utterances in English, Mandarin, and Cantonese, capturing the intra-utterance code-switching prevalent in multilingual regions like Hong Kong and Macao. The corpus integrates spontaneous emotional expressions from online platforms, annotated with fine-grained emotion distributions across 32 categories. Experimental baselines using self-supervised learning models demonstrate robust performance in speaker-independent gender-, age-, and personality-based evaluations, with HuBERT-large-EN achieving the best results. By incorporating linguistic diversity and ecological validity, EM²LDL enables the exploration of complex emotional dynamics in multilingual settings. This work provides a versatile testbed for developing adaptive, empathetic systems for applications in affective computing, including mental health monitoring and cross-cultural communication.
The EM²LDL corpus contains a total of 3,998 audio utterances, amounting to 14,540.08 seconds of speech (approximately 4.04 hours). The average duration per utterance is 3.64 seconds, reflecting the concise yet emotionally expressive nature of the collected segments. The corpus captures intra-utterance code-switching across three language pairs: Cantonese-English (CE), Mandarin-English (ME), and Mandarin-Cantonese (MC).
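As a quick sanity check, the duration figures above can be reproduced from the two totals reported for the corpus. The sketch below uses only those two numbers; the function and variable names are illustrative and are not part of any released tooling.

```python
# Totals as reported for the corpus.
TOTAL_UTTERANCES = 3998
TOTAL_SECONDS = 14540.08


def corpus_summary(n_utterances: int, total_seconds: float) -> dict:
    """Return total duration in hours and mean utterance length in seconds."""
    return {
        "hours": total_seconds / 3600,
        "mean_seconds": total_seconds / n_utterances,
    }


summary = corpus_summary(TOTAL_UTTERANCES, TOTAL_SECONDS)
print(f"{summary['hours']:.2f} h, {summary['mean_seconds']:.2f} s per utterance")
# prints "4.04 h, 3.64 s per utterance"
```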
Each utterance in the EM²LDL corpus is annotated with a probability distribution over 32 emotion categories derived from 20-rater annotations based on Plutchik’s Emotion Wheel. On average, each utterance is associated with 9.25 emotion labels (standard deviation: 1.65), with a maximum of 16 and a minimum of 4 labels, reflecting the complexity of mixed emotional states.
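To illustrate what a label distribution looks like in practice, here is a minimal sketch of one way multi-rater annotations could be turned into a probability distribution. It assumes each rater may select one or more categories and that the distribution is the normalized vote count; the exact aggregation used for EM²LDL is not described here, so treat this as an assumption. The four-category list and the function names are hypothetical stand-ins for the corpus's 32 categories and 20 raters.

```python
from collections import Counter

# Hypothetical, shortened category list; the corpus itself uses 32 categories.
CATEGORIES = ["joy", "trust", "fear", "surprise"]


def votes_to_distribution(rater_votes, categories):
    """Normalize per-rater label selections into a probability distribution.

    rater_votes: one iterable of selected labels per rater.
    Assumption: every selected label counts as one vote, and the
    distribution is each label's share of all votes.
    """
    counts = Counter()
    for labels in rater_votes:
        for label in set(labels):  # ignore duplicate picks by the same rater
            if label not in categories:
                raise ValueError(f"unknown category: {label}")
            counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes to normalize")
    return {c: counts[c] / total for c in categories}


def active_labels(distribution):
    """Labels with non-zero probability, i.e. the labels attached to an utterance."""
    return [c for c, p in distribution.items() if p > 0]


# Toy example with 4 raters instead of 20.
votes = [["joy", "trust"], ["joy"], ["joy", "surprise"], ["trust"]]
dist = votes_to_distribution(votes, CATEGORIES)
print(dist["joy"], len(active_labels(dist)))  # prints "0.5 3"
```

Under this reading, the per-utterance label count reported above (9.25 on average) corresponds to the number of categories with non-zero probability.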

@misc{li2025em2ldlmultilingualspeechcorpus,
title={EM2LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning},
author={Xingfeng Li and Xiaohan Shi and Junjie Li and Yongwei Li and Masashi Unoki and Tomoki Toda and Masato Akagi},
year={2025},
eprint={2511.20106},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2511.20106},
}
To request access, please download the User License Agreement (LA.pdf), complete and sign it, and return it to Dr. Xingfeng Li (xfli@cityu.edu.mo). Once the signed agreement has been received and approved, you will receive instructions for downloading the database.