EM²LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning

This study introduces EM²LDL, a novel multilingual speech corpus designed to advance mixed emotion recognition through label distribution learning. Existing emotion corpora are predominantly monolingual and single-label, which restricts linguistic diversity, prevents the modeling of mixed emotions, and limits ecological validity. EM²LDL addresses these limitations with expressive utterances in English, Mandarin, and Cantonese that capture the intra-utterance code-switching prevalent in multilingual regions such as Hong Kong and Macao. The corpus integrates spontaneous emotional expressions from online platforms, annotated with fine-grained emotion distributions across 32 categories. Experimental baselines using self-supervised learning models demonstrate robust performance in speaker-independent gender-, age-, and personality-based evaluations, with HuBERT-large-EN achieving the best results. By incorporating linguistic diversity and ecological validity, EM²LDL enables the exploration of complex emotional dynamics in multilingual settings. This work provides a versatile testbed for developing adaptive, empathetic systems for affective computing applications, including mental health monitoring and cross-cultural communication.

About the EM²LDL Corpus

The EM²LDL corpus contains a total of 3,998 audio utterances, amounting to 14,540.08 seconds of speech (approximately 4.04 hours). The average duration per utterance is 3.64 seconds, reflecting the concise yet emotionally expressive nature of the collected segments. The corpus captures intra-utterance code-switching across three language pairs: Cantonese-English (CE), Mandarin-English (ME), and Mandarin-Cantonese (MC).
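The duration figures above are consistent with one another, which can be checked with a few lines of Python. This is only a sanity check on the numbers quoted in this README; the constant names are illustrative and are not part of the corpus tooling.

```python
# Figures quoted in the corpus description above.
TOTAL_UTTERANCES = 3998
TOTAL_SECONDS = 14540.08

# Convert the total duration to hours and compute the mean utterance length.
total_hours = TOTAL_SECONDS / 3600
mean_seconds = TOTAL_SECONDS / TOTAL_UTTERANCES

print(f"{total_hours:.2f} h in total, {mean_seconds:.2f} s per utterance")
```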

Each utterance in the EM²LDL corpus is annotated with a probability distribution over 32 emotion categories derived from 20-rater annotations based on Plutchik’s Emotion Wheel. On average, each utterance is associated with 9.25 emotion labels (standard deviation: 1.65), with a maximum of 16 and a minimum of 4 labels, reflecting the complexity of mixed emotional states.
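As a rough illustration of what a label distribution looks like, the sketch below turns per-rater category selections into a probability distribution by normalizing vote counts. This is an assumption made for illustration: the README does not state how the 20-rater annotations are aggregated, and the function name, the integer category indices, and the toy votes are all hypothetical.

```python
from collections import Counter

NUM_CATEGORIES = 32  # number of emotion categories stated in the corpus description


def votes_to_distribution(rater_votes, num_categories=NUM_CATEGORIES):
    """Normalize rater selections into a probability distribution.

    rater_votes: one iterable of category indices per rater (hypothetical format).
    Assumption: each rater contributes at most one vote per category.
    """
    counts = Counter()
    for votes in rater_votes:
        unique_votes = set(votes)
        for index in unique_votes:
            if not 0 <= index < num_categories:
                raise ValueError(f"category index out of range: {index}")
        counts.update(unique_votes)

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes to normalize")
    return [counts.get(i, 0) / total for i in range(num_categories)]


# Toy example with 4 hypothetical raters (the corpus itself uses 20).
toy_votes = [[0, 5], [0], [5, 12], [0, 12]]
dist = votes_to_distribution(toy_votes)

# Number of emotion labels with non-zero probability for this utterance.
num_labels = sum(p > 0 for p in dist)
```

Under this assumed scheme, the "number of emotion labels" for an utterance corresponds to the count of categories with non-zero probability, as computed in `num_labels`.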

Citation

@misc{li2025em2ldlmultilingualspeechcorpus,
  title={EM2LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning},
  author={Xingfeng Li and Xiaohan Shi and Junjie Li and Yongwei Li and Masashi Unoki and Tomoki Toda and Masato Akagi},
  year={2025},
  eprint={2511.20106},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2511.20106},
}

Access to the EM²LDL Corpus

Please download the User License Agreement (LA.pdf), complete it, and return it to Dr. Xingfeng Li (xfli@cityu.edu.mo). Once the signed agreement has been received and approved, you will receive instructions for downloading the database.
