A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning

xingfengli/EM2LDL

EM²LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning

This study introduces EM²LDL, a novel multilingual speech corpus designed to advance mixed emotion recognition through label distribution learning. Existing emotion corpora are predominantly monolingual and single-label: they restrict linguistic diversity, cannot model mixed emotions, and lack ecological validity. To address these limitations, EM²LDL comprises expressive utterances in English, Mandarin, and Cantonese, capturing the intra-utterance code-switching prevalent in multilingual regions such as Hong Kong and Macao. The corpus integrates spontaneous emotional expressions from online platforms, annotated with fine-grained emotion distributions across 32 categories. Experimental baselines using self-supervised learning models demonstrate robust performance in speaker-independent gender-, age-, and personality-based evaluations, with HuBERT-large-EN achieving the best results. By incorporating linguistic diversity and ecological validity, EM²LDL enables the exploration of complex emotional dynamics in multilingual settings. This work provides a versatile testbed for developing adaptive, empathetic systems for applications in affective computing, including mental health monitoring and cross-cultural communication.

About the EM²LDL Corpus

The EM²LDL corpus contains a total of 3,998 audio utterances, amounting to 14,540.08 seconds of speech (approximately 4.04 hours). The average duration per utterance is 3.64 seconds, reflecting the concise yet emotionally expressive nature of the collected segments. The corpus captures intra-utterance code-switching across three language pairs: Cantonese-English (CE), Mandarin-English (ME), and Mandarin-Cantonese (MC).
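The summary figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the two totals quoted in this README (number of utterances and total seconds); the variable names are illustrative.

```python
# Totals quoted in the corpus description above.
total_seconds = 14540.08
num_utterances = 3998

# Derived statistics: total duration in hours and mean utterance length.
hours = total_seconds / 3600
mean_duration = total_seconds / num_utterances

print(f"{hours:.2f} hours, {mean_duration:.2f} s per utterance")
```

Both derived values round to the figures stated in the text (4.04 hours and 3.64 seconds).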

Each utterance in the EM²LDL corpus is annotated with a probability distribution over 32 emotion categories derived from 20-rater annotations based on Plutchik’s Emotion Wheel. On average, each utterance is associated with 9.25 emotion labels (standard deviation: 1.65), with a maximum of 16 and a minimum of 4 labels, reflecting the complexity of mixed emotional states.

[Figure: MacauDB Overview]
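To illustrate how multi-rater annotations can become a label distribution, here is a minimal sketch. It assumes that each rater selects one or more categories and that the distribution is the normalized vote count. This README does not specify the aggregation scheme actually used for EM²LDL, so treat the function `votes_to_distribution`, the integer category indices, and the example votes as hypothetical.

```python
from collections import Counter

NUM_CATEGORIES = 32  # number of emotion categories stated in the README


def votes_to_distribution(rater_labels, num_categories=NUM_CATEGORIES):
    """Turn per-rater label sets into a probability distribution.

    rater_labels: one list of category indices per rater.
    Assumption (not from the README): every selected category gets one
    vote per rater, and votes are normalized to sum to 1.
    """
    counts = Counter()
    for labels in rater_labels:
        for idx in set(labels):  # ignore duplicate picks by the same rater
            if not 0 <= idx < num_categories:
                raise ValueError(f"category index out of range: {idx}")
            counts[idx] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes to aggregate")
    return [counts[i] / total for i in range(num_categories)]


# Hypothetical votes from 20 raters for a single utterance.
rater_labels = [[0, 5]] * 10 + [[0]] * 6 + [[7]] * 4
dist = votes_to_distribution(rater_labels)

# Number of categories with non-zero probability for this utterance.
num_labels = sum(p > 0 for p in dist)
```

Under this scheme, the per-utterance label count reported above (between 4 and 16, 9.25 on average) would correspond to the number of categories with non-zero probability.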

Citation

@misc{li2025em2ldlmultilingualspeechcorpus,
  title={EM2LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning},
  author={Xingfeng Li and Xiaohan Shi and Junjie Li and Yongwei Li and Masashi Unoki and Tomoki Toda and Masato Akagi},
  year={2025},
  eprint={2511.20106},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2511.20106},
}

Access to the EM²LDL Corpus

Please download the User License Agreement (LA.pdf), complete and sign it, and return it to Dr. Xingfeng Li (xfli@cityu.edu.mo). Once the signed agreement has been received and approved, you will receive instructions for downloading the database.
