Source code to load, process, and extract features, as well as to perform statistical analyses and machine learning, for the MW_OFFICE dataset
For the dataset 'Measuring and Quantifying Mental Workload and Stress in Everyday Situations, Focusing on Typical Office Activities', hosted at Zenodo (https://doi.org/10.5281/zenodo.15681262)
The dataset (approximately 80 hours in total) consists of physiological signals from wearable electroencephalography (EEG), electrodermal activity (EDA), photoplethysmogram (PPG), acceleration, and temperature sensors. It was recorded from 10 participants who performed broadly pre-defined relaxation, reading, summarizing, and mental workload tasks. The consumer-grade physiological signals were obtained from the Muse S EEG headband and the Empatica E4 wristband. The data is balanced across controlled and uncontrolled environments: during the study, participants worked on Stroop, N-Back, reading, summarizing, and relaxation tasks in the controlled environment (roughly half of the data) and on realistic home-office tasks such as reading, summarizing, and relaxing in uncontrolled environments. Data labels were obtained using Likert scales and NASA-TLX questionnaires. The fully anonymized dataset is publicly available and offers vast potential to the research community working on mental workload detection with consumer-grade wearable sensors. Among other uses, the data is suitable for developing real-time cognitive load detection methods, researching signal processing techniques for challenging environments, developing artifact removal techniques for data from low-cost wearable devices, or building personal mental workload assistants for scenarios such as scheduling just-in-time work-break recommendations.
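As an illustration of how the raw wristband files can be read: the Empatica E4 CSV export convention (assumed here from the standard E4 format; verify against the 'info.txt' file in each recording) stores the UNIX start timestamp in the first row and the sampling rate in Hz in the second row, followed by one sample per row. A minimal sketch in Python, demonstrated on a synthetic EDA-style snippet rather than a real file:

```python
import csv
import io

def parse_e4_csv(text):
    """Parse an Empatica-E4-style CSV (assumed format: row 1 = UNIX start
    timestamp, row 2 = sampling rate in Hz, remaining rows = samples)."""
    reader = csv.reader(io.StringIO(text))
    start = float(next(reader)[0])  # recording start (UNIX seconds, UTC)
    rate = float(next(reader)[0])   # sampling frequency in Hz
    samples = [float(row[0]) for row in reader if row]
    return start, rate, samples

# Synthetic EDA-style example (EDA is typically sampled at 4 Hz on the E4)
example = "1594920000.000000\n4.000000\n0.12\n0.13\n0.15\n0.14\n"
start, rate, samples = parse_e4_csv(example)
# Per-sample timestamps follow from the start time and the sampling rate
timestamps = [start + i / rate for i in range(len(samples))]
```

Note that 'ACC.csv' has three columns (x, y, z) and 'IBI.csv' deviates from this single-column layout, so the sketch above applies as-is only to the single-channel files such as 'EDA.csv', 'TEMP.csv', 'HR.csv', and 'BVP.csv'.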
As literature for the reading and summarizing tasks, six scientific publications (Anagnos and Kiremidjian (1998); Nunes-Halldorson and Duran (2003); Mansouri et al. (2011); Kwak et al. (2018); Zhao et al. (2018); Fernbach et al. (2019)) were chosen as difficult texts, and six short stories by famous English-language writers (O. Henry (’The Gift of the Magi’), Edgar Allan Poe (’The Masque of the Red Death’, ’The Cask of Amontillado’, and ’The Black Cat’), Oscar Wilde (’The Devoted Friend’), and Charlotte Brontë (’The Search After Happiness’)) were chosen as easy texts.
The link to the publication will be added once the manuscript has been accepted by the respective journal.
The anonymized data is located in the subfolder 'dataset', in which the subfolders 'Participant 01' to 'Participant 10' hold the data of the individual participants. For each participant, three subfolders exist: 'Lab 1' and 'Lab 2' for the data recorded in the controlled environment, and 'In-the-wild' for the data recorded in uncontrolled environments. The 'In-the-wild' subfolder contains numbered folders with the data of the respective recordings, plus a file 'P#participant_wild_labels.csv' (e.g., 'P01_wild_labels.csv') holding the respective labels. Each recording folder (i.e., 'Lab 1' and 'Lab 2', as well as '1' ... 'N' under 'In-the-wild') contains the files recorded from the Empatica E4 ('ACC.csv', 'BVP.csv', 'EDA.csv', 'HR.csv', 'IBI.csv', 'info.txt', 'tags.csv', 'TEMP.csv') and from the Muse S ('P#participant_#recording_muse.csv', e.g., 'P2_wild1_muse.csv'). For the data recorded in the controlled environment (i.e., in 'Lab 1' and 'Lab 2'), two more files exist, '*papers*date.csv' and 'psychopy_log.log', each holding the experimental data recorded during the computerized mental workload tasks.
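The folder layout described above can be traversed programmatically, for example to enumerate all recording sessions per participant. The helper below is a hypothetical sketch (it is not part of the provided scripts) and is demonstrated on a mock directory tree rather than the real 'dataset' folder:

```python
import tempfile
from pathlib import Path

def list_recordings(dataset_root):
    """Enumerate all recording folders per participant, following the layout
    described above: 'Participant 01'..'Participant 10', each with 'Lab 1',
    'Lab 2', and numbered folders under 'In-the-wild'."""
    recordings = {}
    for participant in sorted(Path(dataset_root).glob("Participant *")):
        sessions = [participant / "Lab 1", participant / "Lab 2"]
        sessions += sorted((participant / "In-the-wild").glob("[0-9]*"))
        # Keep only folders that actually exist in this participant's tree
        recordings[participant.name] = [s for s in sessions if s.is_dir()]
    return recordings

# Minimal self-contained demo on a mock layout (real data lives in 'dataset/')
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("Lab 1", "Lab 2", "In-the-wild/1", "In-the-wild/2"):
        (Path(tmp) / "Participant 01" / sub).mkdir(parents=True)
    recs = list_recordings(tmp)
```

In this mock example, 'recs' maps 'Participant 01' to its two lab sessions and two in-the-wild recordings; pointing the function at the real 'dataset' folder would enumerate all ten participants.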
Apart from the anonymized data, the source code to load, process, and extract features, as well as to perform statistical analyses and machine learning, can be found in the Python script 'towards_general_cognitive_load_assistants_ML.py'. The Python script 'psychopy_csv_log_parser.py' is a helper script to analyze the log files generated during the recordings in the controlled environment. The script 'towards_general_cognitive_load_assistants_ML.py' is parameterized, and all experiments reported in the paper can be reproduced using the provided Bash script 'Towards_General_Cognitive_Load_Assistants_ML.sh'. To run the source code, it is recommended to set up a virtual environment with the required libraries. The anaconda file 'anaconda_environment.yml' contains the information about the Python libraries required to set up the anaconda environment 'neuroinf', which is activated automatically by the provided Bash script. Furthermore, should you choose to reproduce and replicate the results using the provided source code, the empty folders 'ml_results' and 'stats_results' exist, in which the respective results are stored automatically.
Finally, please feel free to reach out should you encounter any issues or have open questions regarding this dataset, the source code, or the publication. You can reach the authors via the contact information provided in the publication, or via email to 'christoph.anders@hpi.de', 'christoph.anders@hpi.uni-potsdam.de', 'office-arnrich@hpi.uni-potsdam.de', or 'mw_office_2025@hpi.de'.