BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change (ICLR 2026)
by Manuela González-González3,4, Soufiane Belharbi1, Muhammad Osama Zeeshan1, Masoumeh Sharafi1, Muhammad Haseeb Aslam1, Alessandro Lameiras Koerich2, Marco Pedersoli1, Simon L. Bacon3,4, Eric Granger1
1 LIVIA, Dept. of Systems Engineering, ETS Montreal, Canada
2 LIVIA, Dept. of Software and IT Engineering, ETS Montreal, Canada
3 Dept. of Health, Kinesiology, & Applied Physiology, Concordia University, Montreal, Canada
4 Montreal Behavioural Medicine Centre, CIUSSS Nord-de-l’Ile-de-Montréal, Canada
Ambivalence and hesitancy (A/H) are closely related constructs and a primary reason why individuals delay, avoid, or abandon health behaviour changes. A/H is a subtle and conflicting emotional state that places a person between positive and negative orientations, or between acceptance and refusal to do something. It manifests as a discord in affect across modalities or within a single modality, such as facial and vocal expressions and body language. Although experts can be trained to recognize A/H, as is done for in-person interactions, integrating them into digital health interventions is costly and less effective. Automatic A/H recognition is therefore critical for the personalization and cost-effectiveness of digital behaviour change interventions. However, no dataset currently exists for the design of machine learning models to recognize A/H.
This paper introduces the Behavioural Ambivalence/Hesitancy (BAH) dataset
collected for multimodal recognition of A/H in videos.
It contains 1,427 videos with a total duration of 10.60 hours captured from 300
participants across Canada answering predefined questions to elicit A/H. It is
intended to mirror real-world online personalized behaviour change interventions.
BAH is annotated by three experts to provide timestamps that
indicate where A/H occurs, and frame- and video-level annotations with A/H cues.
Video transcripts, cropped and aligned faces, and participants' meta-data are
also provided. Since A and H manifest similarly in practice, we provide a binary
annotation indicating the presence or absence of A/H.
Additionally, this paper includes benchmarking results on BAH using baseline models for frame- and video-level recognition, zero-shot prediction, and personalization using source-free domain adaptation. The limited performance highlights the need for multimodal and spatio-temporal models adapted to A/H recognition. Results also show that specialized fusion methods, which assess conflict between modalities, and temporal modelling, which captures within-modality conflict, are essential for better A/H recognition.
The data, code, and pretrained weights are publicly available.
Code: PyTorch 2.2.2
@inproceedings{gonzalez-26-bah,
title={{BAH} Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change},
author={González-González, M. and Belharbi, S. and Zeeshan, M. O. and
Sharafi, M. and Aslam, M. H. and Pedersoli, M. and Koerich, A. L. and
Bacon, S. L. and Granger, E.},
booktitle={ICLR},
year={2026}
}
- BAH dataset: Download
- BAH dataset: Splits
- Installation of the environments
- Supported modalities
- Pre-processing
- Run code
- Pretrained weights (evaluation)
- BAH presentation
To download the BAH dataset, please fill in the following form, which includes signing and uploading the End-User License Agreement (EULA). You will then receive a link to download the BAH dataset. Please read the following instructions before requesting the dataset; they will help you avoid errors/delays.
- PLEASE FILL IN THE DATASET REQUEST FORM CAREFULLY TO AVOID ERRORS/DELAYS.
- PLEASE FOLLOW THE INSTRUCTIONS BELOW.
- Who can request the BAH dataset? Only someone holding a full-time faculty position (for example: Assistant Professor, Associate Professor, or Professor) at a university, higher education institution, or equivalent organisation can request the BAH dataset (fill in the form and sign the EULA). The applicant cannot be a student (UG/PG/Ph.D./Postdoc).
- All form fields with "* must provide value" are mandatory.
- "Email" field: please enter your permanent faculty email address.
- Please list the names of everyone who will have access to the dataset, such as your students.
- For the certification:
"I certify that neither myself nor any of my research team are directly associated with an institution or organisation on Canada's Named Research Organisations List (see https://science.gc.ca/site/science/en/safeguarding-your-research/guidelines-and-tools-implement-research-security/sensitive-technology-research-and-affiliations-concern/named-research-organizations)."
- PLEASE ENSURE THAT THE AFFILIATION OF THE MAIN APPLICANT AND OF ANYONE HAVING ACCESS TO THE DATASET DOES NOT APPEAR ON THE LIST OF ORGANIZATIONS IN THE LINK.
- IF YOU ANSWER `NO`, UNFORTUNATELY, THE DATASET CANNOT BE PROVIDED, DUE TO LEGAL AND ETHICAL REASONS BEYOND OUR CONTROL.
- You can search for affiliations/institutions/organisations in the provided link: https://science.gc.ca/site/science/en/safeguarding-your-research/guidelines-and-tools-implement-research-security/sensitive-technology-research-and-affiliations-concern/named-research-organizations
- For the question "What is the primary purpose of your request for access to the dataset?":
- PLEASE CHOOSE THE OPTION: "I am requesting access for other academic research purposes (e.g., thesis, lab project, independent study)."
- CURRENTLY, WE DO NOT PROVIDE THE BAH DATASET FOR ANY CHALLENGE.
- THE CURRENTLY PROVIDED VERSION OF THE BAH DATASET CANNOT BE USED FOR CHALLENGES.
- For the section "Intended Use of the Dataset":
- PLEASE PROVIDE A SUFFICIENT SUMMARY OF THE INTENDED USES OF THIS DATASET.
- PLEASE AVOID 1-2 SENTENCES.
- For the section "Possible products":
- PLEASE PROVIDE A SUFFICIENT SUMMARY OF THE POSSIBLE PRODUCTS INTENDED FROM THIS DATASET.
- PLEASE AVOID 1-2 SENTENCES.
- If you have questions while filling in the request form, please contact us.
- BAH DATASET REQUEST FORM: https://www.crhscm.ca/redcap/surveys/?s=LDMDDJR3AT9P37JY
We provide BAH dataset splits for both scenarios:
- Supervised learning: video- and frame-level splits are located at dataset-splits.
- Domain adaptation: coming soon.
# Face cropping and alignment virtual env.
./create_v_env_face_extract.sh
# Pre-processing and training virtual env.
./create_v_env_main.sh
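As a quick sanity check after installation, here is a minimal sketch, assuming the creation scripts install the virtual environments under ~/venvs/, matching the activation commands used in the steps below:
#!/usr/bin/env bash
# Minimal sanity check (assumption: the scripts create the venvs under ~/venvs/).
source ~/venvs/bah-main-face-extract/bin/activate   # face cropping & alignment env
deactivate
source ~/venvs/bah-main/bin/activate                # pre-processing & training env
python -c "import torch; print(torch.__version__)"  # expected: 2.2.2
deactivate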
Supported modalities:
- Vision: vision
- Audio: vggish
- Text: bert
Read ./abaw5_pre_processing/README.txt, download the required file, and unzip it. Adjust get_root_wsol_dataset() in ./abaw5_pre_processing/dlib/tools.py and in ./default_config.py so that it points to the absolute path of the folder containing the dataset folders, e.g., /a/b/c/d/datasets. Inside, there should be the needed dataset folders, e.g., BAH_DB. Download the pretrained weights vggish.pth and res50_ir_0.887.pth from here into the folder ./pretrained_models.
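As a quick check, the expected layout after these adjustments is sketched below; /a/b/c/d/datasets is only the illustrative placeholder from the example above:
#!/usr/bin/env bash
# Sketch with illustrative paths: get_root_wsol_dataset() should point to the folder
# that contains the dataset folders (here /a/b/c/d/datasets is only a placeholder).
ls /a/b/c/d/datasets     # should list: BAH_DB
ls ./pretrained_models   # should list: vggish.pth  res50_ir_0.887.pth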
- Face cropping and alignment: here is an example of processing BAH_DB, where the data is divided into 8 blocks and we process block 0 (a sketch for processing all 8 blocks follows the example).
#!/usr/bin/env bash
source ~/venvs/bah-main-face-extract/bin/activate
cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid
python abaw5_pre_processing/dlib/ah_db.py --ds BAH_DB --split train --nblocks 8 --process_block 0
python abaw5_pre_processing/dlib/ah_db.py --ds BAH_DB --split val --nblocks 8 --process_block 0
python abaw5_pre_processing/dlib/ah_db.py --ds BAH_DB --split test --nblocks 8 --process_block 0
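To process all 8 blocks, here is a minimal sketch that loops over --process_block sequentially on a single GPU (blocks are independent, so they can also be distributed across separate jobs):
#!/usr/bin/env bash
# Sketch: process all 8 blocks and the three splits on one GPU.
source ~/venvs/bah-main-face-extract/bin/activate
cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid
for split in train val test; do
  for b in 0 1 2 3 4 5 6 7; do
    python abaw5_pre_processing/dlib/ah_db.py --ds BAH_DB --split $split --nblocks 8 --process_block $b
  done
done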
- Feature extraction:
#!/usr/bin/env bash
source ~/venvs/bah-main/bin/activate
cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid
python abaw5_pre_processing/project/abaw5/main.py --ds BAH_DB --split val --nparts 1 --part 0
python abaw5_pre_processing/project/abaw5/main.py --ds BAH_DB --split test --nparts 1 --part 0
python abaw5_pre_processing/project/abaw5/main.py --ds BAH_DB --split train --nparts 1 --part 0
# ==============================================================================
Since feature extraction is done by block, we need to gather all the blocks for the results files processing_records* and dataset_info*. These two files need to hold information covering all the data. Before running the command below, change the name of the dataset in post_feature_extract.py. Then run:
python post_feature_extract.py
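A minimal wrapper for this step (a sketch, assuming the bah-main environment is the one used for post_feature_extract.py):
#!/usr/bin/env bash
# Sketch (assumption): run the gathering step inside the main environment,
# after editing the dataset name in post_feature_extract.py to BAH_DB.
source ~/venvs/bah-main/bin/activate
python post_feature_extract.py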
- Compacting face images: cropped faces need to be compacted into a single file, similar to the other modalities. Example:
#!/usr/bin/env bash
source ~/venvs/bah-main-face-extract/bin/activate
cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid
python abaw5_pre_processing/dlib/compact_face_images.py --ds BAH_DB --split train --nparts 1 --part 0
python abaw5_pre_processing/dlib/compact_face_images.py --ds BAH_DB --split val --nparts 1 --part 0
python abaw5_pre_processing/dlib/compact_face_images.py --ds BAH_DB --split test --nparts 1 --part 0
- Override frame labels: use set_frame_labels.py with overload_real_frame_labels().
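A minimal invocation sketch, assuming set_frame_labels.py is run directly and calls overload_real_frame_labels() internally (the exact entry point and arguments may differ):
#!/usr/bin/env bash
# Sketch (assumption): set_frame_labels.py overrides the frame labels via
# overload_real_frame_labels(); check the script for its exact arguments.
source ~/venvs/bah-main/bin/activate
python set_frame_labels.py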
#!/usr/bin/env bash
source ~/venvs/bah-main/bin/activate
# ==============================================================================
cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid
python main.py \
--train_supervision_type video_fr_sup \
--dataset_name BAH_DB \
--use_other_class False \
--train_p 100.0 \
--valid_p 100.0 \
--test_p 100.0 \
--amp True \
--seed 0 \
--mode TRAINING \
--resume False \
--modality video+vggish+bert+EXPR_continuous_label \
--calc_mean_std True \
--emotion LATER \
--model_name JMT \
--num_folds 1 \
--fold_to_run 0 \
--use_pretrained_w False \
--visual_backbone_path None \
--num_heads 2 \
--modal_dim 32 \
--tcn_kernel_size 5 \
--num_epochs 60 \
--min_num_epochs 3 \
--early_stopping 50 \
--window_length 696 \
--hop_length 48 \
--train_batch_size 4 \
--eval_batch_size 1 \
--num_workers 12 \
--opt__weight_decay 0.0001 \
--opt__name_optimizer SGD \
--opt__lr 0.008 \
--opt__momentum 0.9 \
--opt__dampening 0.0 \
--opt__nesterov True \
--opt__beta1 0.9 \
--opt__beta2 0.999 \
--opt__eps_adam 1e-08 \
--opt__amsgrad False \
--opt__lr_scheduler True \
--opt__name_lr_scheduler MYSTEP \
--opt__gamma 0.9 \
--opt__step_size 50 \
--opt__last_epoch -1 \
--opt__min_lr 1e-07 \
--opt__t_max 100 \
--opt__mode MIN \
--opt__factor 0.5 \
--opt__patience 10 \
--opt__gradual_release 1 \
--opt__release_count 3 \
--opt__milestone 0 \
--opt__load_best_at_each_epoch False \
--exp_id 05_14_2025_14_18_15_411877__5413229
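For convenience, the full command above can be saved as a script and launched with the GPU id as its first argument (the script name below is only illustrative):
# Illustrative usage (run_bah_training.sh is a hypothetical name for the script above,
# which exports its first argument to CUDA_VISIBLE_DEVICES):
bash run_bah_training.sh 0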
This work introduces BAH, a new and unique multimodal, subject-based video dataset for A/H recognition in videos. BAH contains 300 participants across 9 provinces in Canada. Recruited participants answered 7 designed questions to elicit A/H while recording themselves via webcam and microphone through our web platform. The dataset amounts to 1,427 videos for a total duration of 10.60 hours, with 1.79 hours of A/H. It was annotated by our behavioural team at the video and frame levels.
Our initial benchmarking yielded limited performance, highlighting the difficulty of A/H recognition. Our results also show that leveraging context, multimodality, and adapted feature fusion is a good first direction for designing robust models. Our dataset and code are publicly available.
The appendix contains related work, more detailed statistics about the dataset and its diversity, dataset limitations, implementation details, and additional results.
This work was supported in part by the Fonds de recherche du Québec – Santé, the Natural Sciences and Engineering Research Council of Canada, the Canada Foundation for Innovation, and the Digital Research Alliance of Canada. We thank the interns who participated in the dataset annotation: Jessica Almeida (Concordia University, Université du Québec à Montréal) and Laura Lucia Ortiz (MBMC).
This code is heavily based on github.com/sucv/ABAW3.