
PyTorch code for "BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change"


by Manuela González-González (3,4), Soufiane Belharbi (1), Muhammad Osama Zeeshan (1), Masoumeh Sharafi (1), Muhammad Haseeb Aslam (1), Alessandro Lameiras Koerich (2), Marco Pedersoli (1), Simon L. Bacon (3,4), Eric Granger (1)

(1) LIVIA, Dept. of Systems Engineering, ETS Montreal, Canada
(2) LIVIA, Dept. of Software and IT Engineering, ETS Montreal, Canada
(3) Dept. of Health, Kinesiology, & Applied Physiology, Concordia University, Montreal, Canada
(4) Montreal Behavioural Medicine Centre, CIUSSS Nord-de-l'Ile-de-Montréal, Canada

Contact: livia-datasets


Links: project page, arXiv, Hugging Face Spaces

Abstract

Ambivalence and hesitancy (A/H) are closely related constructs and are among the primary reasons why individuals delay, avoid, or abandon health behaviour changes. A/H is a subtle and conflicting emotional state that places a person between positive and negative orientations, or between acceptance and refusal to do something. It manifests as a discord in affect between multiple modalities, or within a modality, such as facial and vocal expressions and body language. Although experts can be trained to recognize A/H, as is done for in-person interactions, integrating them into digital health interventions is costly and less effective. Automatic A/H recognition is therefore critical for the personalization and cost-effectiveness of digital behaviour change interventions.
However, no dataset currently exists for the design of machine learning models to recognize A/H. This paper introduces the Behavioural Ambivalence/Hesitancy (BAH) dataset, collected for multimodal recognition of A/H in videos.
It contains 1,427 videos with a total duration of 10.60 hours, captured from 300 participants across Canada who answered predefined questions designed to elicit A/H. It is intended to mirror real-world online personalized behaviour change interventions. BAH is annotated by three experts who provide timestamps indicating where A/H occurs, as well as frame- and video-level annotations with A/H cues. Video transcripts, cropped and aligned faces, and participant meta-data are also provided. Since A and H manifest similarly in practice, we provide a single binary annotation indicating the presence or absence of A/H. Additionally, this paper includes benchmarking results on BAH using baseline models for frame- and video-level recognition, zero-shot prediction, and personalization using source-free domain adaptation. The limited performance highlights the need for adapted multimodal and spatio-temporal models for A/H recognition. The results also show that specialized fusion methods that assess conflict between modalities, and temporal modelling that captures within-modality conflict, are essential for better A/H recognition. The data, code, and pretrained weights are publicly available.

Code: PyTorch 2.2.2

Citation:

@inproceedings{gonzalez-26-bah,
  title={{BAH} Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change},
  author={González-González, M. and Belharbi, S. and Zeeshan, M. O. and
    Sharafi, M. and Aslam, M. H. and Pedersoli, M. and Koerich, A. L. and
    Bacon, S. L. and Granger, E.},
  booktitle={ICLR},
  year={2026}
}

Content:

To download the BAH dataset, please fill in the following form, which includes signing and uploading the End-User License Agreement (EULA). You will then receive a link to download the BAH dataset. Please read the following instructions before requesting the dataset; they will help you avoid errors and delays.

- PLEASE FILL IN THE DATASET REQUEST FORM CAREFULLY TO AVOID ERRORS/DELAYS.
- PLEASE FOLLOW THE INSTRUCTIONS BELOW.
  • Who can request the BAH dataset? Only individuals holding a full-time faculty position (for example, Assistant Professor, Associate Professor, or Professor) at a university, higher education institution, or equivalent organisation can request the BAH dataset (fill in the form and sign the EULA). The applicant cannot be a student (UG/PG/Ph.D./Postdoc).

Who can request BAH?

  • All form fields with "* must provide value" are mandatory.

  • "Email" field: please enter your permanent faculty email address.

Email

  • Please list the names of everyone who will have access to the dataset, such as your students.

Who will have access to the dataset?

  • For the certification: I certify that neither myself nor any of my research team are directly associated with an institution or organisation on Canada's Named Research Organisations List (see https://science.gc.ca/site/science/en/safeguarding-your-research/guidelines-and-tools-implement-research-security/sensitive-technology-research-and-affiliations-concern/named-research-organizations).
- PLEASE ENSURE THAT THE AFFILIATION OF THE MAIN APPLICANT, AND OF ANYONE HAVING ACCESS TO THE DATASET, DOES NOT APPEAR ON THE LIST OF ORGANISATIONS AT THE LINK ABOVE.
- IF YOU ANSWER `NO`, THE DATASET UNFORTUNATELY CANNOT BE PROVIDED, DUE TO LEGAL AND ETHICAL REASONS BEYOND OUR CONTROL.

certify organisations

You can search for affiliations/institutions/organisations at the provided link https://science.gc.ca/site/science/en/safeguarding-your-research/guidelines-and-tools-implement-research-security/sensitive-technology-research-and-affiliations-concern/named-research-organizations:

Filter organisations

  • For the question What is the primary purpose of your request for access to the dataset?:
- PLEASE CHOOSE THE OPTION: "I am requesting access for other academic research purposes (e.g., thesis, lab project, independent study)."
- CURRENTLY, WE DO NOT PROVIDE THE BAH DATASET FOR ANY CHALLENGE; THE CURRENTLY PROVIDED VERSION CANNOT BE USED FOR CHALLENGES.

request form

  • For the section Intended Use of the Dataset:
- PLEASE PROVIDE A SUFFICIENT SUMMARY OF THE INTENDED USES OF THIS DATASET; PLEASE AVOID A 1-2 SENTENCE ANSWER.

dataset usage

  • For the section Possible products:
- PLEASE PROVIDE A SUFFICIENT SUMMARY OF THE POSSIBLE PRODUCTS INTENDED FROM THIS DATASET; PLEASE AVOID A 1-2 SENTENCE ANSWER.

dataset products

We provide BAH dataset splits for both scenarios:

  • Supervised learning: video- and frame-level splits are located at dataset-splits.
  • Domain adaptation: coming soon.
Create the two virtual environments:

# Face cropping and alignment virtual env.
./create_v_env_face_extract.sh

# Pre-processing and training virtual env.
./create_v_env_main.sh
Pretrained feature extractors per modality:

  • Vision: vision
  • Audio: vggish
  • Text: bert
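
For intuition, here is a minimal sketch of sentence-level text feature extraction with BERT, assuming the HuggingFace transformers library and mean pooling; the repository's own extractor in abaw5_pre_processing may use a different checkpoint, tokenization, or pooling.

# Minimal sketch of BERT text features (NOT the repository's exact pipeline;
# the "bert-base-uncased" checkpoint and mean pooling are assumptions).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

transcript = "I want to eat healthier, but I am not sure I can keep it up."
inputs = tokenizer(transcript, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    out = model(**inputs)

# Token embeddings (1, seq_len, 768) -> one utterance-level feature vector.
text_features = out.last_hidden_state.mean(dim=1)
print(text_features.shape)  # torch.Size([1, 768])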

Read ./abaw5_pre_processing/README.txt, download the required file, and unzip it. Adjust get_root_wsol_dataset() in ./abaw5_pre_processing/dlib/tools.py and in ./default_config.py to point to the absolute path of the folder containing the dataset folders, e.g.: /a/b/c/d/datasets. Inside, there should be the needed dataset folders, e.g.: BAH_DB. Download the pretrained weights vggish.pth and res50_ir_0.887.pth from here into the folder ./pretrained_models.
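
As a concrete (hypothetical) example of the path adjustment, get_root_wsol_dataset() presumably just returns that root folder; the body below is a sketch, not the repository's actual implementation:

def get_root_wsol_dataset() -> str:
    # Replace with the absolute path on your machine; /a/b/c/d/datasets is a
    # placeholder. The returned folder must contain the dataset folders, e.g. BAH_DB.
    return "/a/b/c/d/datasets"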

  1. Face cropping and alignment: here is an example of processing BAH_DB, where the data is divided into 8 blocks and we process block 0 (see the sketch after this script for what the block flags presumably do).
#!/usr/bin/env bash

source ~/venvs/bah-main-face-extract/bin/activate

cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid

python abaw5_pre_processing/dlib/ah_db.py --ds BAH_DB --split train --nblocks 8 --process_block 0
python abaw5_pre_processing/dlib/ah_db.py --ds BAH_DB --split val --nblocks 8 --process_block 0
python abaw5_pre_processing/dlib/ah_db.py --ds BAH_DB --split test --nblocks 8 --process_block 0
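
As referenced above, a conceptual sketch of what --nblocks / --process_block presumably do: split the list of videos into contiguous chunks so that the 8 blocks can run in parallel as separate jobs. This is an illustration, not code from ah_db.py.

def select_block(videos, nblocks, process_block):
    # Keep only the chunk of videos assigned to this block.
    assert 0 <= process_block < nblocks
    per_block = (len(videos) + nblocks - 1) // nblocks  # ceiling division
    start = process_block * per_block
    return videos[start:start + per_block]

# e.g. with 1,427 videos and nblocks=8, block 0 covers roughly the first 179.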
  2. Feature extraction:
#!/usr/bin/env bash

source ~/venvs/bah-main/bin/activate

cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid


python abaw5_pre_processing/project/abaw5/main.py --ds BAH_DB --split val --nparts 1 --part 0
python abaw5_pre_processing/project/abaw5/main.py --ds BAH_DB --split test --nparts 1 --part 0
python abaw5_pre_processing/project/abaw5/main.py --ds BAH_DB --split train --nparts 1 --part 0

# ==============================================================================

Since feature extraction is done by block, the per-block result files processing_records* and dataset_info* need to be gathered, so that these two files hold the information for all the data. Run:

python post_feature_extract.py

Before running this, change the name of the dataset in post_feature_extract.py.
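
For intuition, a hedged sketch of the gathering step, assuming each block wrote a pickled dict and that the merged output keeps a similar name; the actual file formats handled by post_feature_extract.py may differ.

import glob
import pickle

def merge_block_files(pattern, out_path):
    # Merge every per-block file matching `pattern` into a single dict/file.
    merged = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            merged.update(pickle.load(f))
    with open(out_path, "wb") as f:
        pickle.dump(merged, f)

# Hypothetical names, following the processing_records* / dataset_info* patterns.
merge_block_files("processing_records*", "processing_records_all.pkl")
merge_block_files("dataset_info*", "dataset_info_all.pkl")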

  3. Compacting face images: cropped faces need to be compacted into a single file, similar to the other modalities (see the sketch after this list for one illustrative container format). Example:
#!/usr/bin/env bash

source ~/venvs/bah-main-face-extract/bin/activate

cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid

python abaw5_pre_processing/dlib/compact_face_images.py --ds BAH_DB --split train --nparts 1 --part 0
python abaw5_pre_processing/dlib/compact_face_images.py --ds BAH_DB --split val --nparts 1 --part 0
python abaw5_pre_processing/dlib/compact_face_images.py --ds BAH_DB --split test --nparts 1 --part 0
  4. Override frame labels: run set_frame_labels.py using overload_real_frame_labels().
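
As referenced in step 3, here is a purely illustrative sketch of compacting cropped faces into a single file, using HDF5 as an assumed container; compact_face_images.py defines its own format, and the paths and keys below are hypothetical.

import glob
import h5py
import numpy as np
from PIL import Image

def compact_faces(face_dir, out_path):
    # Store every cropped face of one video as a dataset in a single HDF5 file.
    with h5py.File(out_path, "w") as h5:
        for path in sorted(glob.glob(f"{face_dir}/*.jpg")):
            img = np.asarray(Image.open(path).convert("RGB"))
            h5.create_dataset(path.split("/")[-1], data=img, compression="gzip")

# Hypothetical paths.
compact_faces("BAH_DB/cropped_faces/video_0001", "video_0001_faces.h5")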

Training

#!/usr/bin/env bash

source ~/venvs/bah-main/bin/activate

# ==============================================================================
cudaid=$1
export CUDA_VISIBLE_DEVICES=$cudaid


python main.py \
       --train_supervision_type video_fr_sup \
       --dataset_name BAH_DB \
       --use_other_class False \
       --train_p 100.0 \
       --valid_p 100.0 \
       --test_p 100.0 \
       --amp True \
       --seed 0 \
       --mode TRAINING \
       --resume False \
       --modality video+vggish+bert+EXPR_continuous_label \
       --calc_mean_std True \
       --emotion LATER \
       --model_name JMT \
       --num_folds 1 \
       --fold_to_run 0 \
       --use_pretrained_w False \
       --visual_backbone_path None \
       --num_heads 2 \
       --modal_dim 32 \
       --tcn_kernel_size 5 \
       --num_epochs 60 \
       --min_num_epochs 3 \
       --early_stopping 50 \
       --window_length 696 \
       --hop_length 48 \
       --train_batch_size 4 \
       --eval_batch_size 1 \
       --num_workers 12 \
       --opt__weight_decay 0.0001 \
       --opt__name_optimizer SGD \
       --opt__lr 0.008 \
       --opt__momentum 0.9 \
       --opt__dampening 0.0 \
       --opt__nesterov True \
       --opt__beta1 0.9 \
       --opt__beta2 0.999 \
       --opt__eps_adam 1e-08 \
       --opt__amsgrad False \
       --opt__lr_scheduler True \
       --opt__name_lr_scheduler MYSTEP \
       --opt__gamma 0.9 \
       --opt__step_size 50 \
       --opt__last_epoch -1 \
       --opt__min_lr 1e-07 \
       --opt__t_max 100 \
       --opt__mode MIN \
       --opt__factor 0.5 \
       --opt__patience 10 \
       --opt__gradual_release 1 \
       --opt__release_count 3 \
       --opt__milestone 0 \
       --opt__load_best_at_each_epoch False \
       --exp_id 05_14_2025_14_18_15_411877__5413229
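
For intuition on --window_length 696 and --hop_length 48, here is a minimal sketch of the overlapping temporal windows they presumably define over each video's frame sequence; the repository's data loader may pad or handle sequence edges differently.

def make_windows(num_frames, window_length=696, hop_length=48):
    # Start indices every `hop_length` frames, each window `window_length` frames long.
    starts = range(0, max(num_frames - window_length, 0) + 1, hop_length)
    return [(s, s + window_length) for s in starts]

print(make_windows(1000)[:3])  # [(0, 696), (48, 744), (96, 792)]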

BAH: Capture & Annotation

Data capture

7 questions


BAH: Variability

Nutrition label

Dataset variability (figures)


BAH: Experimental Protocol

Dataset: splits

Dataset imbalance


Experiments: Baselines

1) Frame-level supervised classification using multimodal data

Dataset: multimodal

Dataset: Frame - multimodal

Dataset: Frame - fusion


2) Video-level supervised classification using multimodal data

Dataset: Video - performance


3) Zero-shot performance: Frame- & video-level

Dataset: Zero shot - frame - performance

Dataset: Zero shot - video - performance


4) Personalization using domain adaptation (frame-level)

Dataset: Personalization- domain adaptation - performance


Conclusion

This work introduces BAH, a new and unique multimodal, subject-based video dataset for A/H recognition in videos. BAH contains 300 participants from 9 provinces across Canada. Recruited participants answered 7 questions designed to elicit A/H while recording themselves via webcam and microphone on our web platform. The dataset amounts to 1,427 videos for a total duration of 10.60 hours, with 1.79 hours of A/H. It was annotated by our behavioural team at the video and frame levels.

Our initial benchmarking yielded limited performance, highlighting the difficulty of A/H recognition. Our results also showed that leveraging context, multimodality, and adapted feature fusion is a good first direction for designing robust models. Our dataset and code are publicly available.

The appendix of the paper contains related work, more detailed and relevant statistics about the dataset and its diversity, dataset limitations, implementation details, and additional results.

Acknowledgments

This work was supported in part by the Fonds de recherche du Québec – Santé, the Natural Sciences and Engineering Research Council of Canada, the Canada Foundation for Innovation, and the Digital Research Alliance of Canada. We thank the interns who participated in the dataset annotation: Jessica Almeida (Concordia University, Université du Québec à Montréal) and Laura Lucia Ortiz (MBMC).

Thanks

This code is heavily based on github.com/sucv/ABAW3.
