Maria Teresa Parreira1,2, Ruidong Zhang1, Sukruth Gowdru Lingaraju2, Alexandra Bremers2, Xuanyu Fang2, Adolfo Ramirez-Aristizabal3, Manaswi Saha3, Michael Kuniavsky3, Cheng Zhang1, Wendy Ju1,2
1Cornell University | 2Cornell Tech | 3Accenture Labs
Figure 1 — Study overview: NeckFace's neck-mounted IR cameras capture chin-region video; NeckNet-18 reconstructs 3D facial blendshapes and head rotations used by the error-detection models.
How do humans recognize and rectify social missteps? We achieve social competence by looking around at our peers, decoding subtle cues from bystanders — a raised eyebrow, a laugh — to evaluate the environment and our actions. Robots, however, struggle to perceive and make use of these nuanced reactions.
By employing a novel neck-mounted device that records facial expressions from the chin region, we explore the potential of previously untapped data to capture and interpret human responses to robot errors. First, we develop NeckNet-18, a 3D facial reconstruction model that maps the reactions captured by the chin camera onto facial points and head motion. We then use these reconstructed facial responses to build a robot error detection model that outperforms standard approaches such as OpenFace features or raw RGB video, generalizing well to both within- and across-participant data.
Through this work, we argue for expanding human-in-the-loop robot sensing, fostering more seamless integration of robots into diverse human environments, pushing the boundaries of social cue detection and opening new avenues for adaptable and sustainable robotics.
- NeckNet-18: A lightweight 3D facial reconstruction model (ResNet-18 based) that converts IR camera data from a neck-mounted device into 52 facial Blendshapes and 3 head rotation parameters.
- Error Detection Models: Machine learning models that detect robot errors from human facial reactions, achieving:
  - 84.7% accuracy for single-participant models trained on only 5% of the data (see the split sketch after this list)
  - About 5 percentage points higher accuracy than OpenFace-based methods for cross-participant generalization
  - Superior performance compared to RGB camera approaches
- Comprehensive Benchmark: First study to systematically compare neck-mounted device data against conventional methods (OpenFace, RGB cameras) for error detection in HRI.
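To make the single-participant setting concrete, the sketch below keeps only 5% of one participant's reaction windows for training and evaluates on the rest. The array shapes and the use of scikit-learn's train_test_split are illustrative assumptions, not the repository's exact pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical single-participant data: 400 reaction windows,
# each with 55 NeckData features over 60 timesteps (~5 s at 12 fps).
X = np.random.randn(400, 55, 60)
y = np.random.randint(0, 2, size=400)    # 1 = robot error, 0 = no error

# Keep only 5% of the windows for training; evaluate on the remaining 95%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.05, stratify=y, random_state=0
)
print(X_train.shape, X_test.shape)       # (20, 55, 60) (380, 55, 60)
```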
Our system consists of two main components:

1. NeckNet-18 (3D facial reconstruction; a minimal architecture sketch follows this list)
   - Converts IR camera images from the NeckFace device into 3D facial expressions
   - Outputs 52 Blendshape parameters + 3 head rotation angles
   - Requires a short calibration round (~5 minutes) per participant
2. Error detection models
   - Trained on the reconstructed facial reactions to detect robot errors
   - Support both cross-participant and single-participant generalization
   - Multiple architectures tested (GRU, LSTM, Transformer, MiniRocket, etc.)
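The reconstruction component can be pictured as a standard image-regression network. The sketch below is a minimal approximation under assumed details (single-frame input, plain ResNet-18 backbone, 55-dimensional regression head); the released code in code_neckface_training handles the actual NeckFace input format.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Minimal sketch (not the released NeckNet-18 code): a ResNet-18 backbone whose
# final fully connected layer regresses 52 Blendshape weights + 3 head rotations.
class NeckNet18Sketch(nn.Module):
    def __init__(self, n_outputs: int = 55):
        super().__init__()
        self.backbone = resnet18()  # randomly initialized backbone
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_outputs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, 224, 224) chin-region IR frames replicated to 3 channels
        return self.backbone(x)

model = NeckNet18Sketch()
out = model(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 55]) -> 52 Blendshapes + 3 head rotations
```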
.
├── submission/ # Paper and supplementary materials
│ ├── main.pdf # Main paper
│ ├── supp_material.pdf # Supplementary materials
│ └── *.png # Figures used in paper
│
├── code_neckface_training/ # NeckNet-18 training code
│ ├── train.py # Training script
│ ├── eval.py # Inference script
│ ├── data_preparation.py # Data preprocessing
│ └── README.md # Detailed instructions
│
├── code_preprocessing/ # Dataset creation pipelines
│ └── create_datasets/
│ ├── neckface_rgb/ # NeckFace IR & RGB frame extraction
│ └── neckface_openface_feature_extraction/ # OpenFace processing
│
├── code_timeseries/ # Error detection models (time-series)
│ └── # Models: GRU, LSTM, BiLSTM, Transformer, MiniRocket, etc.
│
├── code_rgb/ # Error detection models (frame-based)
│ └── # Models: ResNet34-based CNNs
│
└── stimulus_dataset_information.xlsx # Stimulus video dataset metadata
- Python 3.8+
- PyTorch 1.10+
- CUDA-capable GPU (recommended)
- NeckFace device (for data collection)
# Clone the repository
git clone https://github.com/yourusername/badrobots-feat-neckface.git
cd badrobots-feat-neckface
# Install dependencies for NeckNet-18 training
cd code_neckface_training
pip install -r requirements.txt
# Install dependencies for error detection models
cd ../code_timeseries
pip install tsai torch torchvision

cd code_neckface_training
# Prepare your data (see code_neckface_training/README.md for details)
python data_preparation.py -p /path/to/participant/data
# Train NeckNet-18
python train.py -p /path/to/participant/data -ts 01 -g 0 -t 1 -o experiment_name
# Generate predictions
python eval.py --resume /path/to/checkpoint.tar -v /path/to/video.avi

Pre-trained Model: Download the NeckFace meta model trained on the original user study: neckface_us_full_model.tar
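If you want to inspect the pre-trained checkpoint directly, it can presumably be opened with torch.load; the key name in the commented line is an assumption about how the training script saved it, so check train.py/eval.py for the actual layout.

```python
import torch

# Hypothetical: the checkpoint layout depends on the training script.
ckpt = torch.load("neckface_us_full_model.tar", map_location="cpu")
print(type(ckpt))                                    # inspect what was saved
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
# model.load_state_dict(ckpt["model_state_dict"])    # assumed key name
```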
cd code_timeseries
# Train error detection model on NeckData
python train_model.py --dataset neckdata --model gru_fcn --folds 5
# Evaluate on test set
python evaluate_model.py --checkpoint best_model.pth --dataset neckdata

Our work uses four datasets for benchmarking (a shape sketch for the time-series models follows this list):
- NeckData: 55 features (52 Blendshapes + 3 head rotations) reconstructed from NeckFace IR cameras using NeckNet-18
- OpenData: 49 features (Action Units, gaze, pose) extracted using OpenFace from RGB videos
- NeckFaceIR: Raw IR video frames from NeckFace cameras (224×224, 12 fps)
- RGBData: RGB video frames from participant-facing camera (224×224, 12 fps)
Stimulus Videos: 30 videos (10 human errors, 10 robot errors, 10 control) - see stimulus_dataset_information.xlsx
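To make the time-series input format concrete, here is a minimal sketch of feeding NeckData-shaped windows to a tsai GRU_FCN classifier, assuming the model takes (batch, channels, timesteps) input. The window length (60 frames, about 5 s at 12 fps) and the binary error/no-error labels are illustrative assumptions.

```python
import torch
from tsai.models.RNN_FCN import GRU_FCN  # hybrid GRU + fully convolutional classifier

# NeckData windows: 55 channels = 52 Blendshapes + 3 head rotations;
# 60 timesteps is an assumed window length (~5 s at 12 fps).
x = torch.randn(8, 55, 60)                      # (batch, channels, timesteps)

model = GRU_FCN(c_in=55, c_out=2, seq_len=60)   # binary: robot error vs. no error
logits = model(x)
print(logits.shape)                             # torch.Size([8, 2])
```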
NeckNet-18 reconstruction performance (mean absolute error):
- MAE (Facial Motion): 34.0 ± 6.9
- MAE (Head Rotation): 7.0 ± 1.3
Error detection performance:

| Model Type | Dataset | Accuracy | F1-Score |
|---|---|---|---|
| GRU_FCN (Cross-Participant) | NeckData | 65.8% | 63.7% |
| gMLP (Cross-Participant) | OpenData | 60.6% | 53.5% |
| GRU_FCN (Single-Participant, 5% training) | NeckData | 84.7% | 84.2% |
| InceptionTime (Single-Participant, 5% training) | OpenData | 78.8% | 78.2% |
Key Finding: NeckData models outperform OpenData models, especially with limited training data, demonstrating the value of 3D facial reconstruction from neck-mounted sensors.
- Cameras: 2× IR cameras (Arducam) mounted on neck band
- Lighting: 2× 850 nm IR LEDs per camera
- Controller: Raspberry Pi 4B with Arducam multi-camera adapter
- Calibration: iPhone 11 Pro with TrueDepth camera (for ground truth)
For device fabrication details, see NeckFace: Continuously Tracking Full Facial Expressions on Neck-mounted Wearables (Chen et al., 2021).
If you use this work in your research, please cite:
@inproceedings{parreira2025whyface,
title={"Why the face?": Exploring Robot Error Detection Using Instrumented Bystander Reactions},
author={Parreira, Maria Teresa and Zhang, Ruidong and Lingaraju, Sukruth Gowdru and Bremers, Alexandra and Fang, Xuanyu and Ramirez-Aristizabal, Adolfo and Saha, Manaswi and Kuniavsky, Michael and Zhang, Cheng and Ju, Wendy},
booktitle={TBD},
year={2025},
organization={TBD}
}

- NeckFace: Chen et al., 2021 - Original neck-mounted facial expression tracking system
- BAD Dataset: Bremers et al., 2023 - Bystander Affect Detection dataset for HRI failure detection
- Err@HRI Challenge: Spitale et al., 2024 - Multimodal error detection challenge
This work was supported by Cornell University and Accenture Labs. We thank all participants in our user study. Special thanks to the original NeckFace team for providing the foundational device and pre-trained models.
This project is licensed under the MIT License - see the LICENSE file for details.
For questions or issues, please open an issue on GitHub or contact:
- Maria Teresa Parreira: mb2554@cornell.edu
