- **[2025-12-01]** StreamGaze benchmark and evaluation code released!
- News
- StreamGaze Benchmark
- Quick Start
- Adding Your Model
- StreamGaze Data Generation Pipeline
- Citation
- Acknowledgements
- Contact
| Category | Metric | Count |
|---|---|---|
| Videos | Total Videos | 285 |
| QA Pairs | Total Questions | 8,521 |
| Tasks | Task Categories | 10 (4 Past + 4 Present + 2 Proactive) |
**Past tasks:** Models must remember and reason about events that occurred earlier in the video stream.
- Scene Recall (SR): What objects did the user interact with?
- Object Transition Prediction (OTP): Which object will the user look at next, given past patterns?
- Gaze Sequence Matching (GSM): Which gaze pattern matches the user's attention flow?
- Non-Fixated Objects Identification (NFI): Which objects appeared but were never gazed at?
**Present tasks:** Models must identify and understand what is currently happening based on real-time gaze.
- Object Identification (Easy/Hard): What is the user currently looking at?
- Object Attribute Recognition (OAR): What are the characteristics of the gazed object?
- Future Action Prediction (FAP): What action is the user about to perform?
**Proactive tasks:** Models must anticipate future events and respond proactively; this is the most challenging category (a toy scoring sketch follows the task list below).
- Gaze-Triggered Alert (GTA): Notify when the user gazes at a specific target object
- Object Appearance Alert (OAA): Alert when a target object appears in the scene
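For intuition only, here is a toy sketch of how a gaze-triggered alert might be scored in a streaming setting: the model is polled at regular intervals and should raise its alert close to the ground-truth trigger time. The `predict_alert` callback, polling interval, and tolerance window are all assumptions for illustration, not StreamGaze's actual evaluation protocol.

```python
# Illustrative only: a toy scorer for a gaze-triggered alert task.
# The predict_alert callback, polling interval, and tolerance window are
# assumptions for illustration, not StreamGaze's actual protocol.
from typing import Callable

def score_gaze_triggered_alert(
    predict_alert: Callable[[float], bool],  # model callback: does it alert at time t?
    trigger_time: float,                     # ground-truth moment the user gazes at the target
    video_duration: float,
    poll_interval: float = 1.0,
    tolerance: float = 2.0,                  # seconds of slack around the trigger (assumed)
) -> bool:
    """Return True iff the first alert falls within `tolerance` seconds of the trigger."""
    t = 0.0
    while t <= video_duration:
        if predict_alert(t):
            # Early (false) alerts and missing alerts both count as failures.
            return abs(t - trigger_time) <= tolerance
        t += poll_interval
    return False

# Dummy model that alerts from t = 12 s onward; the true trigger is at 11.5 s.
print(score_gaze_triggered_alert(lambda t: t >= 12.0, trigger_time=11.5, video_duration=30.0))
```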
Our setup follows the same structure as StreamingBench. Download our dataset from HuggingFace and place it as shown below:
StreamGaze/
└── dataset/
    ├── videos/
    │   ├── original_video/      # Original egocentric videos
    │   └── gaze_viz_video/      # Videos with gaze overlay
    └── qa/
        ├── past_*.json          # Past task QA pairs
        ├── present_*.json       # Present task QA pairs
        └── proactive_*.json     # Proactive task QA pairs
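As a quick sanity check after downloading, the sketch below (an ad-hoc helper, not part of the repo) walks the layout above and reports what it finds. It only assumes that each qa/*.json file holds a JSON array of entries; the entry schema itself is not assumed.

```python
# Sanity-check the downloaded dataset against the layout shown above.
# Not part of the StreamGaze codebase; only assumes each qa/*.json file
# contains a JSON array of QA entries.
import json
from pathlib import Path

root = Path("StreamGaze/dataset")  # adjust to wherever you placed the download

for sub in ("original_video", "gaze_viz_video"):
    files = list((root / "videos" / sub).glob("*"))
    print(f"videos/{sub}: {len(files)} files")

for qa_file in sorted((root / "qa").glob("*.json")):
    entries = json.loads(qa_file.read_text())
    print(f"qa/{qa_file.name}: {len(entries)} QA entries")
    if entries and isinstance(entries[0], dict):
        print("  example keys:", sorted(entries[0].keys()))
```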
Quick evaluation of existing models:
# Evaluate ViSpeak (without gaze visualization)
bash scripts/vispeak.sh
# Evaluate ViSpeak (with gaze visualization)
bash scripts/vispeak.sh --use_gaze_instruction
# Evaluate GPT-4o
bash scripts/gpt4o.sh --use_gaze_instruction
# Evaluate Qwen2.5-VL
bash scripts/qwen25vl.sh --use_gaze_instruction

Results will be automatically computed and saved to:
results/
└── ModelName/
    ├── results/                 # Without gaze visualization
    │   ├── *_output.json
    │   └── evaluation_summary.json
    └── results_viz/             # With gaze visualization
        ├── *_output.json
        └── evaluation_summary.json
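To collect scores across models and both settings, here is a minimal sketch that assumes only the results/ layout above; evaluation_summary.json is printed verbatim rather than assuming a particular schema.

```python
# Gather evaluation summaries for every model and both settings.
# Assumes only the results/ layout shown above; the summary JSON is
# printed verbatim rather than assuming a particular schema.
import json
from pathlib import Path

results_root = Path("results")

for model_dir in sorted(p for p in results_root.iterdir() if p.is_dir()):
    for setting in ("results", "results_viz"):  # without / with gaze visualization
        summary_path = model_dir / setting / "evaluation_summary.json"
        if not summary_path.exists():
            continue
        summary = json.loads(summary_path.read_text())
        print(f"== {model_dir.name} [{setting}] ==")
        print(json.dumps(summary, indent=2))
```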
Want to evaluate your own model on StreamGaze? Follow the guide below.
Create src/model/YourModel.py:

from model.modelclass import Model

class YourModel(Model):
    def __init__(self):
        # Load your model and processor
        self.model = ...
        self.processor = ...

    def Run(self, file, inp, start_time, end_time, question_time,
            omni=False, proactive=False, salience_map_path=None):
        # Process the video and generate a response
        return "Your model's response"

    def name(self):
        return "YourModel"
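As a reference point, here is a minimal placeholder implementation of the interface above (a sketch, not a real model): it ignores the video and returns a fixed answer, which is only useful for smoke-testing the evaluation harness before wiring in actual inference. The file name DummyModel.py and its responses are hypothetical.

```python
# src/model/DummyModel.py: a placeholder that satisfies the interface above.
# It performs no real inference; replace Run() with your model's video/gaze reasoning.
from model.modelclass import Model


class DummyModel(Model):
    def __init__(self):
        # Nothing to load for the placeholder.
        self.model = None
        self.processor = None

    def Run(self, file, inp, start_time, end_time, question_time,
            omni=False, proactive=False, salience_map_path=None):
        # Proactive tasks may expect an alert-style response rather than an
        # option letter; the exact expected format is not assumed here.
        if proactive:
            return "No alert."
        # For multiple-choice questions, always answer "A".
        return "A"

    def name(self):
        return "DummyModel"
```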
Add to src/eval.py:

elif args.model_name == "YourModel":
    from model.YourModel import YourModel
    model = YourModel()
Create scripts/yourmodel.sh:

#!/bin/bash
ROOT_DIR="/path/to/StreamGaze"
MODEL_NAME="YourModel"

Then run the evaluation:

bash scripts/yourmodel.sh --use_gaze_instruction

We provide an end-to-end automatic data generation pipeline that processes raw gaze data from egocentric videos and generates high-quality temporal reasoning QA pairs.
Pipeline Stages:
- Steps 0-1: Gaze projection & fixation extraction
- Steps 1.5-2: Quality filtering & object identification (InternVL-3.5 38B)
- Step 2.5: Sequence filtering & metadata merging
- Step 3: QA pair generation for 12 task types
- Step 4: QA validation & filtering (Qwen3VL 30B)
Supported Datasets: EGTEA-Gaze+, Ego4D-Gaze, HoloAssist, EgoExoLearn
Full pipeline documentation: pipeline/
# Quick start
cd pipeline
bash pipeline.sh --dataset egtea

If you find StreamGaze useful in your research, please consider citing our work:
@misc{lee2025streamgazegazeguidedtemporalreasoning,
  title={StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos},
  author={Daeun Lee and Subhojyoti Mukherjee and Branislav Kveton and Ryan A. Rossi and Viet Dac Lai and Seunghyun Yoon and Trung Bui and Franck Dernoncourt and Mohit Bansal},
  year={2025},
  eprint={2512.01707},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.01707},
}

We thank the following projects and datasets that made StreamGaze possible:
- EGTEA Gaze+
- EgoExoLearn
- HoloAssist
- StreamingBench
We also thank the open-source community for providing excellent multimodal models:
- ViSpeak, InternVL, Qwen-VL, LLaVA-OneVision, Video-LLaMA, and many others
For questions, issues, or collaborations:
- Email: daeun@cs.unc.edu
- Issues: GitHub Issues
- Discussions: GitHub Discussions
⭐ Star us on GitHub if you find StreamGaze useful!
Made with ❤️ by UNC Chapel Hill & Adobe Research


