- **[2025-12-01]** StreamGaze benchmark and evaluation code released!
- News
- StreamGaze Benchmark
- Quick Start
- Adding Your Model
- StreamGaze Data Generation Pipeline
- Citation
- Acknowledgements
- Contact
| Category | Metric | Count |
|---|---|---|
| Videos | Total Videos | 285 |
| QA Pairs | Total Questions | 8,521 |
| Tasks | Task Categories | 10 (4 Past + 4 Present + 2 Proactive) |
**Past tasks:** Models must remember and reason about events that occurred earlier in the video stream.
- Scene Recall (SR): What objects did the user interact with?
- Object Transition Prediction (OTP): Which object will the user look at next, given past patterns?
- Gaze Sequence Matching (GSM): Which gaze pattern matches the user's attention flow?
- Non-Fixated Objects Identification (NFI): Which objects appeared but were never gazed at?
**Present tasks:** Models must identify and understand what is currently happening based on real-time gaze.
- Object Identification (Easy/Hard): What is the user currently looking at?
- Object Attribute Recognition (OAR): What are the characteristics of the gazed object?
- Future Action Prediction (FAP): What action is the user about to perform?
**Proactive tasks:** Models must anticipate future events and respond proactively; this is the most challenging category (a toy scoring sketch follows the task list below).
- Gaze-Triggered Alert (GTA): Notify when the user gazes at a specific target object
- Object Appearance Alert (OAA): Alert when a target object appears in the scene
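For intuition only, here is a toy sketch of how a gaze-triggered alert might be scored in a streaming setting: the model is polled at regular intervals and should raise its alert close to the ground-truth trigger time. The `predict_alert` callback, polling interval, and tolerance window are all assumptions for illustration, not StreamGaze's actual evaluation protocol.

```python
# Illustrative only: a toy scorer for a gaze-triggered alert task.
# The predict_alert callback, polling interval, and tolerance window are
# assumptions for illustration, not StreamGaze's actual protocol.
from typing import Callable

def score_gaze_triggered_alert(
    predict_alert: Callable[[float], bool],  # model callback: does it alert at time t?
    trigger_time: float,                     # ground-truth moment the user gazes at the target
    video_duration: float,
    poll_interval: float = 1.0,
    tolerance: float = 2.0,                  # seconds of slack around the trigger (assumed)
) -> bool:
    """Return True iff the first alert falls within `tolerance` seconds of the trigger."""
    t = 0.0
    while t <= video_duration:
        if predict_alert(t):
            # Early (false) alerts and missing alerts both count as failures.
            return abs(t - trigger_time) <= tolerance
        t += poll_interval
    return False

# Dummy model that alerts from t = 12 s onward; the true trigger is at 11.5 s.
print(score_gaze_triggered_alert(lambda t: t >= 12.0, trigger_time=11.5, video_duration=30.0))
```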
Our setup follows the same structure as StreamingBench. Download our dataset from HuggingFace and place it as shown below:
StreamGaze/
└── dataset/
    ├── videos/
    │   ├── original_video/      # Original egocentric videos
    │   └── gaze_viz_video/      # Videos with gaze overlay
    └── qa/
        ├── past_*.json          # Past task QA pairs
        ├── present_*.json       # Present task QA pairs
        └── proactive_*.json     # Proactive task QA pairs
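As a quick sanity check after downloading, the sketch below (an ad-hoc helper, not part of the repo) walks the layout above and reports what it finds. It only assumes that each qa/*.json file holds a JSON array of entries; the entry schema itself is not assumed.

```python
# Sanity-check the downloaded dataset against the layout shown above.
# Not part of the StreamGaze codebase; only assumes each qa/*.json file
# contains a JSON array of QA entries.
import json
from pathlib import Path

root = Path("StreamGaze/dataset")  # adjust to wherever you placed the download

for sub in ("original_video", "gaze_viz_video"):
    files = list((root / "videos" / sub).glob("*"))
    print(f"videos/{sub}: {len(files)} files")

for qa_file in sorted((root / "qa").glob("*.json")):
    entries = json.loads(qa_file.read_text())
    print(f"qa/{qa_file.name}: {len(entries)} QA entries")
    if entries and isinstance(entries[0], dict):
        print("  example keys:", sorted(entries[0].keys()))
```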
Quick evaluation of existing models:
# Evaluate ViSpeak (without gaze visualization)
bash scripts/vispeak.sh
# Evaluate ViSpeak (with gaze visualization)
bash scripts/vispeak.sh --use_gaze_instruction
# Evaluate GPT-4o
bash scripts/gpt4o.sh --use_gaze_instruction
# Evaluate Qwen2.5-VL
bash scripts/qwen25vl.sh --use_gaze_instruction

Results will be automatically computed and saved to:
results/
└── ModelName/
    ├── results/                 # Without gaze visualization
    │   ├── *_output.json
    │   └── evaluation_summary.json
    └── results_viz/             # With gaze visualization
        ├── *_output.json
        └── evaluation_summary.json
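To collect scores across models and both settings, here is a minimal sketch that assumes only the results/ layout above; evaluation_summary.json is printed verbatim rather than assuming a particular schema.

```python
# Gather evaluation summaries for every model and both settings.
# Assumes only the results/ layout shown above; the summary JSON is
# printed verbatim rather than assuming a particular schema.
import json
from pathlib import Path

results_root = Path("results")

for model_dir in sorted(p for p in results_root.iterdir() if p.is_dir()):
    for setting in ("results", "results_viz"):  # without / with gaze visualization
        summary_path = model_dir / setting / "evaluation_summary.json"
        if not summary_path.exists():
            continue
        summary = json.loads(summary_path.read_text())
        print(f"== {model_dir.name} [{setting}] ==")
        print(json.dumps(summary, indent=2))
```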
Want to evaluate your own model on StreamGaze? Follow the guide below.
Create src/model/YourModel.py:

from model.modelclass import Model

class YourModel(Model):
    def __init__(self):
        # Load your model and processor
        self.model = ...
        self.processor = ...

    def Run(self, file, inp, start_time, end_time, question_time,
            omni=False, proactive=False, salience_map_path=None):
        # Process the video and generate a response
        return "Your model's response"

    def name(self):
        return "YourModel"
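As a reference point, here is a minimal placeholder implementation of the interface above (a sketch, not a real model): it ignores the video and returns a fixed answer, which is only useful for smoke-testing the evaluation harness before wiring in actual inference. The file name DummyModel.py and its responses are hypothetical.

```python
# src/model/DummyModel.py: a placeholder that satisfies the interface above.
# It performs no real inference; replace Run() with your model's video/gaze reasoning.
from model.modelclass import Model


class DummyModel(Model):
    def __init__(self):
        # Nothing to load for the placeholder.
        self.model = None
        self.processor = None

    def Run(self, file, inp, start_time, end_time, question_time,
            omni=False, proactive=False, salience_map_path=None):
        # Proactive tasks may expect an alert-style response rather than an
        # option letter; the exact expected format is not assumed here.
        if proactive:
            return "No alert."
        # For multiple-choice questions, always answer "A".
        return "A"

    def name(self):
        return "DummyModel"
```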
Add to src/eval.py:

elif args.model_name == "YourModel":
    from model.YourModel import YourModel
    model = YourModel()
Create scripts/yourmodel.sh:

#!/bin/bash
ROOT_DIR="/path/to/StreamGaze"
MODEL_NAME="YourModel"

Then run the evaluation:

bash scripts/yourmodel.sh --use_gaze_instruction

We provide an end-to-end automatic data generation pipeline that processes raw gaze data from egocentric videos and generates high-quality temporal reasoning QA pairs.
Pipeline Stages:
- Steps 0-1: Gaze projection & fixation extraction
- Steps 1.5-2: Quality filtering & object identification (InternVL-3.5 38B)
- Step 2.5: Sequence filtering & metadata merging
- Step 3: QA pair generation for 12 task types
- Step 4: QA validation & filtering (Qwen3VL 30B)
Supported Datasets: EGTEA-Gaze+, Ego4D-Gaze, HoloAssist, EgoExoLearn
Full pipeline documentation: pipeline/
# Quick start
cd pipeline
bash pipeline.sh --dataset egtea

If you find StreamGaze useful in your research, please consider citing our work:
@misc{lee2025streamgazegazeguidedtemporalreasoning,
  title={StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos},
  author={Daeun Lee and Subhojyoti Mukherjee and Branislav Kveton and Ryan A. Rossi and Viet Dac Lai and Seunghyun Yoon and Trung Bui and Franck Dernoncourt and Mohit Bansal},
  year={2025},
  eprint={2512.01707},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.01707},
}

We thank the following projects and datasets that made StreamGaze possible:
- EGTEA Gaze+
- EgoExoLearn
- HoloAssist
- StreamingBench
We also thank the open-source community for providing excellent multimodal models:
- ViSpeak, InternVL, Qwen-VL, LLaVA-OneVision, Video-LLaMA, and many others
For questions, issues, or collaborations:
- Email: daeun@cs.unc.edu
- Issues: GitHub Issues
- Discussions: GitHub Discussions
⭐ Star us on GitHub if you find StreamGaze useful!
Made with ❤️ by UNC Chapel Hill & Adobe Research


