๐Ÿ‘๏ธ StreamGaze: Gaze-Guided Temporal Reasoning
and Proactive Understanding in Streaming Videos


arXiv Website HF Dataset
1UNC Chapel Hillโ€ƒ 2Adobe Researchโ€ƒ


📰 News

  • 2025-12-01 🚀 StreamGaze benchmark and evaluation code released!

📋 Contents

  • 📊 StreamGaze Benchmark
  • 🚀 Quick Start
  • 🔧 Adding Your Model
  • 📊 StreamGaze Data Generation Pipeline
  • 📖 Citation
  • 🙏 Acknowledgements
  • 📧 Contact


📊 StreamGaze Benchmark

Dataset Statistics

Category      Metric            Count
📹 Videos     Total Videos      285
📝 QA Pairs   Total Questions   8,521
🎯 Tasks      Task Categories   10 tasks (4 Past + 4 Present + 2 Proactive)

Task Categories

🔙 Past Tasks: Memory & Temporal Recall

Models must remember and reason about events that occurred earlier in the video stream.

  • Scene Recall (SR): What objects did the user interact with?
  • Object Transition Prediction (OTP): Which object will the user look at next, given past patterns?
  • Gaze Sequence Matching (GSM): Which gaze pattern matches the user's attention flow?
  • Non-Fixated Objects Identification (NFI): Which objects appeared but were never gazed at?

๐Ÿ‘๏ธ Present Tasks: Real-time Perception & Reasoning

Models must identify and understand what is currently happening based on real-time gaze.

  • Object Identification (Easy/Hard): What is the user currently looking at?
  • Object Attribute Recognition (OAR): What are the characteristics of the gazed object?
  • Future Action Prediction (FAP): What action is the user about to perform?

🔮 Proactive Tasks: Anticipation & Alerting

Models must anticipate future events and respond proactively; this is the most challenging category.

  • Gaze-Triggered Alert (GTA): Notify when the user gazes at a specific target object
  • Object Appearance Alert (OAA): Alert when a target object appears in the scene

Results

🚀 Quick Start

Our evaluation code follows the same structure as StreamingBench!

Data Preparation

Download our dataset from Hugging Face and arrange it as shown below:

StreamGaze/
├── dataset/
│   ├── videos/
│   │   ├── original_video/        # Original egocentric videos
│   │   └── gaze_viz_video/        # Videos with gaze overlay
│   └── qa/
│       ├── past_*.json             # Past task QA pairs
│       ├── present_*.json          # Present task QA pairs
│       └── proactive_*.json        # Proactive task QA pairs
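You can fetch the dataset with the Hugging Face CLI; a minimal sketch, assuming the dataset lives under the daeunni/StreamGaze repo id (check the dataset page for the exact id):

# Download the StreamGaze dataset from the Hugging Face Hub
pip install -U huggingface_hub
huggingface-cli download daeunni/StreamGaze --repo-type dataset --local-dir StreamGaze/dataset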

Running Evaluation

Quick evaluation on existing models:

# Evaluate ViSpeak (without gaze visualization)
bash scripts/vispeak.sh

# Evaluate ViSpeak (with gaze visualization)
bash scripts/vispeak.sh --use_gaze_instruction

# Evaluate GPT-4o
bash scripts/gpt4o.sh --use_gaze_instruction

# Evaluate Qwen2.5-VL
bash scripts/qwen25vl.sh --use_gaze_instruction

Results will be automatically computed and saved to:

results/
├── ModelName/
│   ├── results/              # Without gaze visualization
│   │   ├── *_output.json
│   │   └── evaluation_summary.json
│   └── results_viz/          # With gaze visualization
│       ├── *_output.json
│       └── evaluation_summary.json
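To inspect the aggregated metrics for a run, pretty-print the summary file (path shown for the no-visualization results of a model named ModelName):

# Pretty-print the aggregated metrics for one run
python -m json.tool results/ModelName/results/evaluation_summary.json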

🔧 Adding Your Model

Want to evaluate your own model on StreamGaze? Follow the three steps below!

Step 1: Implement Model Wrapper

Create src/model/YourModel.py:

from model.modelclass import Model

class YourModel(Model):
    def __init__(self):
        # Load your model
        self.model = ...
        self.processor = ...
    
    def Run(self, file, inp, start_time, end_time, question_time, 
            omni=False, proactive=False, salience_map_path=None):
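        # Argument meanings (inferred from the parameter names; check src/eval.py
        # for the exact semantics):
        #   file                  - path to the input video
        #   inp                   - question / instruction text
        #   start_time, end_time  - segment of the stream visible to the model
        #   question_time         - timestamp at which the question is asked
        #   omni / proactive      - task-mode flags (proactive covers GTA / OAA alerts)
        #   salience_map_path     - optional gaze/salience visualization input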
        # Process video and generate response
        return "Your model's response"
    
    def name(self):
        return "YourModel"

Step 2: Register Model

Add to src/eval.py:

elif args.model_name == "YourModel":
    from model.YourModel import YourModel
    model = YourModel()

Step 3: Create Evaluation Script

Create scripts/yourmodel.sh:

#!/bin/bash
ROOT_DIR="/path/to/StreamGaze"
MODEL_NAME="YourModel"
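# The script should end by invoking the evaluation entry point for your model,
# mirroring the existing scripts (e.g. scripts/vispeak.sh). A minimal sketch,
# assuming src/eval.py takes a --model_name argument (consistent with the
# registration step above) and forwards extra flags such as
# --use_gaze_instruction; check an existing script for the exact arguments.
cd "$ROOT_DIR"
python src/eval.py --model_name "$MODEL_NAME" "$@"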

Then run the evaluation:

bash scripts/yourmodel.sh --use_gaze_instruction

📊 StreamGaze Data Generation Pipeline

We provide an end-to-end automatic data generation pipeline that processes raw gaze data from egocentric videos and generates high-quality temporal reasoning QA pairs.

Pipeline Stages:

  • Steps 0-1: Gaze projection & fixation extraction
  • Steps 1.5-2: Quality filtering & object identification (InternVL-3.5 38B)
  • Step 2.5: Sequence filtering & metadata merging
  • Step 3: QA pair generation for 12 task types
  • Step 4: QA validation & filtering (Qwen3VL 30B)

Supported Datasets: EGTEA-Gaze+, Ego4D-Gaze, HoloAssist, EgoExoLearn

📂 Full pipeline documentation: pipeline/

# Quick start
cd pipeline
bash pipeline.sh --dataset egtea

📖 Citation

If you find StreamGaze useful in your research, please consider citing our work:

@misc{lee2025streamgazegazeguidedtemporalreasoning,
      title={StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos}, 
      author={Daeun Lee and Subhojyoti Mukherjee and Branislav Kveton and Ryan A. Rossi and Viet Dac Lai and Seunghyun Yoon and Trung Bui and Franck Dernoncourt and Mohit Bansal},
      year={2025},
      eprint={2512.01707},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.01707}, 
}

๐Ÿ™ Acknowledgements

We thank the following projects and datasets that made StreamGaze possible:

  • EGTEA Gaze+
  • EgoExoLearn
  • HoloAssist
  • StreamingBench

We also thank the open-source community for providing excellent multimodal models:

  • ViSpeak, InternVL, Qwen-VL, LLaVA-OneVision, Video-LLaMA, and many others

📧 Contact

For questions, issues, or collaborations:


โญ Star us on GitHub if you find StreamGaze useful!

Made with ❤️ by UNC Chapel Hill & Adobe Research
