zhao-chunyu/VP2Net

Authors: Chunyu Zhao, Tao Deng 📧, Pengcheng Du, Wenbo Liu, Yi Huang, Fei Yan

Contact: springyu.zhao@foxmail.com (📧 denotes the corresponding author)

💻Dataset

Through a re-labeling process, we obtain an attention-based driving event dataset (ADED) consisting of 1101 videos. The dataset provides semantic annotation for each driving video; the semantic information comprises the driving event category and the driving event time window. There are six driving event categories: Driving Normally (DN), Avoiding Pedestrian Crossing (ACP), Waiting for Vehicle Ahead (WVA), Waiting for Red Light (SRL), Stop Sign Stopping (SSS), and Avoiding Lane Changing Vehicle (ALC).
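The annotation scheme above (one event category plus one time window per labeled event) can be sketched as follows. This is an illustrative sketch only: the field names (`video_id`, `event`, `window_s`) and the frame rate are assumptions, not the actual ADED file format.

```python
# Hypothetical representation of an ADED-style annotation record.
# Field names and fps are illustrative assumptions, not the real format.

EVENT_CLASSES = ["DN", "ACP", "WVA", "SRL", "SSS", "ALC"]

def window_to_frames(window_s, fps=30):
    """Convert an event time window in seconds to a frame index range."""
    start_s, end_s = window_s
    return int(start_s * fps), int(end_s * fps)

annotation = {
    "video_id": "example_0001",   # hypothetical video identifier
    "event": "WVA",               # Waiting for Vehicle Ahead
    "window_s": (2.5, 6.0),       # event time window in seconds
}

assert annotation["event"] in EVENT_CLASSES
start_f, end_f = window_to_frames(annotation["window_s"])
print(start_f, end_f)  # frame range covered by the event
```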

dataset_make
Fig. 1. ADED dataset annotation process. On the left is the annotation process for the entire ADED dataset. The heatmaps are derived from the BDD-A dataset, captured with eye-tracking devices to represent the driver’s attention. On the right is the annotation process for the event time windows of driving events.
dataset_show
Fig. 2. ADED dataset statistics. (a) The number and proportion of each driving event class. (b) The distribution of the duration of driving events. (c) The distribution of the occurrence of driving events along the timeline.
TABLE I: Comparison of Traffic Scene Datasets in Terms of Weather Conditions, Annotations, and Videos.
TABLE II: Comparison of DADA-2000, PSAD, and Our Dataset in Terms of Statistical Properties and t-SNE Feature Visualization.
dataset_c

✨Model

model
Fig. 3. Visual Perception-Inspired Network (VP²Net). Our model takes driving video sequences as input. The SIE branch extracts bottom-up driving scene information, and the APE branch extracts top-down driver attention information, which undergoes attention perception (“where to focus”), attention enhancement (“when to focus”), and information encoding. The attention information then guides the fusion of the driving scene features, which are further decoded to produce the output. F1 is the attention information encoder; F2 is the event information decoder.
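The attention-guided fusion step described above (top-down attention from the APE branch reweighting bottom-up scene features from the SIE branch) can be sketched minimally with NumPy. This is an illustrative approximation of spatial attention-guided fusion under assumed tensor shapes, not the paper’s actual F1/F2 implementation.

```python
import numpy as np

def softmax2d(x):
    """Normalize a 2-D attention map so its weights sum to 1 over space."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_guided_fusion(f_sie, s_ape):
    """Reweight scene features (H, W, C) by a normalized attention map (H, W)."""
    attn = softmax2d(s_ape)          # spatial attention weights
    return f_sie * attn[..., None]   # broadcast weights over the channel axis

# Assumed toy shapes for illustration only.
rng = np.random.default_rng(0)
f_sie = rng.standard_normal((8, 8, 16))  # bottom-up scene features (SIE branch)
s_ape = rng.standard_normal((8, 8))      # top-down attention logits (APE branch)
fused = attention_guided_fusion(f_sie, s_ape)
print(fused.shape)  # (8, 8, 16)
```

The design point this sketch illustrates is that fusion is multiplicative: regions the attention map suppresses contribute little to the decoded event prediction, which matches the “where to focus” role of the APE branch.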

🚀 Quantitative Analysis

TABLE III: Quantitative Results of Different Models on the ADED, DADA-2000, PSAD Datasets.
compare

🚀Visualization of Intermediate Results

feature
Fig. 4. The visualization of the intermediate features. (a) represents the original image, (b) depicts the driving scene feature F_SIE, (c) depicts the driving scene feature F_uniformer by Uniformer, (d) shows the attention information S_hat, (e) displays the perception-enhanced information S_star, and (f) illustrates the attention-encoded information F_APE. These cases demonstrate the network’s mechanism and enhancement strategy, rather than the average performance across the dataset.

💖Support the Project

Thanks to the open-source video action detection models (ViViT, VideoMAE) on Hugging Face 🤗 for supporting this paper.

📄Cite

If you find this repository useful, please use the following BibTeX entry to cite it and give us a star ⭐.

@article{zhao2025vp2net,
  title={VP²Net: Visual Perception-Inspired Network for Exploring the Causes of Drivers' Attention Shift},
  journal={IEEE Transactions on Intelligent Transportation Systems},
  author={Zhao, Chunyu and Deng, Tao and Du, Pengcheng and Liu, Wenbo and Huang, Yi and Yan, Fei},
  year={2025}
}

About

[T-ITS'2025] VP²Net: Visual Perception-Inspired Network for Exploring the Causes of Drivers’ Attention Shift
