A framework for building scene graphs from egocentric video using gaze data to predict future actions.
This project builds scene graphs from egocentric video and gaze data to capture spatial and temporal relationships between objects, which can be used for action anticipation. The system:
- Processes fixations and saccades to identify objects using CLIP
- Constructs a scene graph with nodes (objects) and edges (transitions); see the sketch below
- Extracts features at specified timestamps for downstream tasks
- Provides interactive visualization of the graph construction process
Note: Refer to "Quick Start" section for instructions on how to visualize the graph construction process using the provided sample data.
```
src/gazegraph/              # Main package code
├── config/                 # YAML configuration files and utilities
├── datasets/               # Dataset loaders and processors
│   └── egtea_gaze/         # EGTEA Gaze+ dataset specific code
├── graph/                  # Scene graph construction and processing
│   ├── dashboard/          # Interactive visualization dashboard
│   │   ├── components/     # UI components (video, graph display, controls)
│   │   ├── playback/       # Event handling and state management
│   │   └── utils/          # Dashboard utilities and SVG generation
│   └── *.py                # Core graph processing (builder, tracer, visualizer)
├── models/                 # Feature extraction and object detection (CLIP, SIFT, YOLO-World)
├── training/               # Training infrastructure
│   ├── dataset/            # Graph datasets, dataloaders, and augmentations
│   ├── evaluation/         # Metrics and evaluation utilities
│   └── tasks/              # Task definitions (next_action, future_actions)
├── logger.py, main.py      # Logging and main entry point
└── setup_scratch.py        # Dataset setup utilities

datasets/sample_data/       # Sample data for graph visualization
figures/                    # Documentation and visualization outputs
scripts/                    # Utility and build scripts
tests/                      # Test suite with component-specific tests and fixtures
```
- Python 3.12+
- Access to a Dropbox token for downloading the EGTEA Gaze+ dataset
- uv package manager (setup scripts will install if not present)
Note: The graph visualization only requires the raw video file and the corresponding trace file, which we provide in `datasets/sample_data/`. All other steps, except for the environment setup, are optional.
- **Setup environment**: Choose the appropriate setup script based on your environment.

  For the Student Cluster:

  ```bash
  source scripts/setup_student_cluster_env.sh
  ```

  For the Euler Cluster:

  ```bash
  source scripts/setup_euler_cluster_env.sh
  ```

  These scripts will:

  - Check if `uv` is installed, and install it if necessary
  - Ensure the proper Python version is available
  - Synchronize project dependencies using the lockfile (`uv sync`)
- **Create a Dropbox token**:

  - Create an app at the Dropbox App Console
  - Enable the `sharing.read` permission
  - Generate an OAuth 2.0 token
  - Create a `.env` file in the project root and add the following:

    ```bash
    # Dropbox token for dataset downloads
    DROPBOX_TOKEN=your_token_here

    # Optional: Custom config path
    CONFIG_PATH=path/to/your/config.yaml
    ```
Note: Dropbox tokens expire after a period of time. If you encounter authentication errors, you'll need to generate a new token following the steps above.
- **Download dataset**:

  ```bash
  scripts/run.sh setup-scratch
  ```
- **Build scene graphs**:

  ```bash
  scripts/run.sh build-graphs
  ```

  To enable tracing for visualization:

  ```bash
  scripts/run.sh build-graphs --videos VIDEO_NAME --enable-tracing
  ```
- **Train a model**:

  ```bash
  scripts/run.sh train --task future_actions
  ```

  Or, for next-action prediction:

  ```bash
  scripts/run.sh train --task next_action
  ```
- **Visualize graph construction**:

  To visualize the graph construction process, e.g. using the provided sample data:

  ```bash
  scripts/run.sh visualize --video-path datasets/sample_data/OP03-R02-TurkeySandwich.mp4 --trace-path datasets/sample_data/OP03-R02-TurkeySandwich_trace.jsonl
  ```

  The interactive dashboard will then be available at http://127.0.0.1:8050/.
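The trace file is JSON Lines (`.jsonl`), i.e. presumably one JSON record per graph-construction event. If you want to poke at it outside the dashboard, the snippet below is a rough sketch that tallies event types; the `event_type` field name is an assumption, so print a record first to see the actual schema.

```python
import json
from collections import Counter

trace_path = "datasets/sample_data/OP03-R02-TurkeySandwich_trace.jsonl"

counts = Counter()
with open(trace_path) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "event_type" is a guess at the field name; inspect one record to see the real keys.
        counts[record.get("event_type", "<unknown>")] += 1

print(counts.most_common())
```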
This project can generate gaze-augmented features that can be used to enhance the performance of the Ego-Topo model. The workflow involves generating fixation labels and ROI features using this repository, composing them with the base features provided by Ego-Topo, and then using the resulting feature file within a cloned Ego-Topo repository.
- You must have our fork of the Ego-Topo repository cloned and set up according to its official instructions. You can clone it via:

  ```bash
  git clone https://github.com/jankulik/ego-topo-gaze.git
  ```

- Ensure you have downloaded the base features from the Ego-Topo project (e.g., `train_lfb_s30_verb.pkl` and `val_lfb_s30_verb.pkl`).
Step 1: Set Up This Repository
If you haven't already, set up the gaze-guided-scene-graph environment and download the necessary datasets by following the "Quick Start" instructions.
Step 2: Generate Fixation Labels
The first step is to process the EGTEA Gaze+ videos to determine the most likely fixated object for each frame. This is done using the `label-fixations` command.
This command will produce two files:

- `*_label_map.pkl`: A map from each video frame to a detected object label.
- `*_roi_map.pkl`: (Optional) A map from each video frame to a CLIP embedding of the object's Region of Interest (ROI).
Run the command for both the training and validation splits. The `--skip-roi` flag can be used to speed up the process if you only need object labels initially.
```bash
# For the training split
scripts/run.sh label-fixations --in-pkl /path/to/ego-topo/features/train_lfb_s30_verb.pkl

# For the validation split
scripts/run.sh label-fixations --in-pkl /path/to/ego-topo/features/val_lfb_s30_verb.pkl
```

Note: The output files will be saved in the directory specified by `directories.features` in your `config.yaml`, which defaults to `data/features/`.
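To sanity-check an output, you can load the label map directly. The snippet below assumes the default output location and that the file is a pickled mapping from frame indices (or clip IDs) to object labels; print a few entries to confirm the actual structure.

```python
import pickle

# Path assumes the default features directory from config.yaml.
path = "data/features/train_lfb_s30_verb_label_map.pkl"

with open(path, "rb") as f:
    label_map = pickle.load(f)

# Assumed structure: a dict mapping frame indices (or clip IDs) to object labels.
print(type(label_map), len(label_map))
for key in list(label_map)[:5]:
    print(key, "->", label_map[key])
```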
Step 3: Compose Gaze-Augmented Features
Next, combine the base Ego-Topo features with the fixation labels (and optionally ROI embeddings) you just generated, using the `compose-features` command.

There are several composition modes available (a conceptual sketch of the `onehot` mode follows the commands below). For gaze-augmented Ego-Topo, you might use:

- `onehot`: Concatenates the base feature with a one-hot vector of the fixated object label.
- `clip`: Concatenates the base feature with a CLIP text embedding of the fixated object label.
- `roi`: Concatenates the base feature with a CLIP image embedding of the fixated object's ROI.
```bash
# Example: Compose features for the training split using one-hot encoding
scripts/run.sh compose-features \
    --mode onehot \
    --base-pkl /path/to/ego-topo/features/train_lfb_s30_verb.pkl \
    --fix-pkl data/features/train_lfb_s30_verb_label_map.pkl \
    --out-pkl data/features/composed_train_onehot.pkl

# Example: Compose features for the validation split using one-hot encoding
scripts/run.sh compose-features \
    --mode onehot \
    --base-pkl /path/to/ego-topo/features/val_lfb_s30_verb.pkl \
    --fix-pkl data/features/val_lfb_s30_verb_label_map.pkl \
    --out-pkl data/features/composed_val_onehot.pkl
```

Repeat this step for each composition mode you wish to evaluate.
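Conceptually, the `onehot` mode appends a one-hot encoding of the fixated object to each base feature vector. The toy NumPy sketch below illustrates the idea for a single feature vector; the vocabulary, dimensions, and data layout are made up, and the actual `compose-features` implementation may differ.

```python
import numpy as np

# Toy vocabulary of object labels (illustrative only).
vocab = ["plate", "knife", "bread", "turkey"]

base_feature = np.random.rand(2048)  # a single base Ego-Topo feature vector (toy size)
fixated_label = "bread"              # label taken from the *_label_map.pkl for this frame

one_hot = np.zeros(len(vocab))
one_hot[vocab.index(fixated_label)] = 1.0

composed = np.concatenate([base_feature, one_hot])
print(base_feature.shape, "->", composed.shape)  # (2048,) -> (2052,)
```

The `clip` and `roi` modes work the same way, except the appended vector is a CLIP text or image embedding instead of a one-hot label.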
Step 4: Train with Augmented Features in the Ego-Topo Repository
Finally, use the newly created composed feature files (`composed_*.pkl`) to train the model within your cloned Ego-Topo repository.

You will need to modify the training script in the Ego-Topo repository to point to these new feature files. For example, you might change its `main.py` or configuration to load `composed_train_onehot.pkl` instead of the original `train_lfb_s30_verb.pkl`.
Please refer to the Ego-Topo repository's documentation for specific instructions on how to run training with custom feature files.
The project uses YAML configuration files with a hierarchical structure:
```yaml
base:         # Root directories
directories:  # Directory structure
dataset:      # Dataset settings and paths
timestamps:   # Feature extraction timestamps
egtea:        # EGTEA dataset paths
ego_topo:     # Ego-Topo data paths
output:       # Output paths
models:       # Model settings
external:     # External resources
processing:   # Processing settings
trace_dir:    # Directory for graph construction trace files
```
Key features:

- Path references: `${path.to.reference}`
- Environment variables: `${USER}`, `${REPO_ROOT}`
- Configuration files: `student_cluster_config.yaml`, `euler_cluster_config.yaml`
- The `CONFIG_PATH` environment variable to specify the configuration file path
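As a rough illustration (not the project's actual config loader), the sketch below shows how `${...}` references could be resolved against environment variables and other config keys; the YAML keys in the example are invented.

```python
import os
import re
import yaml  # pyyaml

def resolve(value, root):
    """Expand ${...} placeholders from environment variables or dotted config keys."""
    if not isinstance(value, str):
        return value

    def repl(match):
        key = match.group(1)
        if key in os.environ:               # e.g. ${USER}, ${REPO_ROOT}
            return os.environ[key]
        node = root
        for part in key.split("."):         # e.g. ${base.data_dir}
            node = node[part]
        return resolve(str(node), root)     # resolve nested references too

    return re.sub(r"\$\{([^}]+)\}", repl, value)

# Invented example config for illustration.
raw = yaml.safe_load("""
base:
  data_dir: /scratch/${USER}/data
directories:
  features: ${base.data_dir}/features
""")

print(resolve(raw["directories"]["features"], raw))
# e.g. /scratch/<your-user>/data/features
```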
The project supports configuration via environment variables in a `.env` file in the project root:

- Create a `.env` file:

  ```bash
  cp .env.example .env
  ```

- Available environment variables:

  ```bash
  # Configuration file path
  CONFIG_PATH=path/to/your/config.yaml

  # Dropbox token for dataset downloads
  DROPBOX_TOKEN=your_token_here
  ```
Environment variables take precedence over defaults in the code. When specified, `CONFIG_PATH` determines which configuration file is loaded by default, but it can still be overridden with the `--config` command-line argument.
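A minimal sketch of how this precedence could be wired up (assuming `python-dotenv`; not the project's exact loading code):

```python
import argparse
import os

from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # loads CONFIG_PATH, DROPBOX_TOKEN, etc. from .env into os.environ

parser = argparse.ArgumentParser()
parser.add_argument("--config", default=None, help="overrides CONFIG_PATH when given")
args = parser.parse_args()

# Precedence: --config flag > CONFIG_PATH from the environment/.env > built-in default.
config_path = args.config or os.environ.get("CONFIG_PATH", "path/to/default_config.yaml")  # placeholder default
print(f"Loading configuration from {config_path}")
```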
To run tests locally (using mock models):
```bash
# Run all tests, automatically skipping those requiring real models
pytest tests/

# Run only unit tests
pytest tests/unit/

# Force run tests requiring real models
pytest --run-real-model tests/
```

For more resource-intensive tests, you can submit a job to the cluster:

```bash
# Submit test job to cluster
sbatch scripts/run_tests.sh
```

Test logs will be written to `logs/tests.out`.
The project includes TensorBoard integration for visualizing training metrics and model performance. Logs are stored under `logs/{task}/` by default.
```bash
# Run training with TensorBoard logging (next action prediction)
scripts/run.sh train --task next_action --device gpu

# Run training with TensorBoard logging (future action prediction)
scripts/run.sh train --task future_actions --device gpu

# Launch TensorBoard to view metrics
tensorboard --logdir logs
```

Usage:

```bash
scripts/run.sh [options] <command>
```

Commands:
- `setup-scratch`: Download and set up the dataset
- `build-graphs`: Build scene graphs from videos
- `train`: Train a GNN on a specified task
- `visualize`: Visualize the graph construction process
Global Options:

- `--config`: Path to a custom config file
- `--log-level`: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- `--log-file`: Path to the log file