This repository contains the code and resources for the paper titled "Adversarially-Refined VQ-GAN with Dense Motion Tokenization for Spatio-Temporal Heatmaps", accepted for presentation at the International Conference on Machine Learning and Applications (ICMLA 2025). The approach introduces a novel method of discretizing continuous human motion using a Vector Quantized Generative Adversarial Network (VQ-GAN) for accurate and efficient motion representation.
- Overview
- Model Performance Summary
- Training the Model
- Configuration Files
- Testing the Model
- Setting Up the CMU Panoptic Dataset
- Dataset Structure
- Getting Started
- Citation
## Overview

The project focuses on:
- Discretizing motion information using spatiotemporal heatmaps.
- Extreme compression of human pose data while retaining essential information.
- Decoupling embedding creation from Vision Transformers, allowing for more flexible motion analysis.
The experiments demonstrate the effectiveness of this approach across different camera perspectives (egocentric and exocentric) and various compression rates.
## Model Performance Summary

| Model Type | Compression | Vocab Size | SSIM↑ | PSNR (dB)↑ | L1↓ | T-Std↓ | Q-Loss↓ |
|---|---|---|---|---|---|---|---|
| Single Egocentric | F8 | 512 | 0.975 | 31.23 | 0.005 | 0.212 | 0.0013 |
| Single Egocentric | F16 | 512 | 0.950 | 28.06 | 0.008 | 0.217 | 0.0033 |
| Single Egocentric | F16 | 256 | 0.954 | 28.39 | 0.007 | 0.219 | 0.0008 |
| Single Egocentric | F16 | 128 | 0.954 | 28.30 | 0.007 | 0.220 | 0.0003 |
| Single Egocentric | F32 | 512 | 0.913 | 25.28 | 0.011 | 0.222 | 0.0009 |
| Multi Exocentric | F8 | 1024 | 0.921 | 25.37 | 0.011 | 0.219 | 0.0015 |
| Multi Exocentric | F8 | 512 | 0.913 | 26.19 | 0.010 | 0.221 | 0.0014 |
| Multi Exocentric | F8 | 256 | 0.912 | 25.07 | 0.012 | 0.217 | 0.0033 |
| Multi Exocentric | F16 | 1024 | 0.518 | 19.42 | 0.057 | 0.236 | 0.0034 |
| 3D Projection | F8 | 1024 | 0.934 | 31.65 | 0.005 | 0.210 | 0.0009 |
| 3D Projection | F16 | 1024 | 0.912 | 28.45 | 0.008 | 0.237 | 0.0010 |
| 3D Projection | F16 | 512 | 0.866 | 27.21 | 0.009 | 0.219 | 0.0010 |
| 3D Projection | F16 | 256 | 0.866 | 27.01 | 0.009 | 0.219 | 0.0010 |
| 3D Projection | F32 | 1024 | 0.858 | 26.53 | 0.011 | 0.225 | 0.0001 |
## Training the Model

To train the model, you can use different configurations depending on the task and architecture. Below are some common examples:

```bash
python train.py --cfg configs/c2d_2d_vqgan.yml
python train.py --cfg configs/m2d_m2d_vqgan.yml
python train.py --cfg configs/3d_3d_vqgan.yml
```

## Configuration Files

The configuration files are located in the `configs/` directory. These files allow you to adjust key features like the compression rate, the number of codebook vectors, and other hyperparameters.
You can change, for example (see the sketch after this list):
- Compression Rate: how much the data is compressed (e.g., F8, F16, F32).
- Number of Codebook Vectors: the size of the latent space.
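As a rough illustration, the sketch below loads one of the provided configs with PyYAML and writes a modified copy. Only `num_codebook_vectors` and `resume` are keys confirmed elsewhere in this README; the output filename `my_experiment.yml` is hypothetical:

```python
# Minimal sketch: load a training config, shrink the codebook, save a variant.
# Assumes PyYAML is installed (pip install pyyaml); num_codebook_vectors and
# resume are keys shown in this README, the output filename is illustrative.
import yaml

with open("configs/c2d_2d_vqgan.yml") as f:
    cfg = yaml.safe_load(f)

cfg["num_codebook_vectors"] = 256   # smaller latent vocabulary
cfg["resume"] = ""                  # empty string: train from scratch

with open("configs/my_experiment.yml", "w") as f:
    yaml.safe_dump(cfg, f)
```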
To resume training, specify the path to your checkpoint file in the configuration under `resume`. For example:

```yaml
resume: 'experiments/sandy-jazz-107/3dvqgan_e9_sandy-jazz-107.pt'
```

## Testing the Model

To test multiple models, use the `test.py` script with a configuration file like `configs/test_multiple_models.yml`:
```bash
python test.py --cfg configs/test_multiple_models.yml
```

In the `test_multiple_models.yml` file, you can list multiple model configurations:

```yaml
models:
  clear-dawn-120:
    config_file: 'c2d_2d_vqgan.yml'
    <<: *F8
    num_codebook_vectors: 512
  Additional-model-here:
    ...
```

## Setting Up the CMU Panoptic Dataset

- Download the CMU Panoptic dataset following the instructions in the panoptic-toolbox.
- Extract the data to your desired path and link it in the configuration file:

```yaml
root: '/home/gmaldon2/panoptic-toolbox/data/'
```
- Replace the bash files in the panoptic-toolbox with the files located in `./panoptic_sh` to extract only keypoint data.
- The correct files should be generated automatically when you run the code for the first time.
## Dataset Structure

The directory structure should look like this:

```
|-- panoptic-toolbox
    |-- data
        |-- 160224_haggling1
        |   |-- hdPose3d_stage1_coco19
        |   |-- calibration_160224_haggling1.json
        |-- 160226_haggling1
        |-- ...
```

Creating a new dataset file for training will take approximately 40 minutes.
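Before kicking off that build, it can help to sanity-check the layout. Below is a minimal sketch assuming the root path and per-sequence layout shown above; the path and sequence names are examples and will vary with what you downloaded:

```python
# Sketch: verify each sequence folder contains the keypoint directory and
# calibration file shown in the tree above. The root path is an example.
from pathlib import Path

root = Path("/home/gmaldon2/panoptic-toolbox/data")
for seq in sorted(p for p in root.iterdir() if p.is_dir()):
    pose_dir = seq / "hdPose3d_stage1_coco19"
    calib = seq / f"calibration_{seq.name}.json"
    pose_status = "ok" if pose_dir.is_dir() else "MISSING"
    calib_status = "ok" if calib.is_file() else "MISSING"
    print(f"{seq.name}: hdPose3d={pose_status}, calibration={calib_status}")
```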
## Getting Started

Clone the repository:

```bash
git clone [INSERT LINK HERE]
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Download the CMU Panoptic dataset: follow the 'Setting Up the CMU Panoptic Dataset' section above.

Run the experiments:

```bash
python train.py --cfg configs/c2d_2d_vqgan.yml
```

## Citation

If you find our work useful, please consider citing:
```bibtex
@article{maldonado2025adversarially,
  title={Adversarially-Refined VQ-GAN with Dense Motion Tokenization for Spatio-Temporal Heatmaps},
  author={Maldonado, Gabriel and Rashvand, Narges and Pazho, Armin Danesh and Noghre, Ghazal Alinezhad and Katariya, Vinit and Tabkhi, Hamed},
  journal={arXiv preprint arXiv:2509.19252},
  year={2025}
}
```