This repository contains code and dataset setup instructions for Shopformer.

TeCSAR-UNCC/Shopformer

Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose

Overview

This repository contains the official implementation of Shopformer (CVPR 2025). Shopformer is a transformer-based framework that detects shoplifting behavior using only human pose data rather than raw pixel information. Unlike traditional video-based methods, Shopformer focuses on privacy-preserving, pose-based shoplifting detection. The model uses a two-stage architecture. First, a Graph Convolutional Autoencoder (GCAE) learns rich spatio-temporal embeddings from human pose sequences. Second, these embeddings are tokenized and passed through a transformer encoder-decoder, which reconstructs the sequence. The reconstruction error is then used to compute a normality score for shoplifting detection.

Key Features

  • GCAE-based tokenization of human pose sequences
  • Transformer encoder-decoder with attention for behavior modeling
  • Evaluated on the PoseLift dataset (real-world shoplifting pose data)
  • Privacy-preserving and real-time capable

Shopformer Architecture

The following figure illustrates the overall architecture of the Shopformer model:

Figure 1: Overview of the Shopformer architecture. The framework operates in two stages: (1) a Graph Convolutional Autoencoder is first trained on pose sequences to learn rich spatio-temporal representations; (2) the pretrained encoder is then repurposed as a tokenizer module, generating compact tokens from input pose data. These tokens are passed through a transformer encoder-decoder module, which reconstructs the input sequence. The reconstruction error (MSE loss) is used to compute the normality score for shoplifting detection.
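
The scoring idea in the caption can be illustrated with a minimal sketch. It is not the repository's implementation: the function names `mse` and `normality_scores` are hypothetical, and the min-max scaling used to turn errors into scores is an assumption made only for illustration. Each window is treated as a flat list of pose values, and a lower reconstruction error maps to a higher normality score.

```python
from typing import List, Sequence


def mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between a pose window and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("windows must have the same length")
    if not original:
        raise ValueError("windows must not be empty")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def normality_scores(errors: Sequence[float]) -> List[float]:
    """Map reconstruction errors to [0, 1]; the lowest error scores 1.0.

    Min-max scaling is an illustrative assumption, not the paper's formula.
    """
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All windows reconstruct equally well, so treat them all as normal.
        return [1.0 for _ in errors]
    return [1.0 - (e - lo) / (hi - lo) for e in errors]


if __name__ == "__main__":
    # Toy data: three flattened windows and their (made-up) reconstructions.
    windows = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
    recons = [[0.0, 1.0], [1.0, 2.0], [2.0, 1.0]]
    errors = [mse(w, r) for w, r in zip(windows, recons)]
    print(errors, normality_scores(errors))
```

Windows that the transformer reconstructs poorly receive low scores and are the candidates for shoplifting behavior.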

Project Structure

  • models/ – GCAE tokenizer & transformer model
  • scripts/ – training and evaluation scripts
  • data/ – instructions and expected format for PoseLift dataset
  • config/ – training configurations
  • utils/ – metric calculations, pose preprocessing

Dataset

This model is trained on the PoseLift dataset. You can access the dataset and related documentation here: PoseLift GitHub Repository

After downloading the dataset, organize the files into the following directory structure:

DATA/
└── Poselift/
   ├── gt/
   │   └── test_frame_mask/
   │       └── (test set frame-level binary mask files indicating normal or shoplifting behavior)
   └── pose/
       ├── train/
       │   └── (training pose JSON files)
       └── test/
           └── (test pose JSON files)
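
Before training, it can help to confirm that the layout above is in place. The following is a small sketch using only the Python standard library; the helper name `missing_poselift_dirs` is hypothetical and not part of this repository, and it checks only that the three leaf directories exist, not the contents of the files.

```python
from pathlib import Path
from typing import List

# Leaf directories from the layout above, relative to the DATA/ root.
EXPECTED_DIRS = (
    Path("Poselift") / "gt" / "test_frame_mask",
    Path("Poselift") / "pose" / "train",
    Path("Poselift") / "pose" / "test",
)


def missing_poselift_dirs(data_root: str) -> List[str]:
    """Return the expected directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel.as_posix() for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_poselift_dirs("DATA")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("PoseLift layout looks complete.")
```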

Installation

conda env create -f environment.yml

conda activate Shopformer

Training

Stage 1: Train the tokenizer using the following command, or use the pre-trained tokenizer

Stage 2: Freeze the encoder and train the transformer:

python3 main_to.py --dataset Poselift --model_optimizer adam --mask_root ... \
    --seg_len 24 --seg_stride 12 --num_kp 18 --model_num_heads 12 --model_num_layers 4 \
    --epochs 10 --dropout 0.1 --model_lr 5e-05 --model_save_dir ... --model_loss mse \
    --token_config graph --model_latent_dim 64
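
To give a feel for the `--seg_len 24` and `--seg_stride 12` flags, here is a minimal sketch of sliding-window segmentation. It assumes these flags mean window length and step in frames; the function name `window_bounds` is hypothetical, and how the repository treats trailing frames that do not fill a full window is not documented here (this sketch simply drops them).

```python
from typing import List, Tuple


def window_bounds(num_frames: int, seg_len: int = 24, seg_stride: int = 12) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of each full window, end exclusive."""
    if seg_len <= 0 or seg_stride <= 0:
        raise ValueError("seg_len and seg_stride must be positive")
    # Only windows that fit entirely inside the sequence are kept.
    return [(start, start + seg_len)
            for start in range(0, num_frames - seg_len + 1, seg_stride)]


if __name__ == "__main__":
    print(window_bounds(48))  # [(0, 24), (12, 36), (24, 48)]
```

With a stride of half the window length, consecutive windows overlap by 12 frames, so every frame away from the sequence boundaries falls in two windows.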

Results

Shopformer generates 2 tokens per pose sequence, as this setup achieved the best trade-off between accuracy and computational efficiency during ablation studies. Each token has an embedding size of 144, encoded using 8 channels over 18 keypoints. For detailed results comparing token counts ranging from 2 to 12, please refer to the ablation study section in the paper.
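
The token sizes quoted above follow from simple arithmetic, shown here as a short sketch. The helper name `token_layout` is hypothetical; only the numbers (2 tokens, 8 channels, 18 keypoints) come from the text.

```python
from typing import Tuple


def token_layout(num_tokens: int = 2, channels: int = 8, num_keypoints: int = 18) -> Tuple[int, int]:
    """Return (embedding size per token, total values per pose sequence)."""
    token_dim = channels * num_keypoints  # 8 channels over 18 keypoints
    return token_dim, num_tokens * token_dim


if __name__ == "__main__":
    token_dim, total = token_layout()
    print(token_dim, total)  # 144 288
```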

Table 1: AUC-ROC, AUC-PR, and EER of Shopformer compared with state-of-the-art pose-based anomaly detection models on the PoseLift dataset.

Methods       AUC-ROC   AUC-PR   EER
STG-NF        67.46     84.06    0.39
TSGAD         63.35     39.31    0.41
GEPC          60.61     50.38    0.38
Shopformer    69.15     44.49    0.38
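
For readers unfamiliar with the AUC-ROC column, the sketch below computes it from frame-level labels and scores using the pairwise-ranking definition. It is not the repository's evaluation code: the function name `auc_roc` is hypothetical, and it assumes label 1 marks a shoplifting frame and that higher scores mean more anomalous (a normality score would need to be negated first). The table reports this quantity as a percentage.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive frame outscores a random negative one.

    Ties count as half a win. Runs in O(P * N), which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auc_roc(labels, scores))  # 0.75
```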

Citation

If you find our work useful, please consider citing:

@article{rashvand2025shopformer,
  title={Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose},
  author={Rashvand, Narges and Noghre, Ghazal Alinezhad and Pazho, Armin Danesh and Ardabili, Babak Rahimi and Tabkhi, Hamed},
  journal={arXiv preprint arXiv:2504.19970},
  year={2025}
}

Contact

If you have any questions or need assistance, please contact the authors at nrashvan@charlotte.edu.
