The goal of this project is to track small toy robots (Hexbugs) in video sequences.
It follows a two-step approach:
- Objects are detected in each frame using Deformable DETR, an object detection model.
- Detected objects are associated across frames using a Kalman filter–based tracker with Hungarian assignment.
I implemented this project as part of a master's seminar.
Its main limitations stem from the combination of the Deformable DETR model and the small size of the Hexbugs in the video frames.
Source of the original video: https://github.com/ankilab/traco_2024
A trained model checkpoint is required to run the pipeline. No pre-trained model is provided in this repository, so you need to supply your own checkpoint or train one using the training functionality (see next section).
- Clone the repository:

      git clone https://github.com/Hypnos8/TRACO.git
      cd TRACO

- Create a virtual environment and install dependencies:

      python -m venv .venv
      source .venv/bin/activate   # Windows: .venv\Scripts\activate
      pip install -r requirements.txt

- Run the tracking pipeline on a video:

      python run_pipeline.py

  (Note: Paths are currently hardcoded in the script; modify them as needed.)
The model training can be started using

    python train_model.py

The training script expects the training and validation datasets to be provided in COCO format. Helper functions for converting annotation files to COCO format are available in the helpers module.
Please note that the training functionality was primarily intended for use during the seminar and currently relies on several hardcoded paths and assumptions. Running training on new datasets therefore requires manual adjustments to the code and configuration.
The implementation is written in Python.
The file run_pipeline.py can be used to run the full pipeline with a trained model on a video file.
The overall tracking process is shown in the figure below:
For object detection, the Deformable DETR implementation from Hugging Face is used and integrated into the pipeline using PyTorch Lightning.
The currently used model is a pre-trained Deformable DETR with a ResNet-50 backbone, trained on the COCO 2017 dataset.
It is loaded via a custom wrapper class to adapt it to the specific use case of Hexbug detection.
The BatchNorm2d layers in the backbone are frozen to avoid issues with small batch sizes during training (caused by memory constraints on the GeForce RTX 3060 GPU).
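As a rough illustration of what freezing BatchNorm2d layers can look like in PyTorch, here is a minimal sketch. The function name `freeze_batchnorm` and the tiny stand-in backbone are assumptions for this example and are not taken from the repository; the actual wrapper class may handle this differently.

```python
import torch.nn as nn


def freeze_batchnorm(model: nn.Module) -> int:
    """Freeze every BatchNorm2d layer in `model`; return how many were frozen."""
    frozen = 0
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.eval()  # use stored running statistics, not batch statistics
            for param in module.parameters():
                param.requires_grad = False  # keep the affine scale and shift fixed
            frozen += 1
    return frozen


# Hypothetical stand-in for a backbone; the project itself uses ResNet-50.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)
num_frozen = freeze_batchnorm(backbone)
```

Note that a later call to `model.train()` puts the layers back into training mode, so in a training loop the function would have to be re-applied afterwards, for example at the start of each epoch.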
For training, all annotation files are converted to the COCO format, and a custom PyTorch Dataset class is implemented to load the data.
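To make the target format concrete, the following sketch builds a COCO-style detection dictionary from per-frame bounding boxes. The input layout (the `frames` and `boxes` dictionaries and their keys) and the function name `to_coco` are hypothetical and chosen only for illustration; they do not describe the actual helper functions in this repository.

```python
import json


def to_coco(frames, boxes, category_name="hexbug"):
    """Build a COCO-style detection dictionary.

    frames: list of dicts with keys "file_name", "width", "height"
    boxes:  list of dicts with keys "frame" (index into frames) and
            "x", "y", "w", "h" (top-left corner and size in pixels)
    """
    images = [
        {"id": i, "file_name": f["file_name"], "width": f["width"], "height": f["height"]}
        for i, f in enumerate(frames)
    ]
    annotations = [
        {
            # Annotation ids start at 1 here, since some evaluation tools
            # treat id 0 as "unmatched".
            "id": j + 1,
            "image_id": b["frame"],
            "category_id": 0,  # single class (assumption)
            "bbox": [b["x"], b["y"], b["w"], b["h"]],  # COCO uses [x, y, width, height]
            "area": b["w"] * b["h"],
            "iscrowd": 0,
        }
        for j, b in enumerate(boxes)
    ]
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": 0, "name": category_name}],
    }


# Made-up example data: one frame with one bounding box.
frames = [{"file_name": "frame_0000.jpg", "width": 1920, "height": 1080}]
boxes = [{"frame": 0, "x": 100, "y": 200, "w": 40, "h": 30}]
coco = to_coco(frames, boxes)
coco_json = json.dumps(coco, indent=2)
```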
Data augmentations are applied during training.
Frequently changed hyperparameters and settings are stored in the file const.py.
- train_model.py can be used to train the model locally.
- run_training_hpc.sh is a script for running training jobs on an HPC cluster.
- Various helper functions for dataset creation, visualization, and preprocessing are implemented in helpers.py.
For tracking, a Kalman filter–based tracker with Hungarian assignment is implemented in tracker.py.
The tracker uses Kalman filtering to predict object positions in the next frame and the Hungarian algorithm to associate detections with existing tracks based on a distance metric (IoU or centroid distance).
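The association step can be sketched as follows, using centroid distance as the cost and SciPy's `linear_sum_assignment` to solve the assignment problem. The function name `associate`, the `max_distance` gate, and its value of 50 pixels are assumptions made for this example; the code in tracker.py may be organized differently.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(predicted, detected, max_distance=50.0):
    """Match predicted track positions to detections by centroid distance.

    predicted: array-like of shape (T, 2), detected: array-like of shape (D, 2).
    Returns (matches, unmatched_tracks, unmatched_detections).
    """
    predicted = np.asarray(predicted, dtype=float).reshape(-1, 2)
    detected = np.asarray(detected, dtype=float).reshape(-1, 2)
    # Pairwise Euclidean distances: cost[i, j] = ||predicted[i] - detected[j]||
    cost = np.linalg.norm(predicted[:, None, :] - detected[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    # Discard assignments that are too far apart to be plausible.
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_distance]
    matched_tracks = {r for r, _ in matches}
    matched_dets = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(predicted)) if i not in matched_tracks]
    unmatched_dets = [j for j in range(len(detected)) if j not in matched_dets]
    return matches, unmatched_tracks, unmatched_dets


# Two predicted track positions and three detections (made-up coordinates).
predicted = [[10.0, 10.0], [200.0, 200.0]]
detected = [[205.0, 198.0], [12.0, 9.0], [500.0, 500.0]]
matches, lost_tracks, new_detections = associate(predicted, detected)
```

With an IoU-based metric, the same structure would apply with the cost matrix replaced by `1 - IoU` between predicted and detected boxes. Unmatched detections are natural candidates for new tracks, and unmatched tracks for coasting on the prediction or deletion.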
The Kalman filter implementation is based on the filterpy library.
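For readers unfamiliar with the filter itself, here is a minimal constant-velocity Kalman filter written in plain NumPy. It is not the filterpy API and not the code from tracker.py. The class name, the state layout `[x, y, vx, vy]`, and all noise values are placeholder assumptions chosen for illustration.

```python
import numpy as np


class ConstantVelocityKalman:
    """Kalman filter over the state [x, y, vx, vy] with position-only measurements."""

    def __init__(self, x, y, dt=1.0, process_var=1.0, measurement_var=10.0):
        self.x = np.array([x, y, 0.0, 0.0])             # state estimate
        self.P = np.eye(4) * 100.0                       # state covariance (uncertain start)
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # constant-velocity motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # only the position is observed
        self.Q = np.eye(4) * process_var                 # process noise
        self.R = np.eye(2) * measurement_var             # measurement noise

    def predict(self):
        """Propagate the state one frame ahead and return the predicted position."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the state with a measured position z = [x, y]."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R          # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]


# Feed ten frames of an object moving 5 px per frame along x, then predict frame 11.
kf = ConstantVelocityKalman(0.0, 0.0)
for t in range(1, 11):
    kf.predict()
    kf.update([5.0 * t, 0.0])
next_position = kf.predict()
```

In a tracker, `predict()` would be called once per frame for every track before association, and `update()` only for tracks that received a matched detection.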
The best model achieved an
This indicates a reasonable ability to detect Hexbugs; however, a significant number of objects were missed, especially in the test dataset with more challenging scenes.
One possible reason for the model’s limited performance is the difficulty of detecting very small objects.
Since the transformer module operates on feature maps from the ResNet-50 backbone with limited spatial resolution, important features in small regions may be lost.
Additionally, challenges during training (such as limited GPU memory and small batch sizes) made it harder to find an optimal training setup.
