Design of a transformer-based architecture for object detection conditioned by metadata:
- DEtection TRansformer (DETR)
- You Only Look at One Sequence (YOLOS)
To install the project, simply clone the repository and get the necessary dependencies. Then, create a new project on Weights & Biases. Log in and paste your API key when prompted.
# clone repo
git clone https://github.com/MarcoParola/conditioning-transformer.git
cd conditioning-transformer
mkdir models data
# Create virtual environment and install dependencies
python -m venv env
. env/bin/activate
python -m pip install -r requirements.txt
# Weights&Biases login
wandb login

To start a training run, set the model parameter to one of the following values: detr, early-sum-detr, early-concat-detr, yolos, early-sum-yolos, early-concat-yolos.

python train.py model=detr

The command can also be run with the cropBackground option set to true or false, which yields the following training images.
| Whole image | Cropped image |
|---|---|
| ![]() | ![]() |
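
To compare all model variants with and without background cropping, the training commands can be generated programmatically. The following is a minimal sketch, not part of the repository: the helper name `training_commands` is hypothetical, and it assumes that cropBackground is passed with the same key=value syntax as model. It only prints the commands (a dry run) rather than launching training.

```python
import itertools
import shlex

# Model names as listed in this README.
MODELS = [
    "detr", "early-sum-detr", "early-concat-detr",
    "yolos", "early-sum-yolos", "early-concat-yolos",
]


def training_commands(models=MODELS, crop_values=("true", "false")):
    """Return one train.py command (as an argument list) per combination.

    Assumption: cropBackground uses the same key=value form as model.
    """
    return [
        ["python", "train.py", f"model={model}", f"cropBackground={crop}"]
        for model, crop in itertools.product(models, crop_values)
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in training_commands():
        print(shlex.join(cmd))
```

Each printed line can be pasted into the shell, or the argument lists can be handed to `subprocess.run` once the dry-run output looks right.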
To run inference on the test set and compute metrics, specify the path to the model weights by setting the weight parameter (I usually download the weights from wandb and copy them into the checkpoint folder).

python test.py model=detr weight=checkpoint/best.pt

Training hyperparameters can be edited in the config file or overridden from the shell.

| Params | Value |
|---|---|
| batchSize | 16 |
| lr | 1e-6 |
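
As an illustration of overriding these values from the shell, the sketch below builds a command line from keyword arguments. It assumes that hyperparameters are overridden with the same key=value syntax used for model; the helper `with_overrides` is hypothetical and not part of the repository.

```python
import shlex


def with_overrides(script, **overrides):
    """Build a command line with key=value overrides appended.

    Assumption: the script accepts overrides in the same form as model=detr.
    """
    args = ["python", script]
    args += [f"{key}={value}" for key, value in overrides.items()]
    return shlex.join(args)


# Values taken from the table above; lr is passed as a string to keep its notation.
print(with_overrides("train.py", model="detr", batchSize=16, lr="1e-6"))
# prints: python train.py model=detr batchSize=16 lr=1e-6
```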
Special thanks to @clive819 for making a DETR implementation public, and to @hustvl for the original YOLOS implementation.

