Hi there, thank you so much for putting this repository together. The implementation is very interesting!
I'm trying to use it with a custom dataset in COCO instances format rather than the original COCO 2017 instances dataset. I did an initial test run on the original COCO dataset and saw the validation segm AP begin to increase as expected within as little as 500 iterations, using a batch size of 2 for a quick test:
python3 -W ignore train_net.py --config-file ./configs/coco/instance-segmentation/deit/maskformer2_deit_base_bs16_50ep.yaml --num-gpus 2 --num-machines 1 SSL.PERCENTAGE 100 SSL.TRAIN_SSL False OUTPUT_DIR ./output-teacher
My problems arise when I begin integrating my custom dataset. I am able to register my training and test sets using register_coco_instances from data.datasets > coco.py. I then update the configuration accordingly:
cfg.DATASETS.TRAIN = ("custom_train",)
cfg.DATASETS.TEST = ("custom_test",)
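For context, with stock detectron2 the registration step looks roughly like the sketch below. It uses `register_coco_instances` and `DatasetCatalog` from detectron2 itself, not the repository's own `data.datasets` version, and the annotation and image paths are placeholders rather than my real ones. Registration has to run in every worker process before the trainer is built, so the call would go near the top of `main` in train_net.py.

```python
import os

# Placeholder paths (assumptions): split name -> (annotation json, image directory)
SPLITS = {
    "custom_train": ("annotations/train.json", "images/train"),
    "custom_test": ("annotations/test.json", "images/test"),
}


def register_splits(root):
    """Register every split in detectron2's DatasetCatalog and report what is visible."""
    # Imported lazily so this module loads even where detectron2 is not installed.
    from detectron2.data import DatasetCatalog
    from detectron2.data.datasets import register_coco_instances

    for name, (json_rel, image_rel) in SPLITS.items():
        # Registering the same name twice raises an error, so skip known names.
        if name not in DatasetCatalog.list():
            register_coco_instances(
                name,
                {},  # no extra metadata
                os.path.join(root, json_rel),
                os.path.join(root, image_rel),
            )
    # Names from SPLITS that the evaluator will be able to look up.
    return [name for name in SPLITS if name in DatasetCatalog.list()]
```

Printing the returned list right before training starts shows whether both `custom_train` and `custom_test` are in the catalog that `COCOEvaluator` queries.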
Inside the coco_unlabel folder, I create the symlinks for the images folder pointing to my training images folder and the symlink for the val2017 folder to my validation set as per the instructions. I point DETECTRON2_DATASETS to the location where coco_unlabel lives, and it appears to pick it up.
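A quick way to double-check that layout is sketched below. It only assumes the two entries named above (`images` and `val2017` inside `coco_unlabel`); the helper name `check_unlabel_layout` is my own and is not part of the repository.

```python
import os


def check_unlabel_layout(datasets_root):
    """Return {entry: resolved target or None} for the two links inside coco_unlabel."""
    base = os.path.join(datasets_root, "coco_unlabel")
    report = {}
    for entry in ("images", "val2017"):
        path = os.path.join(base, entry)
        # os.path.isdir follows symlinks, so a missing or dangling link yields None.
        report[entry] = os.path.realpath(path) if os.path.isdir(path) else None
    return report


if __name__ == "__main__":
    # detectron2 falls back to "datasets" when DETECTRON2_DATASETS is unset.
    root = os.environ.get("DETECTRON2_DATASETS", "datasets")
    for entry, target in check_unlabel_layout(root).items():
        print(entry, "->", target or "missing or dangling")
```

Both entries should resolve to the intended training and validation image folders.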
Up to here, everything works fine. The training job starts using:
python3 -W ignore train_net.py --config-file ./configs/coco/instance-segmentation/deit/maskformer2_deit_base_bs16_50ep.yaml --num-gpus 2 --num-machines 1 SSL.PERCENTAGE 100 SSL.TRAIN_SSL False OUTPUT_DIR ./output-teacher
When the training job reaches the first evaluation step (set to 500 iterations for testing), it fails with an error saying my test set is not registered, even though the training set was picked up:
[03/02 22:16:41 d2.utils.events]: eta: 2 days, 13:54:06 iter: 499 total_loss: 50.87 loss_ce: 0.1988 loss_mask: 1.255 loss_dice: 3.667 loss_ce_0: 1.005 loss_mask_0: 0.8838 loss_dice_0: 3.57 loss_ce_1: 0.1726 loss_mask_1: 1.169 loss_dice_1: 3.563 loss_ce_2: 0.1709 loss_mask_2: 1.215 loss_dice_2: 3.544 loss_ce_3: 0.1839 loss_mask_3: 1.165 loss_dice_3: 3.657 loss_ce_4: 0.1798 loss_mask_4: 1.212 loss_dice_4: 3.613 loss_ce_5: 0.2062 loss_mask_5: 1.233 loss_dice_5: 3.729 loss_ce_6: 0.2123 loss_mask_6: 1.267 loss_dice_6: 3.744 loss_ce_7: 0.2188 loss_mask_7: 1.259 loss_dice_7: 3.683 loss_ce_8: 0.1927 loss_mask_8: 1.263 loss_dice_8: 3.703 time: 0.6120 last_time: 0.6109 data_time: 0.0064 last_data_time: 0.0057 lr: 0.0001 max_mem: 10689M
Traceback (most recent call last):
File "/home/b/.local/lib/python3.10/site-packages/detectron2/data/catalog.py", line 51, in get
f = self[name]
File "/usr/lib/python3.10/collections/__init__.py", line 1106, in __getitem__
raise KeyError(key)
KeyError: 'custom_test'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/b/GuidedDistillation/train_net.py", line 470, in <module>
launch(
File "/home/b/.local/lib/python3.10/site-packages/detectron2/engine/launch.py", line 84, in launch
main_func(*args)
File "/home/b/GuidedDistillation/train_net.py", line 464, in main
return trainer.train()
File "/home/b/GuidedDistillation/modules/defaults.py", line 566, in train
super().train(self.start_iter, self.max_iter)
File "/home/b/GuidedDistillation/modules/train_loop.py", line 165, in train
self.after_step()
File "/home/b/GuidedDistillation/modules/train_loop.py", line 199, in after_step
h.after_step()
File "/home/b/.local/lib/python3.10/site-packages/detectron2/engine/hooks.py", line 556, in after_step
self._do_eval()
File "/home/b/.local/lib/python3.10/site-packages/detectron2/engine/hooks.py", line 529, in _do_eval
results = self._func()
File "/home/b/GuidedDistillation/modules/defaults.py", line 525, in test_and_save_results
self._last_eval_results = self.test(self.cfg, self.model)
File "/home/b/GuidedDistillation/modules/defaults.py", line 691, in test
evaluator = cls.build_evaluator(cfg, dataset_name)
File "/home/b/GuidedDistillation/train_net.py", line 115, in build_evaluator
evaluator_list.append(COCOEvaluator(dataset_name, output_dir=output_folder))
File "/home/b/.local/lib/python3.10/site-packages/detectron2/evaluation/coco_evaluation.py", line 142, in __init__
convert_to_coco_json(dataset_name, cache_path, allow_cached=allow_cached_coco)
File "/home/b/.local/lib/python3.10/site-packages/detectron2/data/datasets/coco.py", line 511, in convert_to_coco_json
coco_dict = convert_to_coco_dict(dataset_name)
File "/home/b/.local/lib/python3.10/site-packages/detectron2/data/datasets/coco.py", line 354, in convert_to_coco_dict
dataset_dicts = DatasetCatalog.get(dataset_name)
File "/home/b/.local/lib/python3.10/site-packages/detectron2/data/catalog.py", line 53, in get
raise KeyError(
KeyError: "Dataset 'custom_test' is not registered!
If I register the test set with detectron2.data.datasets instead of data.datasets, the evaluation works, but the AP values are always 0 no matter how long the job runs:
[03/02 22:23:40 d2.utils.events]: eta: 2 days, 11:02:34 iter: 479 total_loss: 51.48 loss_ce: 0.2197 loss_mask: 0.9924 loss_dice: 3.769 loss_ce_0: 1.136 loss_mask_0: 0.8419 loss_dice_0: 3.583 loss_ce_1: 0.2115 loss_mask_1: 1.055 loss_dice_1: 3.583 loss_ce_2: 0.1991 loss_mask_2: 1.087 loss_dice_2: 3.628 loss_ce_3: 0.2439 loss_mask_3: 1.014 loss_dice_3: 3.63 loss_ce_4: 0.2733 loss_mask_4: 0.9731 loss_dice_4: 3.611 loss_ce_5: 0.2954 loss_mask_5: 0.9499 loss_dice_5: 3.639 loss_ce_6: 0.2749 loss_mask_6: 1.042 loss_dice_6: 3.646 loss_ce_7: 0.2482 loss_mask_7: 0.9416 loss_dice_7: 3.682 loss_ce_8: 0.2521 loss_mask_8: 1.016 loss_dice_8: 3.729 time: 0.5927 last_time: 0.5788 data_time: 0.0061 last_data_time: 0.0044 lr: 0.0001 max_mem: 10690M
[03/02 22:23:52 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[03/02 22:23:52 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[03/02 22:23:52 d2.data.common]: Serializing 74 elements to byte tensors and concatenating them all ...
[03/02 22:23:52 d2.data.common]: Serialized dataset takes 0.05 MiB
[03/02 22:23:52 d2.evaluation.evaluator]: Start inference on 74 batches
[03/02 22:23:54 d2.evaluation.evaluator]: Inference done 11/74. Dataloading: 0.0010 s/iter. Inference: 0.1043 s/iter. Eval: 0.0543 s/iter. Total: 0.1596 s/iter. ETA=0:00:10
[03/02 22:23:59 d2.evaluation.evaluator]: Inference done 44/74. Dataloading: 0.0010 s/iter. Inference: 0.1034 s/iter. Eval: 0.0521 s/iter. Total: 0.1565 s/iter. ETA=0:00:04
[03/02 22:24:04 d2.evaluation.evaluator]: Total inference time: 0:00:11.024099 (0.159770 s / iter per device, on 1 devices)
[03/02 22:24:04 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:07 (0.106315 s / iter per device, on 1 devices)
[03/02 22:24:04 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[03/02 22:24:04 d2.evaluation.coco_evaluation]: Saving results to ./output-teacher/inference/coco_instances_results.json
[03/02 22:24:04 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
[03/02 22:24:04 d2.evaluation.fast_eval_api]: Evaluate annotation type *bbox*
[03/02 22:24:04 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 0.00 seconds.
[03/02 22:24:04 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[03/02 22:24:04 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 0.00 seconds.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
[03/02 22:24:04 d2.evaluation.coco_evaluation]: Evaluation results for bbox:
| AP | AP50 | AP75 | APs | APm | APl |
|:-----:|:------:|:------:|:-----:|:-----:|:-----:|
| 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Loading and preparing results...
DONE (t=0.06s)
creating index...
index created!
[03/02 22:24:04 d2.evaluation.fast_eval_api]: Evaluate annotation type *segm*
[03/02 22:24:04 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 0.01 seconds.
[03/02 22:24:04 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[03/02 22:24:04 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 0.00 seconds.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
[03/02 22:24:04 d2.evaluation.coco_evaluation]: Evaluation results for segm:
| AP | AP50 | AP75 | APs | APm | APl |
|:-----:|:------:|:------:|:-----:|:-----:|:-----:|
| 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
[03/02 22:24:04 d2.evaluation.testing]: copypaste: Task: bbox
[03/02 22:24:04 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/02 22:24:04 d2.evaluation.testing]: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
[03/02 22:24:04 d2.evaluation.testing]: copypaste: Task: segm
[03/02 22:24:04 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/02 22:24:04 d2.evaluation.testing]: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
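One thing that might be worth ruling out when AP is exactly 0 is an ID mismatch between the saved predictions and the ground truth. The sketch below is not from the repository. It assumes the predictions file is a COCO results list (entries with `image_id` and `category_id`, as written to `coco_instances_results.json` above) and that the ground-truth file is a standard COCO instances JSON. The helper name `compare_ids` is hypothetical.

```python
import json
import sys


def compare_ids(gt, preds):
    """Report prediction image/category IDs that do not occur in the ground truth."""
    gt_images = {img["id"] for img in gt["images"]}
    gt_cats = {cat["id"] for cat in gt["categories"]}
    pred_images = {p["image_id"] for p in preds}
    pred_cats = {p["category_id"] for p in preds}
    return {
        "num_predictions": len(preds),
        "num_gt_annotations": len(gt.get("annotations", [])),
        "unknown_image_ids": sorted(pred_images - gt_images),
        "unknown_category_ids": sorted(pred_cats - gt_cats),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_ids.py <ground_truth.json> <coco_instances_results.json>
    with open(sys.argv[1]) as f:
        ground_truth = json.load(f)
    with open(sys.argv[2]) as f:
        predictions = json.load(f)
    print(json.dumps(compare_ids(ground_truth, predictions), indent=2))
```

An empty predictions list, or any entries under `unknown_image_ids` or `unknown_category_ids`, would point at the dataset registration or category mapping rather than at training itself.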
Am I missing something here? I'm assuming it's related to how I register my datasets, since the original COCO dataset setup from the guide appears to work. I've made sure to update the NUM_CLASSES field across the config to match the number of classes in my custom dataset. I've also tried the Dino/R50 bases, with no luck. Thank you!