Demonstrating object detection with the DETR (DEtection TRansformer) model
Key points:
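- Backbone: A convolutional neural network (CNN), typically ResNet-50 or ResNet-101, extracts a feature map from the input image.
- Transformer Encoder: Self-attention over the flattened backbone feature map, with positional encodings added, learns global dependencies between different regions of the image.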
- Transformer Decoder: The decoder is trained to directly predict the final set of objects in parallel, without the need for techniques like anchor boxes or non-maximum suppression.
- Object Queries: DETR introduces a set of learnable positional embeddings called "object queries" that are used by the decoder to predict the bounding boxes and classes for each potential object in the image.
