A list of papers on recent computer vision architectures. It covers cutting-edge vision networks, starting with ViT itself and the differences between CNNs and ViTs.
- ViT: "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", ICLR, 2021 [paper] [code] [summary]
- ViT vs CNN: "Do Vision Transformers See Like Convolutional Neural Networks?", NeurIPS, 2021 [paper] [code] [summary]
- How to train ViT: "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers", arXiv, 2021 [paper] [code] [summary]
- Swin Transformer: "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows", ICCV, 2021 [paper] [code] [summary]
- Swin Transformer V2: "Swin Transformer V2: Scaling Up Capacity and Resolution", CVPR, 2022 [paper] [code] [summary]
- VOLO: "VOLO: Vision Outlooker for Visual Recognition", TPAMI (early access), 2022 [paper] [code] [summary]
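The "16x16 words" in the ViT title refers to how ViT tokenizes an image. The image is cut into non-overlapping 16x16 patches, and each patch is flattened and linearly projected. A [class] token is then prepended and position embeddings are added. The NumPy sketch below illustrates this under stated assumptions. The names `patchify` and `vit_tokens` are hypothetical and do not come from the official code. All weights are random placeholders rather than learned parameters. `dim=768` is assumed to match the ViT-Base width.

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns shape (num_patches, patch_size * patch_size * C), ordered
    row by row over the patch grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, C) -> (gh, gw, p, p, C): group pixels by patch
    patches = image.reshape(gh, patch_size, gw, patch_size, c).transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one vector ("word")
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def vit_tokens(image, patch_size=16, dim=768, seed=0):
    """Build the token sequence a ViT encoder would consume (hypothetical helper).

    The projection, [class] token, and position embeddings are random
    placeholders here; in a trained ViT they are learned parameters.
    """
    rng = np.random.default_rng(seed)
    patches = patchify(image, patch_size)                      # (N, P*P*C)
    w_proj = rng.normal(0.0, 0.02, (patches.shape[1], dim))    # linear patch embedding
    tokens = patches @ w_proj                                  # (N, dim)
    cls = rng.normal(0.0, 0.02, (1, dim))                      # [class] token
    tokens = np.concatenate([cls, tokens], axis=0)             # (N + 1, dim)
    pos = rng.normal(0.0, 0.02, tokens.shape)                  # position embeddings
    return tokens + pos

image = np.random.default_rng(1).random((224, 224, 3))
print(patchify(image).shape)    # (196, 768)
print(vit_tokens(image).shape)  # (197, 768)
```

A 224x224 RGB image gives a 14x14 grid of patches, so the encoder sees 196 patch tokens plus one [class] token. Each flattened patch has 16 * 16 * 3 = 768 values.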