Implementation based on this paper.
See transformer.ipynb for implementation.
We found that better models were obtained with fewer parameters; that, combined with data augmentation, yielded the best results. The best model configuration turned out to be the following:
cfg = {
    # Architecture
    'depth': 6,            # number of encoder blocks
    'dropout': 0.1,
    'mlp_ratio': 4,        # MLP hidden dim = mlp_ratio * embed_dim
    'num_patches': 8,      # see transformer.ipynb for how the patch grid is built
    'embed_dim': 192,
    # Optimization
    'lr': 1e-3,
    'weight_decay': 0.05,
}
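As a concrete reading of these numbers, here is a minimal sketch of one pre-norm encoder block and the optimizer. The head count of 3 and the use of AdamW are our assumptions, not stated in the cfg; the patch embedding and classification head are omitted, and the actual implementation lives in transformer.ipynb.

import torch
import torch.nn as nn

class Block(nn.Module):
    # One pre-norm transformer encoder block built from the cfg above.
    # num_heads=3 (head_dim 64) is an assumption.
    def __init__(self, embed_dim=192, mlp_ratio=4, dropout=0.1, num_heads=3):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, embed_dim * mlp_ratio),  # 192 -> 768
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(embed_dim * mlp_ratio, embed_dim),  # 768 -> 192
            nn.Dropout(dropout),
        )

    def forward(self, x):
        h = self.norm1(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a                        # residual around attention
        return x + self.mlp(self.norm2(x))  # residual around MLP

model = nn.Sequential(*[Block() for _ in range(6)])  # depth = 6
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=1e-3, weight_decay=0.05)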
combined with basic data augmentation:
from torchvision import transforms

# Commonly used CIFAR-10 channel statistics; the notebook defines the
# actual values used.
CIFAR_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR_STD = (0.2470, 0.2435, 0.2616)

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4, padding_mode='reflect'),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([  # 50% chance to apply the jitter
        transforms.ColorJitter(brightness=0.1, contrast=0.1,
                               saturation=0.1, hue=0.02),
    ], p=0.5),
    transforms.RandomApply([
        transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 0.5)),
    ], p=0.3),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
    transforms.RandomErasing(p=0.2, scale=(0.02, 0.1)),  # very mild erasing
])
These basic augmentations are applied randomly to the training set; more aggressive augmentation did not improve results.
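For completeness, here is a sketch of wiring this pipeline into the CIFAR-10 loaders; the root path and batch size are placeholders, and the test set uses only normalization, as is standard.

import torchvision
from torch.utils.data import DataLoader

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
])

train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True,
                                         transform=train_transform)
test_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=True,
                                        transform=test_transform)
train_loader = DataLoader(train_set, batch_size=128,
                          shuffle=True, num_workers=2)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)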
The best model is in best_model.pth.
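A minimal sketch of restoring it for evaluation, assuming the checkpoint holds a plain state dict and a model of matching architecture (as in the sketch above):

import torch

# Assumes the checkpoint was saved via torch.save(model.state_dict(), ...);
# if the whole model object was pickled instead, torch.load alone returns it.
model.load_state_dict(torch.load('best_model.pth', map_location='cpu'))
model.eval()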
We see that 2D and sinusoidal positional embeddings provide marginal improvements over 1D learned embeddings; none, however, is significantly worse than the others. A sketch of the two 1D variants follows.
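To make the compared variants concrete, here is a sketch of a 1D learned and a 1D sinusoidal embedding. The token count assumes num_patches counts patches per side (an 8x8 grid, 64 tokens); the notebook's construction may differ.

import math
import torch
import torch.nn as nn

num_tokens, embed_dim = 64, 192  # assumed 8x8 patch grid

# 1D learned: a trainable table, one row per token position.
pos_learned = nn.Parameter(torch.zeros(1, num_tokens, embed_dim))

# 1D sinusoidal: fixed sin/cos features at geometrically spaced frequencies.
# A 2D variant would instead concatenate separate row and column sinusoids.
pos = torch.arange(num_tokens).unsqueeze(1).float()
div = torch.exp(torch.arange(0, embed_dim, 2).float()
                * (-math.log(10000.0) / embed_dim))
pos_sin = torch.zeros(num_tokens, embed_dim)
pos_sin[:, 0::2] = torch.sin(pos * div)
pos_sin[:, 1::2] = torch.cos(pos * div)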

