# CS231N
## Overview

This project implements a self-supervised approach to modeling local folding patterns in small protein fragments (30-50 amino acids). By treating protein distance maps as images and training a masked autoencoder on them, we learn the underlying principles of protein structure without relying on labeled data. Our model uses a Vision Transformer (ViT) architecture that learns to reconstruct masked portions of protein distance maps, forcing it to capture the complex spatial relationships and constraints that govern protein folding.
## Approach

### Self-Supervised Learning Strategy

Rather than relying on labeled data, we employ a self-supervised approach in which the model learns to predict masked regions of a distance map from the visible regions (a sketch of this objective follows the list below). This strategy has several advantages:
- It doesn't require manual annotation
- It can leverage large amounts of unlabeled protein structure data
- It encourages the model to learn intrinsic properties of protein folding
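As a concrete illustration, here is a minimal sketch of the masked-reconstruction loss this strategy implies. It assumes PyTorch; the function name and tensor shapes are our own illustrative choices, not the project's exact implementation.

```python
# A minimal sketch of an MAE-style masked-reconstruction loss.
# Shapes and names here are illustrative assumptions.
import torch

def masked_reconstruction_loss(pred, target, mask):
    """MSE computed only over masked patches.

    pred, target: (batch, num_patches, patch_dim) patch values
    mask:         (batch, num_patches), 1 for masked patches, 0 for visible
    """
    per_patch = ((pred - target) ** 2).mean(dim=-1)   # (batch, num_patches)
    return (per_patch * mask).sum() / mask.sum()      # average over masked only
```

Restricting the loss to masked patches is what forces the model to infer hidden structure from the visible context rather than simply copying its input.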
### Data Representation

Protein structures are represented as distance maps, where each pixel (i, j) holds the distance between the alpha-carbon atoms of residues i and j (a sketch of constructing such a map follows the list below). These distance maps:
- Capture the complete 3D structure in a 2D format
- Show characteristic patterns associated with secondary structures
- Reveal long-range interactions critical for folding
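The construction itself is a pairwise distance computation. Below is a minimal sketch, assuming a hypothetical `ca_coords` array of shape (L, 3) holding the alpha-carbon coordinates of an L-residue fragment.

```python
# A minimal sketch of building a distance map from alpha-carbon coordinates.
# `ca_coords` is a hypothetical (L, 3) array of CA positions in angstroms.
import numpy as np

def distance_map(ca_coords: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between residues i and j."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]  # (L, L, 3)
    return np.linalg.norm(diff, axis=-1)                  # (L, L), symmetric

# Example: a 40-residue fragment yields a 40x40 symmetric map with
# zeros on the diagonal (random coordinates used here for illustration).
rng = np.random.default_rng(0)
dmap = distance_map(rng.normal(size=(40, 3)))
assert np.allclose(dmap, dmap.T) and np.allclose(np.diag(dmap), 0)
```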
### Vision Transformer Architecture

We implement a ViT-based masked autoencoder, sketched in code after the list below, that:
- Divides the distance map into patches
- Randomly masks a high percentage (75%) of these patches
- Processes the visible patches through an encoder
- Reconstructs the full distance map using a decoder
- Learns, in doing so, to predict the masked regions
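The following is a simplified sketch of that pipeline, assuming PyTorch. The map size, patch size, embedding dimension, layer counts, and class name (`DistanceMapMAE`) are illustrative assumptions, not the project's actual configuration.

```python
# A simplified MAE-style forward pass over distance maps; all
# hyperparameters and names here are illustrative assumptions.
import torch
import torch.nn as nn

class DistanceMapMAE(nn.Module):
    def __init__(self, map_size=48, patch=4, dim=128, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.num_patches = (map_size // patch) ** 2
        self.embed = nn.Linear(patch * patch, dim)            # patchify + project
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), 4)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), 2)
        self.head = nn.Linear(dim, patch * patch)             # back to pixel values

    def patchify(self, x):
        """(B, H, W) distance maps -> (B, num_patches, patch*patch)."""
        p = self.patch
        B, H, W = x.shape
        x = x.reshape(B, H // p, p, W // p, p).permute(0, 1, 3, 2, 4)
        return x.reshape(B, self.num_patches, p * p)

    def forward(self, dmap):
        patches = self.patchify(dmap)
        tokens = self.embed(patches) + self.pos
        B, N, D = tokens.shape

        # Randomly keep 25% of patches (mask ratio 0.75).
        n_keep = int(N * (1 - self.mask_ratio))
        perm = torch.rand(B, N, device=dmap.device).argsort(dim=1)
        keep = perm[:, :n_keep]
        visible = torch.gather(tokens, 1, keep[..., None].expand(-1, -1, D))

        # Encode only the visible patches.
        latent = self.encoder(visible)

        # Scatter encoded tokens back; masked slots receive the mask token,
        # then the decoder reconstructs pixel values for every patch.
        full = self.mask_token.expand(B, N, D).clone()
        full = full.scatter(1, keep[..., None].expand(-1, -1, D), latent)
        pred = self.head(self.decoder(full + self.pos))       # (B, N, p*p)

        mask = torch.ones(B, N, device=dmap.device)
        mask.scatter_(1, keep, 0.0)                           # 1 = masked patch
        return pred, patches, mask
```

Combined with the loss sketch above, a training step would look like `pred, target, mask = model(dmap)` followed by `loss = masked_reconstruction_loss(pred, target, mask)`, so the gradient signal comes only from the patches the model never saw.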