# 231N

CS231N

## Overview

This project implements a self-supervised learning approach to understand and model local folding patterns in small protein fragments (30-50 amino acids). By treating protein distance maps as images and applying a masked-autoencoder approach, we learn the underlying principles of protein structure without relying on labeled data. Our model uses a Vision Transformer (ViT) architecture that learns to reconstruct masked portions of protein distance maps, forcing it to understand the complex spatial relationships and constraints that govern protein folding.

## Approach

### Self-Supervised Learning Strategy

Rather than using labeled data, we employ a self-supervised approach in which the model learns to predict masked regions of distance maps from the visible regions. This approach has several advantages:

- It doesn't require manual annotation
- It can leverage large amounts of unlabeled protein structure data
- It encourages the model to learn intrinsic properties of protein folding

### Data Representation

Protein structures are represented as distance maps, where each pixel (i, j) corresponds to the distance between the alpha-carbon atoms of residues i and j. These distance maps:

- Capture the complete 3D structure in a 2D format
- Show characteristic patterns associated with secondary structures
- Reveal long-range interactions critical for folding
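As a minimal sketch of this representation (not the project's actual data pipeline), the distance map can be computed directly from an array of alpha-carbon coordinates; the `distance_map` helper and the toy coordinates below are illustrative:

```python
import numpy as np

def distance_map(ca_coords):
    """Compute a pairwise alpha-carbon distance map.

    ca_coords: (N, 3) array of alpha-carbon coordinates in angstroms.
    Returns an (N, N) matrix whose (i, j) entry is the Euclidean
    distance between residues i and j.
    """
    # Broadcast to an (N, N, 3) array of coordinate differences,
    # then reduce to Euclidean distances.
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy 3-residue "fragment" with consecutive C-alphas ~3.8 A apart
coords = np.array([[0.0, 0.0, 0.0],
                   [3.8, 0.0, 0.0],
                   [3.8, 3.8, 0.0]])
dmap = distance_map(coords)
```

The resulting matrix is symmetric with a zero diagonal, which is why characteristic secondary-structure patterns (e.g. the thick diagonal band of a helix) show up as image-like texture.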

### Vision Transformer Architecture

We implement a ViT-based masked autoencoder that:

1. Divides the distance map into patches
2. Randomly masks a high percentage (75%) of these patches
3. Processes the visible patches through an encoder
4. Reconstructs the full distance map using a decoder
5. Learns to predict the masked regions
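Steps 1-2 above (patchify, then randomly mask 75%) can be sketched as follows; this is a simplified NumPy illustration of the masking scheme, not the project's training code, and the `random_mask_patches` helper name and default patch size are assumptions:

```python
import numpy as np

def random_mask_patches(dmap, patch_size=8, mask_ratio=0.75, seed=None):
    """Split a square distance map into non-overlapping patches and
    randomly keep only (1 - mask_ratio) of them, MAE-style.

    Returns the visible (kept) patches, flattened to vectors, and the
    sorted indices of the kept patches (needed later to place the
    encoder outputs back among mask tokens for the decoder).
    """
    rng = np.random.default_rng(seed)
    n = dmap.shape[0] // patch_size  # patches per side

    # (n*p, n*p) -> (n, p, n, p) -> (n, n, p, p) -> (n*n, p*p)
    patches = (dmap[:n * patch_size, :n * patch_size]
               .reshape(n, patch_size, n, patch_size)
               .transpose(0, 2, 1, 3)
               .reshape(n * n, patch_size * patch_size))

    num_keep = int(n * n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n * n)[:num_keep])
    return patches[keep_idx], keep_idx

# Example: a 32x32 map with 8x8 patches gives 16 patches; 75% masking
# leaves 4 visible patches for the encoder.
dmap = np.arange(32 * 32, dtype=float).reshape(32, 32)
visible, keep_idx = random_mask_patches(dmap, seed=0)
```

Only the visible patches are fed to the encoder, which is what makes the high 75% masking ratio computationally cheap; the decoder then inserts learned mask tokens at the masked positions and reconstructs the full map.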
