Pyroformer 🔥


An experimental GPT-style language model built from scratch in PyTorch to generate novel, heat-stable thermophilic protein sequences.

✨ Features

  • Decoder-only Transformer architecture.
  • Custom Byte-Pair Encoding (BPE) tokenizer trained on a curated dataset of thermophilic proteins.
  • Simple, educational codebase for training and generation.
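The decoder-only architecture can be sketched in a few lines of PyTorch, in the spirit of nanoGPT. The module names and sizes below (`n_embd=64`, `n_head=4`) are illustrative, not the repository's actual code:

```python
# Minimal sketch of one decoder-only Transformer block (illustrative,
# not Pyroformer's actual implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused Q, K, V projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) for multi-head attention
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal mask: each residue attends only to itself and earlier tokens
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

class Block(nn.Module):
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # pre-norm residual connections, as in GPT-2 and nanoGPT
        x = x + self.attn(self.ln1(x))
        return x + self.mlp(self.ln2(x))

block = Block(n_embd=64, n_head=4)
out = block(torch.randn(2, 16, 64))   # (batch, sequence length, embedding)
print(out.shape)                      # torch.Size([2, 16, 64])
```

A full model stacks several such blocks between a token-plus-position embedding and a final linear layer that projects back to the tokenizer's vocabulary.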

🚀 Usage

  1. Prepare the thermophilic protein dataset and place it in the repository root.

  2. Train the BPE tokenizer:

     python bpe_vocab.py

  3. Train the generative model:

     python main.py

     The script prints generated sequences upon completion.
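At its core, BPE training repeatedly finds the most frequent adjacent token pair in the corpus and merges it into a new vocabulary token. A minimal pure-Python sketch of that merge loop, on toy amino-acid strings (the actual `bpe_vocab.py` may differ in details such as tie-breaking and special tokens):

```python
# Illustrative BPE merge loop on amino-acid sequences (a sketch, not
# the repository's actual tokenizer code).
from collections import Counter

def most_frequent_pair(seqs):
    """Count all adjacent token pairs across the corpus; return the top one."""
    pairs = Counter()
    for toks in seqs:
        for a, b in zip(toks, toks[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(toks, pair):
    """Replace every occurrence of `pair` with its concatenation."""
    merged, i = [], 0
    while i < len(toks):
        if i + 1 < len(toks) and (toks[i], toks[i + 1]) == pair:
            merged.append(toks[i] + toks[i + 1])
            i += 2
        else:
            merged.append(toks[i])
            i += 1
    return merged

# Each protein starts as a list of single amino-acid tokens.
corpus = [list("MKVLAA"), list("MKVAAG"), list("AAMKV")]
for _ in range(3):  # three merge rounds; real vocabularies use thousands
    pair = most_frequent_pair(corpus)
    corpus = [merge_pair(s, pair) for s in corpus]
print(corpus[0])  # → ['MKV', 'L', 'AA']
```

Each learned merge becomes a vocabulary entry, so frequent motifs in the training proteins end up as single tokens the model can predict directly.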

Acknowledgements

This project is heavily inspired by Andrej Karpathy's educational work on nanoGPT.
