
πŸ”„ Pseudo2Code – Transformer-based Pseudocode to C++ Converter

License: MIT Python Hugging Face GitHub Repo

A fully custom Transformer-based Sequence-to-Sequence model built from scratch in PyTorch to convert human-written pseudocode into executable C++ code. Trained on the SPoC dataset from Stanford.


πŸ–ΌοΈ Demo

Try it live on Hugging Face Spaces:
πŸ‘‰ https://huggingface.co/spaces/asadsandhu/Pseudo2Code

App Demo


🧠 Model Architecture

  • Transformer encoder-decoder architecture implemented from scratch in PyTorch
  • No pre-trained models (pure from-scratch implementation)
  • Token-level sequence generation using greedy decoding
  • Custom vocabulary construction for both pseudocode and C++ output

Input:   Pseudocode lines (line-by-line)
Model:   Transformer (Encoder-Decoder)
Output:  C++ code line for each pseudocode line
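
Below is a minimal sketch of how this line-by-line greedy decoding can look in PyTorch. The encode/decode methods and the <bos>/<eos> special tokens are assumptions about the model interface made for illustration, not the repo's exact API (see app.py and train.py for the real code).

import torch

def greedy_decode(model, src_ids, tgt_vocab, max_len=128, device="cpu"):
    """Translate one tokenized pseudocode line into C++ tokens, one token at a time."""
    model.eval()
    src = torch.tensor([src_ids], dtype=torch.long, device=device)    # (1, src_len)
    with torch.no_grad():
        memory = model.encode(src)                                     # encoder output (assumed method)
        ys = torch.tensor([[tgt_vocab["<bos>"]]], device=device)       # start with <bos>
        for _ in range(max_len - 1):
            logits = model.decode(ys, memory)                          # (1, tgt_len, vocab) (assumed method)
            next_token = logits[0, -1].argmax().item()                 # greedy: pick most likely next token
            ys = torch.cat([ys, torch.tensor([[next_token]], device=device)], dim=1)
            if next_token == tgt_vocab["<eos>"]:
                break
    inv_vocab = {i: tok for tok, i in tgt_vocab.items()}
    return [inv_vocab[i] for i in ys[0].tolist()[1:] if i != tgt_vocab["<eos>"]]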


πŸ“Š Dataset

We used the SPoC dataset from Stanford:

  • βœ… Clean pseudocode–C++ line pairs
  • βœ… Token-level annotations for syntax handling
  • βœ… Multiple test splits (generalization to problems/workers)
  • βœ… Custom preprocessing and vocabulary building implemented

πŸ“Ž Licensed under CC BY 4.0
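
As a point of reference, here is a small sketch of loading the pseudocode–C++ line pairs from the TSV and building token vocabularies. The text/code column names and the helper functions are assumptions about the SPoC layout used for illustration; the actual preprocessing lives in train.py.

import csv
from collections import Counter

def load_pairs(path):
    """Read (pseudocode tokens, C++ tokens) pairs from a SPoC-style TSV file."""
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row.get("text") and row.get("code"):                    # skip lines without pseudocode
                pairs.append((row["text"].split(), row["code"].split()))
    return pairs

def build_vocab(token_lists, min_freq=1, specials=("<pad>", "<bos>", "<eos>", "<unk>")):
    """Map tokens to integer ids, reserving the first ids for special tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, freq in counts.most_common():
        if freq >= min_freq:
            vocab.setdefault(tok, len(vocab))
    return vocab

pairs = load_pairs("spoc/train/spoc-train.tsv")
src_vocab = build_vocab(p[0] for p in pairs)
tgt_vocab = build_vocab(p[1] for p in pairs)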


πŸ“ Directory Structure


.
β”œβ”€β”€ app.py                # Gradio web app for inference
β”œβ”€β”€ train.py              # Transformer training code
β”œβ”€β”€ model.pth             # Trained model weights
β”œβ”€β”€ spoc/                 # Dataset directory
β”‚   └── train/
β”‚       β”œβ”€β”€ spoc-train.tsv
β”‚       └── split/spoc-train-eval.tsv
β”œβ”€β”€ assets/
β”‚   └── demo.png          # App screenshot
└── README.md             # You're here


πŸ› οΈ How to Run Locally

βš™οΈ 1. Clone Repo & Install Requirements

git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt

Or manually install:

pip install torch gradio tqdm

πŸš€ 2. Launch the App

Make sure model.pth is present (or train using train.py):

python app.py

The app will open in your browser.
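
For orientation, here is a minimal sketch of how the Gradio interface can be wired. translate_line is a hypothetical stand-in for the repo's inference function; the real app.py also loads model.pth and the saved vocabularies before serving requests.

import gradio as gr

def translate_line(line: str) -> str:
    # Placeholder: in the real app this runs greedy decoding with the trained Transformer.
    return "// " + line

def convert(pseudocode: str) -> str:
    # The model is trained line-by-line, so each pseudocode line is translated independently.
    return "\n".join(translate_line(line) for line in pseudocode.splitlines() if line.strip())

demo = gr.Interface(
    fn=convert,
    inputs=gr.Textbox(lines=10, label="Pseudocode"),
    outputs=gr.Textbox(label="Generated C++"),
    title="Pseudo2Code",
)

if __name__ == "__main__":
    demo.launch()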


πŸ§ͺ Training the Model

You can retrain the model using the train.py script:

python train.py

By default, it downloads the data from the public repo and trains for 10 epochs, then saves a model.pth file containing the learned weights and vocabulary.
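
As a rough outline, the training boils down to standard teacher-forced sequence-to-sequence training. The dataloader, model interface, and vocabulary objects below are illustrative assumptions, not the exact code in train.py.

import torch
import torch.nn as nn

def train_model(model, loader, src_vocab, tgt_vocab, epochs=10, lr=1e-4, device="cpu"):
    model.to(device)
    criterion = nn.CrossEntropyLoss(ignore_index=tgt_vocab["<pad>"])   # ignore padding positions
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        model.train()
        total = 0.0
        for src, tgt in loader:                                        # (batch, src_len), (batch, tgt_len)
            src, tgt = src.to(device), tgt.to(device)
            logits = model(src, tgt[:, :-1])                           # teacher forcing: feed shifted target
            loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: loss {total / len(loader):.4f}")
    # Save the weights together with the vocabularies, so the app can rebuild the tokenizer.
    torch.save({"model_state": model.state_dict(),
                "src_vocab": src_vocab,
                "tgt_vocab": tgt_vocab}, "model.pth")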


πŸ”§ Key Hyperparameters

Parameter        Value
Model Type       Transformer
Max Length       128
Embedding Dim    256
FFN Dim          512
Heads            4
Encoder Layers   2
Decoder Layers   2
Batch Size       64
Epochs           10
Optimizer        Adam
Learning Rate    1e-4
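
For a quick shape check only, here are the same hyperparameters plugged into PyTorch's built-in nn.Transformer. The repository implements its Transformer from scratch, so this is not the repo's model class, just a reference for the dimensions.

import torch.nn as nn

transformer = nn.Transformer(
    d_model=256,            # Embedding Dim
    nhead=4,                # Heads
    num_encoder_layers=2,   # Encoder Layers
    num_decoder_layers=2,   # Decoder Layers
    dim_feedforward=512,    # FFN Dim
    batch_first=True,       # (batch, seq, feature) tensor layout
)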

🧩 Example Input

n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o

⏩ Output C++

int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}

πŸ“¦ Deployment

This app is deployed live on Hugging Face Spaces:
πŸ‘‰ https://huggingface.co/spaces/asadsandhu/Pseudo2Code


πŸ™Œ Acknowledgements

  • SPoC – pseudocode-to-C++ dataset from Stanford, licensed under CC BY 4.0

πŸ§‘β€πŸ’» Author

Asad Ali

  • GitHub: asadsandhu
  • Hugging Face: asadsandhu
  • LinkedIn: asadxali


πŸ“„ License

This project is licensed under the MIT License. Feel free to use, modify, and share with credit.
