A fully custom Transformer-based Sequence-to-Sequence model built from scratch in PyTorch to convert human-written pseudocode into executable C++ code. Trained on the SPoC dataset from Stanford.
Try it live on Hugging Face Spaces:
https://huggingface.co/spaces/asadsandhu/Pseudo2Code
- Developed using the Transformer architecture from scratch in PyTorch
- No pre-trained models (pure from-scratch implementation)
- Token-level sequence generation using greedy decoding
- Custom vocabulary construction for both pseudocode and C++ output
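Here is a minimal pure-Python sketch of what custom vocabulary construction involves. It assumes whitespace tokenization and four hypothetical special tokens (`<pad>`, `<sos>`, `<eos>`, `<unk>`). The actual token names, frequency threshold, and tokenizer in train.py may differ.

```python
from collections import Counter

# Hypothetical special tokens; the names used in train.py may differ.
SPECIALS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(lines, min_freq=1):
    """Map each whitespace-separated token seen at least min_freq times to an integer id."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(line, vocab, max_len=128):
    """Return <sos> + token ids + <eos>, truncated so the result is at most max_len long."""
    unk = vocab["<unk>"]
    body = [vocab.get(tok, unk) for tok in line.split()][: max_len - 2]
    return [vocab["<sos>"]] + body + [vocab["<eos>"]]


code_vocab = build_vocab(["int n ;", "cin >> n ;"])
print(encode("cin >> x ;", code_vocab))  # [1, 7, 6, 3, 4, 2]; "x" is unseen, so it maps to <unk>
```

A separate vocabulary would be built the same way for the pseudocode side and for the C++ side.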
Input: Pseudocode, processed one line at a time
Model: Transformer (Encoder-Decoder)
Output: C++ code line for each pseudocode line
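The line-by-line pipeline with greedy decoding can be sketched without any framework. The trained Transformer is replaced here by a hypothetical `score_fn(src_ids, prefix_ids)` callable that returns one score per target-vocabulary id. In the real app the PyTorch model fills this role, and its exact interface is an assumption.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: given source ids and the decoded prefix,
# return a score for every target-vocabulary id.
ScoreFn = Callable[[Sequence[int], Sequence[int]], Sequence[float]]


def greedy_decode(score_fn: ScoreFn, src_ids: Sequence[int], sos_id: int, eos_id: int,
                  max_len: int = 128) -> List[int]:
    """Append the single highest-scoring token at each step until <eos> or max_len."""
    out = [sos_id]
    while len(out) < max_len:
        scores = score_fn(src_ids, out)
        best = max(range(len(scores)), key=lambda i: scores[i])  # argmax; no beam search
        if best == eos_id:
            break
        out.append(best)
    return out[1:]  # drop the leading <sos>


def translate_lines(pseudo_lines: List[str], src_vocab: Dict[str, int], tgt_itos: List[str],
                    score_fn: ScoreFn, max_len: int = 128) -> List[str]:
    """Translate each pseudocode line independently; returns one C++ line per input line."""
    unk = src_vocab["<unk>"]
    sos, eos = tgt_itos.index("<sos>"), tgt_itos.index("<eos>")
    cpp_lines = []
    for line in pseudo_lines:
        src_ids = [src_vocab.get(tok, unk) for tok in line.split()]
        out_ids = greedy_decode(score_fn, src_ids, sos, eos, max_len)
        cpp_lines.append(" ".join(tgt_itos[i] for i in out_ids))
    return cpp_lines
```

Greedy decoding commits to the locally best token at every step. This keeps inference simple and fast, but unlike beam search it never revisits an early choice.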
We used the SPoC dataset from Stanford:
- Clean pseudocode→C++ line pairs
- Token-level annotations for syntax handling
- Multiple test splits (testing generalization to unseen problems and unseen workers)
- Custom preprocessing and vocabulary building (implemented in this project)
The SPoC dataset is licensed under CC BY 4.0.
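Here is a minimal sketch of reading line pairs from a SPoC TSV file with Python's standard `csv` module. It assumes the column names of the released SPoC files (`text` for pseudocode and `code` for C++, alongside columns such as `probid`, `subid`, `line`, and `indent`), so check them against the header of your copy. It also assumes some rows have empty pseudocode (for example, brace-only lines) and skips them. How train.py handles such rows is not stated here.

```python
import csv


def load_pairs(tsv_lines):
    """Yield (pseudocode, code) pairs from SPoC-style TSV lines, skipping rows without pseudocode."""
    # QUOTE_NONE keeps literal double quotes (e.g. print "YES") from being parsed as CSV quoting.
    reader = csv.DictReader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        text = (row.get("text") or "").strip()
        code = (row.get("code") or "").strip()
        if text and code:
            yield text, code


# Usage (path taken from the project layout):
# with open("spoc/train/spoc-train.tsv", newline="", encoding="utf-8") as f:
#     pairs = list(load_pairs(f))
```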
.
├── app.py # Gradio web app for inference
├── train.py # Transformer training code
├── model.pth # Trained model weights
├── spoc/ # Dataset directory
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png # App screenshot
└── README.md # You're here
Clone the repository and install the dependencies:
git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt
Or install them manually:
pip install torch gradio tqdm
Make sure model.pth is present (or train it with train.py), then start the app:
python app.py
The app will then be available in your browser at the local URL Gradio prints.
You can retrain the model using the train.py script:
python train.py
By default, the script downloads the data from the public repo and trains for 10 epochs.
It writes a model.pth file containing the learned weights and the vocabularies.
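Sequence-to-sequence Transformers are commonly trained with teacher forcing. The decoder is fed the target shifted right by one position and learns to predict the next token at every position. The sketch below covers only that data-preparation step. It assumes hypothetical ids for `<pad>`, `<sos>`, and `<eos>` and the 128-token maximum length from the table. Whether train.py builds its batches exactly this way is an assumption.

```python
from typing import List, Tuple

PAD, SOS, EOS = 0, 1, 2  # hypothetical special-token ids


def make_decoder_io(tgt_ids: List[int], max_len: int = 128) -> Tuple[List[int], List[int]]:
    """Build (decoder_input, decoder_target) for teacher forcing, both right-padded to max_len."""
    body = tgt_ids[: max_len - 1]   # leave room for one <sos> or <eos>
    dec_in = [SOS] + body           # what the decoder is fed
    dec_out = body + [EOS]          # what it must predict at each position
    pad = [PAD] * (max_len - len(dec_in))
    return dec_in + pad, dec_out + pad


print(make_decoder_io([7, 8, 9], max_len=6))  # ([1, 7, 8, 9, 0, 0], [7, 8, 9, 2, 0, 0])
```

Padding positions in the target are normally excluded from the loss. In PyTorch this can be done with `nn.CrossEntropyLoss(ignore_index=PAD)`.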
| Parameter | Value |
|---|---|
| Model Type | Transformer |
| Max Sequence Length (tokens) | 128 |
| Embedding Dim | 256 |
| FFN Dim | 512 |
| Heads | 4 |
| Encoder Layers | 2 |
| Decoder Layers | 2 |
| Batch Size | 64 |
| Epochs | 10 |
| Optimizer | Adam |
| Learning Rate | 1e-4 |
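Two quantities follow from the table. First, each attention head works in 256 / 4 = 64 dimensions. Second, the size of the Transformer layers can be estimated. The sketch below counts weights and biases assuming a layout like PyTorch's `nn.TransformerEncoderLayer` / `nn.TransformerDecoderLayer`: packed Q/K/V projection, output projection, two-layer feed-forward block, and one LayerNorm per sub-block. It excludes embeddings, the output projection, and any final LayerNorm, because those depend on the vocabularies or the implementation. A from-scratch model may differ slightly, so treat the result as an estimate.

```python
def attention_params(d_model: int) -> int:
    """Packed Q/K/V projection (3*d*d weights + 3*d biases) plus output projection (d*d + d)."""
    return 4 * d_model * d_model + 4 * d_model


def ffn_params(d_model: int, d_ff: int) -> int:
    """Two linear layers, d_model -> d_ff -> d_model, each with a bias."""
    return 2 * d_model * d_ff + d_ff + d_model


def layernorm_params(d_model: int) -> int:
    """One scale vector and one shift vector."""
    return 2 * d_model


def transformer_body_params(d_model: int = 256, d_ff: int = 512, n_enc: int = 2, n_dec: int = 2) -> int:
    """Encoder and decoder layer parameters (no embeddings, output head, or final norms)."""
    enc = attention_params(d_model) + ffn_params(d_model, d_ff) + 2 * layernorm_params(d_model)
    # Each decoder layer has self-attention plus cross-attention over the encoder output.
    dec = 2 * attention_params(d_model) + ffn_params(d_model, d_ff) + 3 * layernorm_params(d_model)
    return n_enc * enc + n_dec * dec


head_dim = 256 // 4  # embedding dim / heads
print(head_dim, transformer_body_params())  # 64 2635776
```

So the attention and feed-forward layers hold roughly 2.6M parameters. The embedding tables and output projection add about vocab_size × 256 weights each on top of that.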
Example input (pseudocode):
n , nn, ans = integers with ans =0
Read n
for i=2 to n-1 execute
set nn to n
while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i
}
set o to gcd(ans, n-2)
print out ans/o "/" (n-2)/o
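For reference, the pseudocode above sums the digits of n written in every base from 2 to n-1, then prints that sum divided by n-2 as a reduced fraction. The hand-written Python sketch below implements that intended behavior. It is not model output, and the function name is hypothetical. It can serve as a check on the generated C++ that follows, and it assumes n ≥ 3 so that n-2 is positive.

```python
from math import gcd


def digit_sum_average(n: int) -> str:
    """Sum of the digits of n in every base 2..n-1, divided by (n - 2), as a reduced fraction."""
    ans = 0
    for base in range(2, n):  # i = 2 to n-1, inclusive
        nn = n
        while nn != 0:        # peel off digits until nothing is left
            ans += nn % base
            nn //= base
    o = gcd(ans, n - 2)
    return f"{ans // o}/{(n - 2) // o}"


print(digit_sum_average(5))  # 7/3
```

Compared with this sketch, the model output below inverts the loop condition (`nn == 0` instead of `nn != 0`) and assigns to `o` without declaring it.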
Model output (C++):
int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}
This app is deployed live on:
- Hugging Face Spaces: Pseudo2Code
- GitHub: github.com/asadsandhu/Pseudo2Code
- SPoC dataset by Stanford University: Kulal, S., Pasupat, P., & Liang, P. (2020). SPoC: Search-based Pseudocode to Code.
- Transformer paper: Vaswani, A., et al. (2017). "Attention Is All You Need".
Author: Asad Ali
- GitHub: asadsandhu
- Hugging Face: asadsandhu
- LinkedIn: asadxali
This project is licensed under the MIT License. Feel free to use, modify, and share with credit.
