Tob1n8tor/code-translator

Code Translation

Web application for translating code from one programming language (e.g. Java) to another (e.g. Python).

Description

Project Overview

This project aims to fine-tune existing models using the Hugging Face library for code translation tasks. The goal is to evaluate different models and datasets, adjusting hyperparameters to find the best-performing combination. The trained models, data, and results are stored in the model_training directory, which also includes an Excel file for model comparison. Additionally, a detailed research paper explaining the methodology and findings is included.

Project structure

\kabul
|
├──  /backend                               # Django backend code
|    ├── /api                               # Django REST API and the model used for translation
|    ├── /backend                           # Django project settings  
|    ├── Dockerfile                         # Dockerfile for Django backend
|    ├── manage.py                          # Django project entry point
|    └── requirements.txt                   # Python dependencies
|
├── /frontend                               # React frontend code    
|    ├── /public                            # Public assets
|    ├── /src                               # React components including App.js and App.css for the UI
|    ├── Dockerfile                         # Dockerfile for React frontend
|    ├── firebase.json                      # Firebase configuration for hosting and rewrites 
|    ├── package-lock.json                  # Auto-generated lock file for dependencies, ensures consistent installs
|    └── package.json                       # Project metadata and dependencies for the React frontend
|
├── /model_training                         # Trained models, data, and results
|    ├── /benchmarking                      # Model benchmarking results and code used for evaluation    
|    └── /training_codes                    # Python notebooks for model training
|  
├── /report                                 # Folder including final project paper
|    ├── Kabul_Code_Translation_Report.pdf
|    └── Kabul_Code_Translation_Report.tex
|    
├── docker-compose.yml                      # Docker Compose configuration for local setup
└── README.md                               # This README file

Getting started

Prerequisites

Installation Steps

  1. Clone the repository

    git clone https://gitlab.lrz.de/bpc-ws-2425/kabul.git
    cd kabul

  2. Build and start the containers using Docker Compose

    docker-compose --profile dev up

    This command will build and start both the React frontend and Django backend containers. It will also configure networking between the two services.

  3. Access the Application

    Frontend (React): Open your browser and go to http://localhost:3000
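To confirm from a script that the containers are up, a small reachability check can help. The following is a minimal sketch using only the Python standard library; the helper name `is_reachable` is hypothetical and not part of this repository, and it only tests that something answers HTTP at the given URL, not that the application works end to end.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` yields any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: nothing is listening yet.
        return False
```

Calling `is_reachable("http://localhost:3000")` after `docker-compose --profile dev up` should return `True` once the frontend container has finished starting.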

Online Access

Our application is currently deployed online and can be accessed directly without requiring local setup. Visit the following link to explore the code translation functionality: https://code-translation.com

Model Fine-Tuning & Benchmarking

This project uses Hugging Face's transformers library to fine-tune pre-trained models for code translation tasks. The goal is to benchmark various models with different hyperparameters and datasets to identify the best-performing configuration.

Training

The model training code is included in the model_training/training_codes directory. The training process involves loading the pre-trained model, preparing and tokenizing the training data, and fine-tuning the model on the target dataset. The training code is written in Python and uses the Hugging Face transformers library.
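The steps described above (load a pre-trained model, prepare and tokenize the data, fine-tune) can be sketched roughly as follows. This is an illustration, not the code from the training notebooks: it assumes an encoder-decoder (seq2seq) checkpoint, a "translate X to Y:" prompt prefix, and placeholder hyperparameters, none of which are confirmed by this README. The function names `build_example` and `fine_tune` are hypothetical, and `model_name` must be supplied by the caller.

```python
from typing import Dict, List


def build_example(src_lang: str, tgt_lang: str, src_code: str, tgt_code: str) -> Dict[str, str]:
    """Turn one parallel code pair into a text-to-text training example.

    The "translate X to Y:" prefix is an assumed convention, not necessarily
    the prompt format used in this repository.
    """
    return {
        "source": f"translate {src_lang} to {tgt_lang}: {src_code}",
        "target": tgt_code,
    }


def fine_tune(model_name: str, examples: List[Dict[str, str]], output_dir: str) -> None:
    """Fine-tune a seq2seq checkpoint on the given examples (sketch)."""
    # Imported lazily so build_example works without the heavy dependencies.
    from datasets import Dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def tokenize(batch):
        # text_target tokenizes the translations into the "labels" field.
        return tokenizer(
            batch["source"],
            text_target=batch["target"],
            max_length=512,
            truncation=True,
        )

    dataset = Dataset.from_list(examples).map(
        tokenize, batched=True, remove_columns=["source", "target"]
    )

    # Placeholder hyperparameters; these are the values one would vary
    # when searching for the best-performing configuration.
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        learning_rate=5e-5,
        per_device_train_batch_size=8,
        num_train_epochs=3,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        # Pads inputs and labels dynamically per batch.
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    trainer.save_model(output_dir)
```

A decoder-only base model would need a causal language modeling setup instead of the seq2seq classes used here.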

Fine-Tuned Models

We fine-tuned the following base models for code translation tasks:

Datasets

For the model fine-tuning process, we use the following datasets:

The custom datasets are stored in the model_training/training_codes/datasets directory.

Benchmarking

For benchmarking, we evaluate the fine-tuned models using ROUGE, TER, BERTScore, and FrugalScore. We benchmarked the models on several test sets: one per translation task (e.g. Java to Python, Python to Java) and a combined test set covering all translation tasks. The results are stored as an Excel file in the model_training/benchmarking/excel_result_files directory; for easier comparison, they are also visualized with different charts in the Excel file. The benchmarking code is included in the model_training/benchmarking directory.
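To illustrate how a per-task and combined evaluation of this kind can be organized, here is a minimal sketch that computes a simplified ROUGE-L F1 score (longest common subsequence over whitespace tokens) and averages it per translation task and over all samples. It is not the benchmarking code from this repository: the function names are hypothetical, the task labels are made up, and established ROUGE implementations tokenize and normalize text differently, so the numbers will not match theirs exactly.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-L F1 over whitespace-separated tokens."""
    pred, ref = prediction.split(), reference.split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_by_task(samples: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """Average ROUGE-L F1 per task, plus a "combined" average over all samples.

    Each sample is (task, prediction, reference), where task is a label
    such as "java-python" (hypothetical naming).
    """
    per_task = defaultdict(list)
    for task, prediction, reference in samples:
        score = rouge_l_f1(prediction, reference)
        per_task[task].append(score)
        per_task["combined"].append(score)
    return {task: sum(vals) / len(vals) for task, vals in per_task.items()}
```

Keeping the combined average alongside the per-task averages makes it easy to spot a model that does well overall but poorly in one translation direction.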

Research Paper

A detailed report explaining the methodology, model evaluation, and results is included in the /report directory. This report provides insights into the model selection, fine-tuning process and model evaluation.

Troubleshooting:

  • If you run into errors such as "react-scripts: not found", run npm install in the /frontend folder from a terminal first, then try the docker-compose command again.
