KotlinDecompiler

Toolkit for analyzing and comparing methods for Kotlin decompilation and conversion back to Kotlin, using structural, entropy, and language-model (LM) metrics.

Setup

Install dependencies:
```
pip install -r requirements.txt
```
Install a JDK and the Kotlin compiler (JVM target 23 recommended).
Prepare a working environment; a GPU is recommended for model inference.
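
Before running the pipeline it can help to confirm that the command-line tools are reachable. The following is a minimal sketch, not part of the repository: the tool list is an assumption inferred from the steps described in this README (kotlinc for compilation, javap for disassembly), and `find_tools` is a hypothetical helper name.

```python
import shutil

# Assumed tool list: inferred from this README, not read from the repository.
REQUIRED_TOOLS = ("java", "javap", "kotlinc")


def find_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool:8} {path or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND should be installed or added to PATH before compiling datasets.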

Repository Structure & Scripts

Data Collection

collect/bytecode/download_datasets.py: Downloads the KExercises and KStack-clean datasets.
- Output: originals/ directories with Kotlin .kt files.
collect/process_models/compile_models.py: Compiles .kt files to bytecode (bytecode/).
- Uses kotlinc, falling back to Gradle projects if needed.
- Logs errors to compile_errors.log.
collect/bytecode/bytecode_pair_collector.py: Pairs .kt files with their disassembled bytecode (javap).
- Output: pairs.jsonl in the dataset root.
collect/bytecode/merge_datasets.py: Merges the datasets and splits them into train/test JSON files.
distribution.py: Builds token (unigram) and bigram language models from the datasets.
- Output: unigram.json, bigram.json, left.json.
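
To illustrate what a token and bigram count model looks like, here is a minimal sketch under stated assumptions. The regex tokenizer and the space-joined bigram keys are hypothetical choices, not the behavior of distribution.py, and the contents of left.json are not covered because this README does not describe them.

```python
import json
import re
from collections import Counter

# Crude tokenizer (an assumption): identifiers, numbers, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split source text into a flat list of tokens."""
    return TOKEN_RE.findall(code)


def build_counts(sources):
    """Count tokens and adjacent token pairs over an iterable of source strings."""
    unigrams, bigrams = Counter(), Counter()
    for code in sources:
        tokens = tokenize(code)
        unigrams.update(tokens)
        # Join each pair with a space so the keys are JSON-serializable strings.
        bigrams.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return unigrams, bigrams


if __name__ == "__main__":
    uni, bi = build_counts(["fun main() { println(1) }"])
    print(json.dumps(uni.most_common(3)))
    print(json.dumps(bi.most_common(3)))
```

Counts like these can be normalized into probabilities later; keeping raw counts makes it easy to merge models built from several datasets.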

Model Inference

collect/process_models/process_model.py: Runs the selected model (via transformers) to convert bytecode to Kotlin.
- Input: pairs.jsonl.
- Output: a JSONL file with model outputs keyed by kt_path.
collect/process_models/merge_all_jsonl_with_hf.py: Merges the original data with all model outputs (JSONL).
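
The merge step can be pictured as a join on kt_path. The sketch below is illustrative only: `kt_path` comes from this README, while the `output` field name, the function names, and the in-memory record layout are assumptions rather than the format used by merge_all_jsonl_with_hf.py.

```python
import json


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def merge_by_kt_path(originals, model_outputs):
    """Attach each model's output to the original record with the same kt_path.

    originals: iterable of dicts with a "kt_path" key.
    model_outputs: dict mapping a model name to an iterable of dicts with
        "kt_path" and "output" keys ("output" is a hypothetical field name).
    """
    merged = {rec["kt_path"]: dict(rec) for rec in originals}
    for model, records in model_outputs.items():
        for rec in records:
            target = merged.get(rec["kt_path"])
            if target is not None:  # skip outputs with no matching original
                target[model] = rec["output"]
    return list(merged.values())
```

Each merged record then carries the original fields plus one extra key per model, which keeps later metric computation to a single pass over the file.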

Metrics Computation

collect/metrics/metrics_for_models.py: Computes structural, entropy, and LM metrics for all outputs.
- Input: the merged JSONL file and an allowed-paths JSON file.
- Output: a CSV file with metrics per model.
collect/metrics/metrics_collector.py: Provides methods to compute the metrics (structural, entropy, lm_metrics).
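
As an example of an entropy-style metric, the sketch below computes the Shannon entropy of a token sequence. This is a generic definition offered for illustration; the exact entropy metrics implemented in metrics_collector.py are not specified in this README and may differ.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Entropy in bits of the empirical token distribution; 0.0 for empty input."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy("val x = x + x".split()))
```

A sequence that repeats one token has entropy 0, while a sequence of n distinct, equally frequent tokens has entropy log2(n).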

Analysis

analysis/tests_J2K.py: Counts successful J2K conversions and compilations for each test.
analysis/tests_ChatGPT.py: Same as above, but for ChatGPT outputs.
analysis/best_models.py: Ranks models by metric distance to the original code.
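
Ranking by metric distance can be sketched as follows. The Euclidean distance and the metric names in the test data are assumptions chosen for illustration; this README does not state which distance best_models.py uses.

```python
import math


def rank_by_distance(original, models):
    """Sort models by Euclidean distance between their metrics and the original's.

    original: dict mapping a metric name to its value for the original code.
    models: dict mapping a model name to a dict with the same metric names.
    Returns a list of (model name, distance) pairs, closest first.
    """
    def distance(metrics):
        return math.sqrt(sum((metrics[k] - original[k]) ** 2 for k in original))

    scored = ((name, distance(metrics)) for name, metrics in models.items())
    return sorted(scored, key=lambda pair: pair[1])
```

In practice the metrics would likely need to be scaled to comparable ranges first, otherwise the metric with the largest magnitude dominates the distance.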

Visualization

charts/build_charts.py: Generates bar charts and heatmaps for metric comparisons.

Model Training

model_train/train.py: Fine-tunes models with LoRA on the bytecode-to-Kotlin task.
model_train/find_hyperparameters.py: Tunes hyperparameters with Optuna.
model_train/merge.py: Merges LoRA adapters with base models.

Utilities

dim_reduction/feature_selection.py: Removes low-variance and highly correlated metrics.
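
The two filters can be sketched with the standard library alone. The thresholds, the greedy keep-first strategy, and the function names are assumptions made for this example and are not taken from feature_selection.py.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, min_variance=1e-6, max_abs_corr=0.95):
    """Return the names of the columns that pass both filters.

    columns: dict mapping a metric name to a list of values (one per sample).
    """
    # Step 1: drop columns whose variance is at or below the threshold.
    candidates = [name for name, values in columns.items()
                  if statistics.pvariance(values) > min_variance]
    # Step 2: keep a column only if it is not highly correlated
    # with any column that has already been kept.
    kept = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_corr for k in kept):
            kept.append(name)
    return kept
```

Because the second step is greedy, the result depends on column order: of two highly correlated metrics, the one listed first is kept.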

Recommended Pipeline

1. download_datasets.py
2. compile_models.py
3. bytecode_pair_collector.py
4. merge_datasets.py
5. distribution.py
6. run_all.py (runs all models)
7. merge_all_jsonl_with_hf.py
8. metrics_for_models.py
9. (optional) best_models.py
10. (optional) build_charts.py
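
The required steps above could be chained by a small driver. This is a hypothetical sketch: the script names come from the list, but their locations and command-line arguments are not documented in this README, so invoking each one with no arguments is an assumption. The sketch defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Required steps in order; the optional steps are left out.
# Running each script without arguments is an assumption.
STEPS = [
    "download_datasets.py",
    "compile_models.py",
    "bytecode_pair_collector.py",
    "merge_datasets.py",
    "distribution.py",
    "run_all.py",
    "merge_all_jsonl_with_hf.py",
    "metrics_for_models.py",
]


def build_commands(steps=STEPS, python=sys.executable):
    """Return one argv list per step, using the given Python interpreter."""
    return [[python, step] for step in steps]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_pipeline(build_commands())
```

Using `check=True` makes a failed step abort the run, which avoids computing metrics on stale or partial outputs.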

Input/Output Locations

Originals: dataset/originals/
Bytecode: dataset/bytecode/
Bytecode pairs: pairs.jsonl
Model outputs: *.jsonl
Merged metrics: metrics_results.csv
Charts: charts/

Notes

A GPU is recommended for model inference.
The language models (unigram.json, bigram.json, left.json) must be built with distribution.py before computing metrics.
Fine-tuning (train.py) is optional and requires a GPU.
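
A quick pre-flight check for the language-model files might look like this. The file names come from this README; the directory they are written to is not stated, so the `directory` parameter and the helper name are assumptions.

```python
from pathlib import Path

LM_FILES = ("unigram.json", "bigram.json", "left.json")


def missing_lm_files(directory="."):
    """Return the language-model files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in LM_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_lm_files()
    if missing:
        print("Run distribution.py first; missing:", ", ".join(missing))
    else:
        print("All language-model files found.")
```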
