Toolkit for analyzing and comparing Kotlin decompilation and re-Kotlin conversion methods using structural, entropy, and LM-based metrics.
- Install dependencies: `pip install -r requirements.txt`
- Install a JDK and the Kotlin compiler (JVM target 23 recommended).
- Prepare a working environment (a GPU is recommended for model inference).
- `collect/bytecode/download_datasets.py`: downloads the KExercises and KStack-clean datasets.
  - Output: `originals/` directories with Kotlin `.kt` files.
- `collect/process_models/compile_models.py`: compiles `.kt` files to bytecode (`bytecode/`).
  - Uses `kotlinc`, falling back to Gradle projects if needed.
  - Logs errors to `compile_errors.log`.
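The compile step can be sketched roughly as follows. This is a minimal sketch: `kotlinc_command` and `compile_kt` are hypothetical helper names, and the real script additionally falls back to Gradle projects on failure.

```python
import subprocess
from pathlib import Path

def kotlinc_command(kt_file: Path, out_dir: Path, jvm_target: str = "23") -> list[str]:
    """Build the kotlinc invocation that compiles one .kt file to class files."""
    return ["kotlinc", str(kt_file), "-jvm-target", jvm_target, "-d", str(out_dir)]

def compile_kt(kt_file: Path, out_dir: Path, log_file: Path) -> bool:
    """Compile a single file; append stderr to the error log on failure."""
    proc = subprocess.run(kotlinc_command(kt_file, out_dir),
                          capture_output=True, text=True)
    if proc.returncode != 0:
        with log_file.open("a", encoding="utf-8") as log:
            log.write(f"{kt_file}:\n{proc.stderr}\n")
    return proc.returncode == 0
```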
- `collect/bytecode/bytecode_pair_collector.py`: pairs `.kt` files with their disassembled bytecode (`javap`).
  - Output: `pairs.jsonl` in the dataset root.
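The pairing step can be sketched like this. The record field names (`kt_path`, `kotlin`, `bytecode`) are assumptions about the JSONL schema, not confirmed by the source.

```python
import json
import subprocess
from pathlib import Path

def disassemble(class_file: Path) -> str:
    """Disassemble a compiled class with `javap -c -p`."""
    return subprocess.run(["javap", "-c", "-p", str(class_file)],
                          capture_output=True, text=True, check=True).stdout

def append_pair(pairs_file: Path, kt_path: str, kotlin_src: str, bytecode: str) -> None:
    """Append one {kt_path, kotlin, bytecode} record to pairs.jsonl."""
    record = {"kt_path": kt_path, "kotlin": kotlin_src, "bytecode": bytecode}
    with pairs_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```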
- `collect/bytecode/merge_datasets.py`: merges datasets and splits them into train/test JSON files.
- `distribution.py`: builds token and bigram language models from the datasets.
  - Output: `unigram.json`, `bigram.json`, `left.json`.
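A minimal sketch of how the unigram and bigram models might be counted. The real tokenisation and the on-disk layout of `unigram.json`/`bigram.json` may differ.

```python
from collections import Counter

def build_models(token_seqs):
    """Count unigrams and bigrams over tokenised files and normalise
    them to probabilities: P(t) and P(b | a) = count(a, b) / count(a)."""
    unigram, bigram = Counter(), Counter()
    for tokens in token_seqs:
        unigram.update(tokens)
        bigram.update(zip(tokens, tokens[1:]))
    total = sum(unigram.values())
    uni_p = {tok: c / total for tok, c in unigram.items()}
    bi_p = {f"{a} {b}": c / unigram[a] for (a, b), c in bigram.items()}
    return uni_p, bi_p
```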
- `collect/process_models/process_model.py`: runs the selected AI model (transformers) to convert bytecode to Kotlin.
  - Input: `pairs.jsonl`.
  - Output: JSONL file with model outputs per `kt_path`.
- `collect/process_models/merge_all_jsonl_with_hf.py`: merges the original data with all model outputs (JSONL).
- `collect/metrics/metrics_for_models.py`: computes structural, entropy, and LM metrics for all outputs.
  - Input: merged JSONL and allowed paths JSON.
  - Output: CSV file with metrics per model.
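One plausible form of the entropy metric: the average surprisal (bits per token) of an output under the unigram language model built above. The exact definition lives in `metrics_collector.py` and may differ; the floor probability for unseen tokens is an assumption.

```python
import math

def unigram_entropy(tokens, unigram_probs, floor=1e-6):
    """Average negative log2-probability (bits/token) of a token sequence
    under a unigram model; unseen tokens get a small floor probability."""
    if not tokens:
        return 0.0
    bits = -sum(math.log2(unigram_probs.get(t, floor)) for t in tokens)
    return bits / len(tokens)
```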
- `collect/metrics/metrics_collector.py`: provides methods to compute metrics (`structural`, `entropy`, `lm_metrics`).
- `analysis/tests_J2K.py`: counts successful J2K conversions and compilations for each test.
- `analysis/tests_ChatGPT.py`: same as above, but for ChatGPT outputs.
- `analysis/best_models.py`: ranks models by metric distance to the original code.
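The ranking idea can be sketched as a Euclidean distance over metric vectors. `rank_models` is a hypothetical helper; the actual script may normalise or weight metrics differently.

```python
import math

def rank_models(model_metrics, original_metrics):
    """Rank models by Euclidean distance between each model's metric
    vector and the metrics of the original code (smaller = closer)."""
    def distance(name):
        m = model_metrics[name]
        return math.sqrt(sum((m[k] - original_metrics[k]) ** 2
                             for k in original_metrics))
    return sorted(model_metrics, key=distance)
```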
- `charts/build_charts.py`: generates bar charts and heatmaps for metric comparisons.
- `model_train/train.py`: fine-tunes models with LoRA on the bytecode-to-Kotlin task.
- `model_train/find_hyperparameters.py`: hyperparameter tuning with Optuna.
- `model_train/merge.py`: merges LoRA adapters with base models.
- `dim_reduction/feature_selection.py`: removes low-variance and highly correlated metrics.
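A pure-Python sketch of the filtering idea behind the feature-selection step. The actual script may use pandas or scikit-learn, and the thresholds here are illustrative.

```python
import math
from statistics import pvariance

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def select_features(columns, var_threshold=1e-8, corr_threshold=0.95):
    """Drop metrics with near-zero variance, then drop one metric from
    every highly correlated pair (keeping the first-seen column)."""
    kept = {name: vals for name, vals in columns.items()
            if pvariance(vals) > var_threshold}
    names = list(kept)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a not in dropped and b not in dropped \
                    and abs(pearson(kept[a], kept[b])) > corr_threshold:
                dropped.add(b)
    return [n for n in names if n not in dropped]
```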
Run the pipeline in this order:

1. `download_datasets.py`
2. `compile_models.py`
3. `bytecode_pair_collector.py`
4. `merge_datasets.py`
5. `distribution.py`
6. `run_all.py` (to run all models)
7. `merge_all_jsonl_with_hf.py`
8. `metrics_for_models.py`
9. (optional) `best_models.py`
10. (optional) `build_charts.py`
Outputs:

- Originals: `dataset/originals/`
- Bytecode: `dataset/bytecode/`
- Bytecode pairs: `pairs.jsonl`
- Model outputs: `*.jsonl`
- Merged metrics: `metrics_results.csv`
- Charts: `charts/`
- A GPU is recommended for AI model inference.
- The language models (`unigram`, `bigram`, `left`) must be built before computing metrics.
- Fine-tuning (`train.py`) is optional and requires a GPU.