This repository contains my personal work on Direct Preference Optimization (DPO) for the final project in Modern Natural Language Processing (CS552) at EPFL, Spring 2025.
📄 FINALREPORT.pdf – Revised version of the final report (revised independently to better reflect my contributions; shared here for archival purposes only).
📝 proposal.pdf – Initial project proposal outlining the goals, plan, and methodology.
📈 progress_report.pdf – Mid-project progress report describing partial results and individual contributions.
The goal of this project was to implement and evaluate DPO training on top of Qwen3-0.6B-Base, using diverse pairwise preference data.
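For readers new to DPO, the per-pair objective can be sketched in a few lines of plain Python. This is a minimal illustration of the standard DPO loss, not the code in `train_dpo.py`; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for the example. Inputs are summed token log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    All names and the default beta are illustrative assumptions.
    """
    # How much more the policy prefers the chosen response than the rejected one.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    # The same preference margin measured under the frozen reference model.
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); computed in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Policy separates chosen from rejected more than the reference does: lower loss.
improved = dpo_loss(-4.0, -9.0, -5.0, -7.0)
print(baseline, improved)
```

A larger `beta` penalizes drift from the reference model more sharply, which is presumably what configs such as `lower_beta.json` vary. In the pre-DPO setting, the reference log-probabilities would come from the model passed as `--ref_model_name` instead of the initial checkpoint.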
This repo includes:
- DPO, pre-DPO, and SFT training scripts
- Dataset analysis and preprocessing utilities
- Evaluation modules (LightEval-compatible)
- Configs and helper scripts
- Milestone 3 – Trained DPO Models
  👉 View Collection on Hugging Face
- Evaluation Dataset (LightEval-compatible)
  👉 View Dataset Collection
  ↪ Compatible with LightEval
- Milestone 2 – Baseline Models
  👉 View Baseline Models
Install dependencies:
pip install transformers peft trl accelerate
Supervised fine-tuning:
# Example values: --config_path MNLP/config/sft_base.json, --sft_dataset_name koreankiwi99/mnlp_stem_curriculum
python train_sft.py \
  --config_path <path_to_config_json> \
  --sft_dataset_name <sft_dataset_name>
DPO training:
# Example values: --config_path MNLP/config/lower_beta.json, --hf_dataset_name koreankiwi99/mnlp_aggregate,
# --model_name koreankiwi99/sft_model_sft_base_mnlp_stem_balanced_plus
python train_dpo.py \
  --config_path <path_to_config_json> \
  --hf_dataset_name <dataset_name> \
  --model_name <base_or_sft_model>
Pre-DPO training:
# Example values: --config_path MNLP/config/predpo_lower_beta.json, --hf_dataset_name koreankiwi99/mnlp_aggregate,
# --model_name koreankiwi99/sft_model_sft_base_mnlp_stem_balanced,
# --ref_model_name koreankiwi99/sft_model_sft_base_mnlp_stem_balanced_lower_beta_mnlp_aggregate
python train_predpo.py \
  --config_path <path_to_config_json> \
  --hf_dataset_name <dataset_name> \
  --model_name <base_or_sft_model> \
  --ref_model_name <dpo_or_simpo_model>
.
├── config/ # Training configs
├── data/ # Subsets of EPFL Dataset
├── evaluation/ # Convert public benchmark to LightEval format
├── legacy/ # Deprecated or testing code
├── preprocessing/ # Combining SFT datasets
├── check_overlap.py # Utility for overlap analysis
├── dataset_stats.py # Dataset statistics and diagnostics
├── train_dpo.py # Main DPO training script
├── train_predpo.py # Pre-DPO preference modeling
├── train_sft.py # Supervised fine-tuning script
├── FINALREPORT.pdf
├── proposal.pdf
├── progress_report.pdf
└── README.md
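As an illustration of the kind of check an overlap utility performs, the sketch below flags evaluation prompts that also appear in the training data after light normalization. It is a hypothetical example, not the logic of `check_overlap.py`; the function names `normalize` and `find_overlap` and the sample prompts are invented for this purpose.

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences do not hide duplicates."""
    return re.sub(r"\s+", " ", text.strip().lower())


def find_overlap(train_prompts, eval_prompts):
    """Return the eval prompts whose normalized form also appears among the training prompts."""
    seen = {normalize(p) for p in train_prompts}
    return [p for p in eval_prompts if normalize(p) in seen]


# Toy data for illustration only.
train = ["What is 2 + 2?", "Define entropy."]
evals = ["what is  2 + 2?", "State Newton's second law."]
overlap = find_overlap(train, evals)
print(overlap)
```

Exact matching after normalization only catches near-verbatim leaks; paraphrased duplicates would need a fuzzier comparison such as n-gram overlap.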
Kyuhee Kim
Master’s in Data Science @ EPFL
🌐 GitHub | 🤗 Hugging Face