DPO – CS552 Modern NLP (Spring 2025, EPFL)

This repository contains my personal work on Direct Preference Optimization (DPO) for the final project in Modern Natural Language Processing (CS552) at EPFL, Spring 2025.

📄 FINALREPORT.pdf – Revised version of the final report, revised independently to better reflect my contributions and shared here for archival purposes only.

📝 proposal.pdf – Initial project proposal outlining the goals, plan, and methodology.

📈 progress_report.pdf – Mid-project progress report describing partial results and individual contributions.

🔍 Overview

The goal of this project was to implement DPO and evaluate it with Qwen3-0.6B-Base as the base model, training on diverse pairwise preference data.

This repo includes:

DPO, pre-DPO, and SFT training scripts
Dataset analysis and preprocessing utilities
Evaluation modules (LightEval-compatible)
Configs and helper scripts
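
For reference, the DPO objective named in the overview can be sketched for a single preference pair in plain Python. This is a minimal illustration of the standard DPO loss, not the code in train_dpo.py; the function names and the `beta=0.1` default are assumptions chosen for the example, and the inputs are summed sequence log-probabilities under the policy and the reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # Implicit reward of each response: beta * (log pi - log pi_ref).
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss is small when the chosen reward exceeds the rejected reward.
    return -log_sigmoid(chosen_reward - rejected_reward)


# When the policy equals the reference, the margin is zero and the loss is log(2).
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931
```

A smaller `beta` lets the policy drift further from the reference before the loss saturates, which is presumably what configs such as lower_beta.json vary.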

🤗 Hugging Face Collections

Milestone 3 – Trained DPO Models
👉 View Collection on Hugging Face
Evaluation Dataset (LightEval-compatible)
👉 View Dataset Collection
↪ Compatible with LightEval
Milestone 2 – Baseline Models
👉 View Baseline Models

🚀 Quick Start (Training Scripts)

Install dependencies:

pip install transformers peft trl accelerate

🔧 Run SFT Training

# Example values:
#   --config_path       MNLP/config/sft_base.json
#   --sft_dataset_name  koreankiwi99/mnlp_stem_curriculum
python train_sft.py \
  --config_path <path_to_config_json> \
  --sft_dataset_name <sft_dataset_name>

🔧 Run DPO Training

# Example values:
#   --config_path      MNLP/config/lower_beta.json
#   --hf_dataset_name  koreankiwi99/mnlp_aggregate
#   --model_name       koreankiwi99/sft_model_sft_base_mnlp_stem_balanced_plus
python train_dpo.py \
  --config_path <path_to_config_json> \
  --hf_dataset_name <dataset_name> \
  --model_name <base_or_sft_model>
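
Before launching a DPO run, it can help to sanity-check the preference data. The sketch below assumes the dataset follows the common `prompt` / `chosen` / `rejected` layout that TRL's preference format uses; whether the datasets referenced above use exactly these column names is an assumption, and `validate_preference_record` is a hypothetical helper, not part of this repo.

```python
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def validate_preference_record(record: dict) -> list[str]:
    """Return a list of problems found in one pairwise preference record."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    # A pair carries no preference signal if both responses are identical.
    if not problems and record["chosen"].strip() == record["rejected"].strip():
        problems.append("chosen and rejected are identical")
    return problems


example = {
    "prompt": "What is the derivative of x^2?",
    "chosen": "The derivative is 2x.",
    "rejected": "The derivative is x.",
}
print(validate_preference_record(example))  # []
```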

🔧 Run Pre-DPO Preference Model Training

# Example values:
#   --config_path      MNLP/config/predpo_lower_beta.json
#   --hf_dataset_name  koreankiwi99/mnlp_aggregate
#   --model_name       koreankiwi99/sft_model_sft_base_mnlp_stem_balanced
#   --ref_model_name   koreankiwi99/sft_model_sft_base_mnlp_stem_balanced_lower_beta_mnlp_aggregate
python train_predpo.py \
  --config_path <path_to_config_json> \
  --hf_dataset_name <dataset_name> \
  --model_name <base_or_sft_model> \
  --ref_model_name <dpo_or_simpo_model>
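
The `--ref_model_name` flag suggests that the reference model is an already preference-optimized model rather than the policy's own starting point. Assuming the script keeps the standard DPO loss and only swaps the reference, the sketch below shows why that matters: the reference changes the per-example scale on the DPO gradient. The function name and the numeric margins are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dpo_gradient_weight(policy_margin, ref_margin, beta=0.1):
    """Per-example scale on the DPO gradient.

    Each margin is log p(chosen | prompt) - log p(rejected | prompt)
    under the given model. The weight is sigmoid(-beta * (policy - ref)).
    """
    return sigmoid(-beta * (policy_margin - ref_margin))


policy_margin = 1.0  # hypothetical margin under the SFT policy at initialization

# Reference = the policy's own starting point: every pair starts at weight 0.5.
print(dpo_gradient_weight(policy_margin, ref_margin=1.0))  # 0.5

# A guiding reference that prefers the chosen response much more strongly
# than the policy does: this pair is up-weighted.
print(round(dpo_gradient_weight(policy_margin, ref_margin=21.0), 3))  # 0.881
```

With the usual setup all pairs start with the same weight, while a guiding reference spreads the weights according to how strongly it already separates chosen from rejected.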

📁 Repo Structure

.
├── config/              # Training configs
├── data/                # Subsets of EPFL Dataset
├── evaluation/          # Convert public benchmark to LightEval format
├── legacy/              # Deprecated or testing code
├── preprocessing/       # Combining SFT datasets
├── check_overlap.py     # Utility for overlap analysis
├── dataset_stats.py     # Dataset statistics and diagnostics
├── train_dpo.py         # Main DPO training script
├── train_predpo.py      # Pre-DPO preference modeling
├── train_sft.py         # Supervised fine-tuning script
├── FINALREPORT.pdf
├── proposal.pdf
├── progress_report.pdf
└── README.md
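
As a rough idea of what an overlap analysis between training and evaluation data can look like, here is a minimal sketch based on exact matches after text normalization. It is not the logic of check_overlap.py, which may use a different matching strategy; `normalize` and `find_overlap` are hypothetical names.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def find_overlap(train_prompts, eval_prompts):
    """Return eval prompts whose normalized text also appears in the training prompts."""
    seen = {normalize(p) for p in train_prompts}
    return [p for p in eval_prompts if normalize(p) in seen]


train = ["What is 2 + 2?", "Define entropy."]
evaluation = ["what is 2 + 2 ?", "State Ohm's law."]
print(find_overlap(train, evaluation))  # ['what is 2 + 2 ?']
```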

👩🏻‍💻 About

Kyuhee Kim
Master’s in Data Science @ EPFL
🌐 GitHub | 🤗 Hugging Face
